nickva commented on issue #4448:
URL: https://github.com/apache/couchdb/issues/4448#issuecomment-1714673445
The callback API for each scanner module might look like:
```erlang
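% Each line shows one callback and the shape of its return value.
% Ctx is the module's context passed in, Ctx1 the updated context returned.
{ok, Ctx} = start_scan(#{session_id => Uuid, start_timestamp => UnixTs})
{ok, Ctx1} = start_db(Ctx, DbName)
{ok, Ctx1} = ddoc(Ctx, DbName, DDoc = #doc{})
{ok, Ctx1} = shard(Ctx, Db)
{ok, Ctx1} = end_db(Ctx, DbName)
ok = end_scan(Ctx)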
```
This flow would be initialized and kept for each module individually. The
scanner server process would hold a context that looks like:
```erlang
#state{modstates = #{Module1 => Ctx1, Module2 => Ctx2} ....}
```
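As a rough sketch of how the server could drive that map, the function below calls one callback on every module with that module's own context and stores whatever context comes back. The module name `scanner_dispatch_sketch`, the `dispatch/3` helper and the demo `start_db/2` callback are all made up for illustration, and contexts are assumed to be plain maps; none of this is existing CouchDB code.

```erlang
-module(scanner_dispatch_sketch).
-export([dispatch/3, start_db/2]).

%% Hypothetical helper: call Callback on every module in ModStates,
%% passing that module's context followed by Args, and keep the
%% context each module returns. Assumes every callback returns
%% {ok, NewCtx}, as in the API sketch above.
dispatch(Callback, Args, ModStates) when is_atom(Callback), is_list(Args) ->
    maps:map(
        fun(Module, Ctx) ->
            {ok, Ctx1} = erlang:apply(Module, Callback, [Ctx | Args]),
            Ctx1
        end,
        ModStates
    ).

%% Demo callback so the sketch can be exercised on its own: it only
%% records the db name in the (assumed map-shaped) context.
start_db(Ctx, DbName) when is_map(Ctx) ->
    {ok, Ctx#{current_db => DbName}}.
```

With this, the server would do something like `ModStates1 = dispatch(start_db, [DbName], ModStates)` before visiting a db. A module that crashes or returns something else would fail the match here; whether that should stop the whole scan or only drop that module is left open.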
The scan would run on all the nodes. Each node would scan only the dbs whose
first shard copy is on that node. The API has no per-document callback. The
idea is that each plugin may then choose to sample only some docs, process
all the docs, or simply return `{ok, Ctx}` and move on.
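For illustration, a plugin that takes the "return `{ok, Ctx}` and move on" route for shards might look like the sketch below. The module name `scanner_plugin_sketch` and the map-shaped context with `dbs` and `ddocs` counters are assumptions; the design doc and db arguments are treated as opaque so the example does not depend on any CouchDB headers.

```erlang
-module(scanner_plugin_sketch).
-export([start_scan/1, start_db/2, ddoc/3, shard/2, end_db/2, end_scan/1]).

%% Build the initial context from the scan metadata (context shape is
%% an assumption of this sketch).
start_scan(#{session_id := Id, start_timestamp := Ts}) ->
    {ok, #{session_id => Id, start_timestamp => Ts, dbs => 0, ddocs => 0}}.

%% Count each db the scanner visits.
start_db(#{dbs := N} = Ctx, _DbName) ->
    {ok, Ctx#{dbs := N + 1}}.

%% Count each design doc without inspecting it.
ddoc(#{ddocs := N} = Ctx, _DbName, _DDoc) ->
    {ok, Ctx#{ddocs := N + 1}}.

%% This plugin does not look at documents, so it returns the context
%% unchanged and lets the scan move on.
shard(Ctx, _Db) ->
    {ok, Ctx}.

end_db(Ctx, _DbName) ->
    {ok, Ctx}.

end_scan(_Ctx) ->
    ok.
```

A plugin that wanted to sample documents would do that work inside `shard/2` instead of returning immediately.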
A few events might stop or pause scanning:
* cluster membership changes (or just nodelist changes?)
* configuration change (should .ini configuration changes stop and reset
the scan?)