nickva commented on issue #4448:
URL: https://github.com/apache/couchdb/issues/4448#issuecomment-1714673445

   The callback API for each scanner module might look like:
   
   ```erlang
   {ok, Ctx} = start_scan(#{session_id => Uuid, start_timestamp => UnixTs})
   {ok, Ctx1} = start_db(Ctx, DbName)
   {ok, Ctx1} = ddoc(Ctx, DbName, DDoc = #doc{})
   {ok, Ctx1} = shard(Ctx, Db)
   {ok, Ctx1} = end_db(Ctx, DbName)
   ok = end_scan(Ctx)
   ```
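   
   As a rough illustration of the callbacks above, a plugin might be a plain module exporting those six functions. Everything in this sketch is hypothetical: the module name `example_scanner_plugin`, the shape of the context map, and the choice to count design docs. The `DDoc` argument is treated as opaque so the example does not depend on the `#doc{}` record definition.
   
   ```erlang
   -module(example_scanner_plugin).
   -export([start_scan/1, start_db/2, ddoc/3, shard/2, end_db/2, end_scan/1]).

   %% Hypothetical plugin: counts the design docs seen during one scan session.
   %% The context is an ordinary map; its keys are an assumption of this sketch.
   start_scan(#{session_id := SessionId, start_timestamp := Ts}) ->
       {ok, #{session_id => SessionId, started => Ts, ddocs => 0}}.

   %% Nothing to do per database in this plugin, so pass the context through.
   start_db(Ctx, _DbName) ->
       {ok, Ctx}.

   %% Called once per design doc; bump the counter kept in the context.
   ddoc(#{ddocs := N} = Ctx, _DbName, _DDoc) ->
       {ok, Ctx#{ddocs := N + 1}}.

   %% A plugin that does not care about shards simply returns {ok, Ctx}.
   shard(Ctx, _Db) ->
       {ok, Ctx}.

   end_db(Ctx, _DbName) ->
       {ok, Ctx}.

   end_scan(_Ctx) ->
       ok.
   ```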
   
   This flow would be initialized and kept for each module individually. The 
scanner server process would hold a context that looks like:
   
   ```erlang
   #state{modstates = #{Module1 => Ctx1, Module2 => Ctx2} ....}
   ```
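   
   One way the server could maintain that per-module map is sketched below. The module name `example_scanner_state` and both helper functions are assumptions made for illustration, not part of the proposal; the sketch only assumes that every callback takes the module's context as its first argument and returns `{ok, NewCtx}`.
   
   ```erlang
   -module(example_scanner_state).
   -export([init_modstates/2, call_all/3]).

   %% Start a scan session in every plugin module.
   %% Returns a map of #{Module => Ctx}, one context per module.
   init_modstates(Modules, ScanArgs) ->
       maps:from_list([start_one(Mod, ScanArgs) || Mod <- Modules]).

   start_one(Mod, ScanArgs) ->
       {ok, Ctx} = Mod:start_scan(ScanArgs),
       {Mod, Ctx}.

   %% Invoke callback Fun on every module, passing that module's own
   %% context followed by Args, and keep the updated context.
   call_all(Fun, Args, ModStates) ->
       maps:map(
           fun(Mod, Ctx) ->
               {ok, Ctx1} = erlang:apply(Mod, Fun, [Ctx | Args]),
               Ctx1
           end,
           ModStates
       ).
   ```
   
   With this shape, handling a new database would be something like `call_all(start_db, [DbName], ModStates)`, and a crash in one plugin's callback would need separate handling that this sketch leaves out.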
   
   The scan would run on all the nodes. During a scan, each node would process 
only the dbs whose first shard copy lives on that node. The API doesn't include 
a per-document callback. The idea is that each plugin may then choose to sample 
only some docs, process all the docs, or simply return `{ok, Ctx}` and move on.
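   
   The node selection and the sampling choice could be sketched roughly as follows. This is only an illustration under stated assumptions: "first shard copy" is interpreted here as the lowest node name in sorted order among the nodes holding a copy, and sampling is done by hashing the doc id. The module and function names are hypothetical.
   
   ```erlang
   -module(example_scan_select).
   -export([scans_here/2, sample_doc/2]).

   %% Assumption: the node that "owns" a db for scanning is the first one
   %% in sorted order among the nodes holding a copy of its first shard.
   scans_here(_Node, []) ->
       false;
   scans_here(Node, CopyNodes) ->
       Node =:= hd(lists:sort(CopyNodes)).

   %% Deterministic sampling: keep roughly Percent% of docs by hashing the
   %% doc id into the range 0..99 and comparing against the threshold.
   sample_doc(DocId, Percent) when Percent >= 0, Percent =< 100 ->
       erlang:phash2(DocId, 100) < Percent.
   ```
   
   Hash-based sampling keeps the same docs selected across runs, which a plugin may or may not want; a plugin could equally use random sampling.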
   
   A few events might stop or pause scanning:
     * cluster membership changes (or just nodelist changes?)
     * configuration change (should .ini configuration changes stop and reset 
the scan?)
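   
   If either kind of event ends up resetting the scan, one simple way to detect it would be to snapshot the node list and the relevant configuration at scan start and compare later. The sketch below is an assumption for illustration only; the module name, the snapshot shape, and the use of a hash for the config are all hypothetical.
   
   ```erlang
   -module(example_scan_guard).
   -export([snapshot/2, changed/3]).

   %% Record what the scan was started against. Nodes is a list of node
   %% names, Config is any term describing the relevant settings.
   snapshot(Nodes, Config) ->
       #{nodes => lists:usort(Nodes), config_hash => erlang:phash2(Config)}.

   %% True when the current node list or config differs from the snapshot.
   changed(Snapshot, Nodes, Config) ->
       Snapshot =/= snapshot(Nodes, Config).
   ```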


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
