nickva opened a new pull request, #5014:
URL: https://github.com/apache/couchdb/pull/5014

   [WIP] Everything is in place except docs and tests.
   
   The app scans all the dbs and docs. It has a plugin system for gathering 
information from a cluster. The first use is to scan all the JavaScript design 
docs and run them through the new QuickJS JavaScript engine.
   
   Other possible uses:
    - Gather total db and view sizes
    - Scan for document features (docs of certain sizes, or containing certain 
fields and values).
   
   The plugins are managed as individual processes by the couch_scanner_server 
with the start_link/1 and stop/1 functions. After a plugin runner is spawned, 
the only thing couch_scanner_server does is wait for it to exit.
   
   The plugin runner process may exit normally, crash, or exit with {shutdown, 
{reschedule, TSec}} if it wants to be rescheduled to run again at some point in 
the future (next day, a week later, etc).
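   
   As a rough illustration of the three exit cases above, the sketch below 
spawns a runner under a monitor and classifies its exit reason. The module 
name `scanner_exit_sketch` and the functions `run/1` and `classify/1` are 
hypothetical and are not part of this PR; the actual couch_scanner_server may 
track runners differently (for example with links instead of monitors).
   
   ```erlang
   -module(scanner_exit_sketch).
   -export([run/1, classify/1]).

   %% Hypothetical helper: map a runner's exit reason to what a
   %% supervising server might do next.
   classify(normal) ->
       finished;
   classify({shutdown, {reschedule, TSec}}) when is_integer(TSec) ->
       {reschedule, TSec};
   classify(Reason) ->
       {crashed, Reason}.

   %% Spawn the runner fun under a monitor, then only wait for it to exit.
   run(Fun) when is_function(Fun, 0) ->
       {Pid, Ref} = spawn_monitor(Fun),
       receive
           {'DOWN', Ref, process, Pid, Reason} -> classify(Reason)
       end.
   ```
   
   For example, `scanner_exit_sketch:run(fun() -> exit({shutdown, {reschedule, 
1711249693}}) end)` would return `{reschedule, 1711249693}`.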
   
   After the process starts, it will load and validate the plugin module. Then, 
it will start scanning all the dbs and docs on the local node. Each shard range 
will be scanned on only one of the cluster nodes to avoid duplicating work. For 
instance, if there are 2 shard ranges, 0-7 and 8-f, with copies on nodes n1, n2, 
and n3, then 0-7 might be scanned on n1 only, and 8-f on n3 only.
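   
   One way such a split could work, shown purely as an assumption and not 
necessarily how this PR does it, is for every node to hash the shard range 
over the sorted list of nodes holding a copy. All nodes then agree on a single 
owner without coordinating. The module `scanner_range_owner_sketch` and its 
functions are hypothetical names.
   
   ```erlang
   -module(scanner_range_owner_sketch).
   -export([owner/2, scan_here/3]).

   %% Pick one owner for a shard range from the nodes holding a copy.
   %% erlang:phash2/2 returns a value in 0..Range-1 and is stable across
   %% nodes, so every node computes the same owner.
   owner(Range, Nodes) ->
       Sorted = lists:usort(Nodes),
       Index = erlang:phash2(Range, length(Sorted)),
       lists:nth(Index + 1, Sorted).

   %% True only on the node that should scan this range.
   scan_here(Range, Nodes, Node) ->
       owner(Range, Nodes) =:= Node.
   ```
   
   Each node would call `scan_here(Range, Nodes, node())` and skip the ranges 
for which it returns `false`.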
   
   The plugin API is the following (as OTP callback definitions):
   
   ```erlang
   -callback start(ScanId :: binary(), EJson :: #{}) ->
       {ok, St :: term()} | skip.
   -callback resume(ScanId :: binary(), EJson :: #{}) ->
       {ok, St :: term()} | skip.
   -callback stop(St :: term()) ->
       {ok, EJson :: #{}}.
   -callback checkpoint(St :: term()) ->
       {ok, EJson :: #{}}.
   -callback db(St :: term(), DbName :: binary()) ->
       {ok | skip | stop, St1 :: term()}.
   -callback ddoc(St :: term(), DbName :: binary(), #doc{}) ->
       {ok | stop, St1 :: term()}.
   -callback shards(St :: term(), [#shard{}]) ->
       {[#shard{}], St1 :: term()}.
   -callback db_opened(St :: term(), Db :: term()) ->
       {ok, St1 :: term()}.
   -callback doc_id(St :: term(), DocId :: binary(), Db :: term()) ->
       {ok | skip | stop, St1 :: term()}.
   -callback doc(St :: term(), Db :: term(), #doc{}) ->
       {ok | stop, St1 :: term()}.
   -callback db_closing(St :: term(), Db :: term()) ->
       {ok, St1 :: term()}.
   ```
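   
   To make the callback shapes more concrete, here is a minimal sketch of a 
plugin that counts the documents it sees. The module name 
`scanner_plugin_count_sketch` is hypothetical, and the `-behaviour` attribute 
is omitted because the behaviour module's name is not given above. The sketch 
assumes that the EJson map returned by `checkpoint/1` is what `resume/2` later 
receives, and that its keys are binaries; both are assumptions based on the 
callback names, not confirmed details.
   
   ```erlang
   -module(scanner_plugin_count_sketch).
   -export([start/2, resume/2, stop/1, checkpoint/1, db/2, ddoc/3,
            shards/2, db_opened/2, doc_id/3, doc/3, db_closing/2]).

   %% State is a plain map; `docs` counts the documents seen so far.
   start(_ScanId, #{}) ->
       {ok, #{docs => 0}}.

   %% Assumption: resume/2 gets back the EJson saved by checkpoint/1.
   resume(_ScanId, #{<<"docs">> := N}) when is_integer(N) ->
       {ok, #{docs => N}};
   resume(_ScanId, #{}) ->
       {ok, #{docs => 0}}.

   stop(#{docs := N}) -> {ok, #{<<"docs">> => N}}.
   checkpoint(#{docs := N}) -> {ok, #{<<"docs">> => N}}.

   %% Visit every db, design doc, and shard without filtering.
   db(St, _DbName) -> {ok, St}.
   ddoc(St, _DbName, _DDoc) -> {ok, St}.
   shards(St, Shards) -> {Shards, St}.
   db_opened(St, _Db) -> {ok, St}.
   doc_id(St, _DocId, _Db) -> {ok, St}.

   %% Count each document body that is handed to the plugin.
   doc(#{docs := N} = St, _Db, _Doc) -> {ok, St#{docs := N + 1}}.

   db_closing(St, _Db) -> {ok, St}.
   ```
   
   Returning `skip` from `db/2` or `doc_id/3`, or `stop` from any callback 
that allows it, would be the way for a plugin like this to narrow or end a 
scan early.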
   
   A simple plugin `couch_scanner_plugin_ddoc_features` is included as a first 
example implementation. It traverses the design docs on a cluster and reports 
when it finds Apache CouchDB 4.x deprecated features (lists, shows, etc).
   
   Plugin modules are enabled by `$plugin_mod = true` entries in the 
`[couch_scanner_plugins]` section. For example, to enable 
`couch_scanner_plugin_ddoc_features`:
   ```
   [couch_scanner_plugins]
   couch_scanner_plugin_ddoc_features = true
   ```
   
   Plugins may configure their scheduling using the `after` and `repeat` config 
values. For example, to start after Unix timestamp 1711249693 and then run 
every 3 days:
   ```
   [couch_scanner_plugin_ddoc_features]
   after = 1711249693
   repeat = 3_days
   ```
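   
   As a small worked example of the arithmetic, the sketch below converts a 
`repeat` value such as `3_days` into seconds. The module and function names 
are hypothetical, and only the `days` unit shown above is handled; which other 
units or formats the PR accepts is not stated here, so anything else is 
reported as unsupported rather than guessed.
   
   ```erlang
   -module(scanner_repeat_sketch).
   -export([repeat_seconds/1]).

   %% Convert a "<N>_days" string into a period in seconds.
   %% Only the `days` unit from the example config is recognised.
   repeat_seconds(Value) when is_list(Value) ->
       case string:split(Value, "_") of
           [NumStr, "days"] ->
               case string:to_integer(NumStr) of
                   {N, ""} when N > 0 -> {ok, N * 86400};
                   _ -> {error, bad_number}
               end;
           _ ->
               {error, unsupported}
       end.
   ```
   
   With this sketch, `repeat_seconds("3_days")` gives `{ok, 259200}`, that is 
3 * 86400 seconds.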
   
   The default value for both `after` and `repeat` is `restart`, meaning the 
plugin runs once after the node starts up.
   


