nickva commented on issue #5127:
URL: https://github.com/apache/couchdb/issues/5127#issuecomment-2231720591

   Good finds @chewbranca! Clearly there is something broken here and we should 
fix it. Thanks for the detailed analysis!
   
   > we should consider moving the cleanup to the dedicated rexi_mon process
   
   For streams we already have a cleanup process spawned for every streaming 
request:
https://github.com/apache/couchdb/blob/main/src/fabric/src/fabric_streams.erl#L47
 We should find out why it doesn't clean up the workers and lets them time out 
instead.
   
   Perhaps it's too cautious in trying to avoid sending unnecessary kill 
messages? It tries to use `rexi_STREAM_CANCEL`, which makes the worker exit 
`normal`, instead of killing it, to avoid generating SASL logs. But perhaps 
those logs wouldn't be generated anyway, as those workers are not gen_servers?
   
   Recently we also added a `kill_all` command to aggregate kill commands per 
node: instead of sending one per shard, it sends one per node with a list of 
refs. Maybe that's enough to keep the overhead of the extra kills fairly low.
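   
   As a rough illustration of the per-node batching idea (a language-neutral 
sketch in Python, not the actual rexi code; the function names and the 
`("kill_all", node, refs)` message shape are hypothetical):

```python
from collections import defaultdict


def group_refs_by_node(workers):
    """Group (node, ref) pairs into {node: [refs]}, keeping first-seen order."""
    by_node = defaultdict(list)
    for node, ref in workers:
        by_node[node].append(ref)
    return dict(by_node)


def build_kill_messages(workers):
    """Build one hypothetical kill_all message per node, not one per shard."""
    return [
        ("kill_all", node, refs)
        for node, refs in group_refs_by_node(workers).items()
    ]


# Three shard workers spread over two nodes (made-up names).
workers = [("node1", "ref_a"), ("node2", "ref_b"), ("node1", "ref_c")]
messages = build_kill_messages(workers)
```

   With three workers on two nodes this builds two messages rather than three, 
so the number of kill messages grows with the node count, not the shard count.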
   
   Another thing to keep in mind is that we don't always want to kill the 
workers. At least in the update docs path, we specifically allow them to finish 
updating to reduce the pressure on the internal replicator.
   
   > Looks like dreyfus_rpc does the right thing and cleanup the Workers in the 
outer after clause
   
   Dreyfus doesn't use the streams facility, so it likely has a slightly 
different way of doing cleanup. There is also the complication of replacements: 
if they are spawned, those have to be cleaned up as well. However, if we do a 
blanket `kill_all` for all the workers then it should take care of that, too. 
But it would be nice to see what corner cases we're missing currently: which 
errors are generated, and whether it's triggered by some error or is just a 
race condition...
   
   Do you have an easily reproducible scenario to test it out? For example, 
start a 3-node cluster and issue a bunch of `_all_docs` calls?
   

