nickva commented on issue #5127: URL: https://github.com/apache/couchdb/issues/5127#issuecomment-2231720591
Good finds @chewbranca! Clearly there is something broken here and we should fix it. Thanks for the detailed analysis!

> we should consider moving the cleanup to the dedicated rexi_mon process

For streams we already have a cleanup process spawned for every streaming request https://github.com/apache/couchdb/blob/main/src/fabric/src/fabric_streams.erl#L47. We should see why that doesn't clean up the workers and lets them time out instead. Perhaps it's too cautious, in order to avoid sending unnecessary kill messages? It tries to use `rexi_STREAM_CANCEL`, which makes the worker exit `normal`, instead of killing it, to avoid generating sasl logs. But perhaps that won't happen as those workers are not gen_servers?

Recently we also added a kill_all command to aggregate kill commands per node, so instead of sending one per shard, it's one per node with a list of refs. Maybe that's enough to keep the overhead of the extra kills fairly low.

Another thing to keep in mind is that we don't always want to kill the workers. At least in the update docs path we specifically allow them to finish updating, to reduce the pressure on the internal replicator.

> Looks like dreyfus_rpc does the right thing and cleanup the Workers in the outer after clause

Dreyfus doesn't use the streams facility, so it likely has a slightly different way of doing cleanup.

There is also the complication of replacements: if they are spawned, those have to be cleaned up as well. However, if we do a blanket `kill_all` for all the workers then it should take care of that, too.

But it would be nice to see what corner cases we're currently missing: which errors are generated, and whether it's triggered by some error or is just a race condition...

Do you have an easily reproducible scenario to test it out? Start a 3 node cluster and issue a bunch of _all_docs calls?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
