Hi dev, I searched around on this but couldn't find any related JIRA tickets or work, although perhaps I missed it.

We've run into a major scaling problem with the shard splitting operation. The entire shard split runs as a single operation in the overseer, and it blocks every other queue item from executing until it finishes. A shard split can take many minutes to complete, and during that time no other overseer operations (including status updates) can run. It also means that only one shard split can run at a time across an entire deployment.

Is anyone already working on this? If not, I'm planning to work on it myself, because we have to solve this scaling issue one way or another. I'd love to get guidance from someone knowledgeable, both to make the solution more solid and, hopefully, so that it can be upstreamed.

Thanks!
Scott
