Hi dev,

I searched around on this but couldn't find any related JIRA tickets or
work, although perhaps I missed it.

We've run into a major scaling problem with the shard split operation.
The entire shard split runs as a single operation in the Overseer and
blocks every other queue item from executing until it finishes.  A shard
split can take many minutes to complete, and during that time no other
Overseer ops (including status updates) can run.  It also means that only
one shard split can run at a time across an entire deployment.
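
To make the blocking concrete, here is a toy model of a strictly serial
work queue.  It is only a sketch and not the actual Overseer code: the
function name, the operation names, and the durations are all made up for
illustration, and it assumes every item is already queued at time zero.

```python
from collections import deque


def simulate_serial_queue(ops):
    """Process ops one at a time, in arrival order.

    ops is a list of (name, duration_seconds) pairs, all assumed to be
    enqueued at t=0. Returns {name: (start_time, finish_time)}.
    """
    clock = 0.0
    timeline = {}
    queue = deque(ops)
    while queue:
        name, duration = queue.popleft()
        start = clock          # an op cannot start until all earlier ops finish
        clock += duration
        timeline[name] = (start, clock)
    return timeline


# Hypothetical workload: a 10-minute split, a tiny status update,
# and a second split queued behind them.
ops = [
    ("split-shard1", 600.0),
    ("state-update", 0.01),
    ("split-shard2", 600.0),
]
timeline = simulate_serial_queue(ops)
for name, (start, finish) in timeline.items():
    print(f"{name}: waited {start:.2f}s, finished at {finish:.2f}s")
```

In this model the status update takes a hundredth of a second of work but
waits the full ten minutes behind the first split, and the second split
cannot begin until both are done.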

Is anyone already working on this?  If not, I'm planning on working on it
myself, because we have to solve this scaling issue one way or another.
I'd love to get guidance from someone knowledgeable, both to make it more
solid, and also hopefully so it could be upstreamed.

Thanks!
Scott
