Hi Scott,

Shard splitting shouldn't block unrelated tasks. Here's the current definition of 'unrelated': anything that involves a different collection. Right now, the Overseer processes only one collection-specific task at a time for a given collection. You should still be able to split shards from other collections while that happens. It's a bug if it doesn't work that way.
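
To make the "one task per collection at a time" rule concrete, here is a rough sketch in Python. It is not the actual Overseer code: the class CollectionTaskQueue, its methods, and the action strings are all made up for illustration. It only models the intended behaviour, where a busy collection holds back its own queued tasks but not tasks for other collections.

```python
class CollectionTaskQueue:
    """Toy model (hypothetical, not Solr code): one running task per collection."""

    def __init__(self):
        self._pending = []   # (collection, action) tuples in arrival order
        self._busy = set()   # collections that currently have a running task

    def submit(self, collection, action):
        """Queue a task for the given collection."""
        self._pending.append((collection, action))

    def next_runnable(self):
        """Remove and return the oldest task whose collection is idle, else None."""
        for i, (collection, action) in enumerate(self._pending):
            if collection not in self._busy:
                self._pending.pop(i)
                self._busy.add(collection)
                return (collection, action)
        return None

    def complete(self, collection):
        """Mark the running task for this collection as finished."""
        self._busy.discard(collection)


queue = CollectionTaskQueue()
queue.submit("c1", "splitshard")
queue.submit("c1", "deletereplica")
queue.submit("c2", "splitshard")

first = queue.next_runnable()    # c1 is idle, so its split starts
second = queue.next_runnable()   # c1 is busy; the c2 split is not blocked
third = queue.next_runnable()    # only the c1 delete is left, and c1 is busy
queue.complete("c1")
fourth = queue.next_runnable()   # c1 is idle again, so the delete can run
```

In this model the second task for c1 waits until the first one completes, while the split on c2 starts right away.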

There is logic to check for mutual exclusion so that race conditions don't come back to bite us. For example, if I send in add replica, shard split, delete replica, and/or delete shard requests for the same collection, we might run into issues.

On Mon, Jan 25, 2016 at 1:02 PM, Scott Blum <[email protected]> wrote:

> Hi dev,
>
> I searched around on this but couldn't find any related JIRA tickets or
> work, although perhaps I missed it.
>
> We've run into a major scaling problem in the shard splitting operation.
> The entire shard split is a single operation in overseer, and blocks any
> other queue items from executing while the shard split happens. Shard
> splits can take on the order of many minutes to complete, during this time
> no other overseer ops (including status updates) can occur. Additionally,
> this means you can only run a single shard split operation at a time,
> across an entire deployment.
>
> Is anyone already working on this? If not, I'm planning on working on it
> myself, because we have to solve this scaling issue one way or another.
> I'd love to get guidance from someone knowledgeable, both to make it more
> solid, and also hopefully so it could be upstreamed.
>
> Thanks!
> Scott

--
Anshum Gupta
