Looking deeper, it's entirely possible my experience is out of date; we're
running a Solr ~5.2.1 installation, and I'm 100% sure that in 5.2.1 a split
shard command completely blocks the Overseer. Even OVERSEERSTATUS times out
while a shard split is in progress.

Perhaps this was fixed as part of SOLR-7855?  I don't grok all the new
code, but it looks like as of 5.4 there's some support for the Overseer
doing more things concurrently.
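For what it's worth, here's a toy model of the two behaviors, just to pin
down what I mean. It is not Solr code: `Task`, `runnable_serial`, and
`runnable_per_collection` are hypothetical names, and treating
OVERSEERSTATUS as a task with no collection is my own assumption. The first
function models what I'm seeing on 5.2.1 (nothing new starts while anything
is running). The second models the per-collection mutual exclusion Anshum
describes below (at most one task per collection, so a split in one
collection shouldn't hold up another).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in for an Overseer queue item, not a Solr class.
    name: str
    collection: Optional[str]  # None = not tied to a collection (assumption)


def runnable_serial(queue: List[Task], running: List[Task]) -> List[Task]:
    """Fully serial model: nothing new starts while any task is running."""
    if running:
        return []
    return queue[:1]


def runnable_per_collection(queue: List[Task], running: List[Task]) -> List[Task]:
    """Per-collection mutual exclusion model: at most one task per collection
    runs at a time; tasks with no collection are never blocked (assumption).
    Tasks for a busy collection stay queued, preserving per-collection order."""
    busy = {t.collection for t in running if t.collection is not None}
    picked = []
    for task in queue:
        if task.collection is None:
            picked.append(task)
        elif task.collection not in busy:
            busy.add(task.collection)  # later tasks for this collection must wait
            picked.append(task)
    return picked


# A long split is running on collection "c1".
running = [Task("SPLITSHARD", "c1")]
queue = [
    Task("OVERSEERSTATUS", None),
    Task("ADDREPLICA", "c1"),
    Task("SPLITSHARD", "c2"),
]

print([t.name for t in runnable_serial(queue, running)])
# []
print([(t.name, t.collection) for t in runnable_per_collection(queue, running)])
# [('OVERSEERSTATUS', None), ('SPLITSHARD', 'c2')]
```

Under the first model, the status call and the unrelated split on c2 both
wait out the entire c1 split; under the second, only the ADDREPLICA on c1
has to wait.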



On Mon, Jan 25, 2016 at 4:57 PM, Anshum Gupta <[email protected]>
wrote:

> Hi Scott,
>
> Shard splitting shouldn't block unrelated tasks. Here's the current
> definition of 'unrelated': anything that involves a different collection.
> Right now, the Overseer only processes one collection-specific task at a
> time for any given collection; however, you should certainly be able to
> split shards from other collections concurrently. It's a bug if it doesn't
> work that way.
>
> There is logic to check for mutual exclusion so that race conditions don't
> come back to bite us. For example, if I send in add-replica, shard-split,
> delete-replica, and/or delete-shard requests for the same collection, we
> might otherwise run into issues.
>
>
> On Mon, Jan 25, 2016 at 1:02 PM, Scott Blum <[email protected]> wrote:
>
>> Hi dev,
>>
>> I searched around on this but couldn't find any related JIRA tickets or
>> work, although perhaps I missed it.
>>
>> We've run into a major scaling problem with the shard splitting operation.
>> The entire shard split is a single operation in the Overseer, and it blocks
>> all other queue items from executing while the split happens. Shard splits
>> can take many minutes to complete, and during this time no other Overseer
>> operations (including status updates) can run. Additionally, this means you
>> can only run a single shard split at a time across an entire deployment.
>>
>> Is anyone already working on this? If not, I'm planning to work on it
>> myself, because we have to solve this scaling issue one way or another.
>> I'd love to get guidance from someone knowledgeable, both to make the fix
>> more solid and, hopefully, so that it can be upstreamed.
>>
>> Thanks!
>> Scott
>>
>>
>
>
> --
> Anshum Gupta
>
