[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

Gregory Chanan (JIRA) Thu, 05 Jun 2014 22:02:12 -0700

    [ https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019582#comment-14019582 ]


Gregory Chanan commented on SOLR-6137:
--------------------------------------

This is indeed tricky stuff.  Consider, too, what happens if the Schema API is 
expanded to allow the full range of changes you can make to an unmanaged schema 
with Solr down, such as removing fields or changing types.  In that case, 
checking only for non-existent fields would not be enough.

There's also the case of the same field being added simultaneously on different 
nodes with different types via schemaless.  I haven't actually tested that, but 
I bet it would hang like SOLR-6145...

How about something like this for schemaless?  (I haven't totally thought this 
through; I'll take a closer look tomorrow to see if it is feasible.)
As I suggested before, we run more of the update.chain so the fields are added 
on each core.  If the update to ZK fails because of a version mismatch, we 
download the new schema and try to apply our changes again.  If the field 
already exists with the correct type, we don't need to do anything.  If the 
field exists with the wrong type, we throw an exception and the update fails 
(which seems correct, because that is what would happen if the updates were 
not done simultaneously).
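
A minimal sketch of the retry idea above, not Solr code.  It assumes the schema 
can be modeled as a field-name-to-type map stored with a version number, and 
that writes are compare-and-set on that version (standing in for a versioned 
ZK update).  The names VersionedSchemaStore, writeIfVersion, and addField are 
hypothetical and exist only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the schema stored in ZK: fields plus a version. */
final class VersionedSchemaStore {
    private Map<String, String> fields = new HashMap<>();
    private int version = 0;

    /** Immutable copy of the schema as it looked at a given version. */
    static final class Snapshot {
        final Map<String, String> fields;
        final int version;

        Snapshot(Map<String, String> fields, int version) {
            this.fields = Collections.unmodifiableMap(new HashMap<>(fields));
            this.version = version;
        }
    }

    synchronized Snapshot read() {
        return new Snapshot(fields, version);
    }

    /** Compare-and-set: the write succeeds only if the version is still expectedVersion. */
    synchronized boolean writeIfVersion(Map<String, String> newFields, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // version mismatch: someone else updated the schema first
        }
        fields = new HashMap<>(newFields);
        version++;
        return true;
    }
}

final class SchemaRetrySketch {
    /**
     * Adds a field, retrying on version mismatch.
     * Returns true if this call wrote the field, false if it already existed
     * with the same type. Throws if it exists with a different type.
     */
    static boolean addField(VersionedSchemaStore store, String name, String type) {
        while (true) { // a real implementation would likely bound the retries
            VersionedSchemaStore.Snapshot current = store.read(); // "download the new schema"
            String existing = current.fields.get(name);
            if (existing != null) {
                if (existing.equals(type)) {
                    return false; // already present with the correct type: nothing to do
                }
                throw new IllegalStateException("Field '" + name + "' already exists with type '"
                        + existing + "', cannot add it as '" + type + "'");
            }
            Map<String, String> updated = new HashMap<>(current.fields);
            updated.put(name, type);
            if (store.writeIfVersion(updated, current.version)) {
                return true;
            }
            // Lost the race: loop, re-read the schema, and re-apply the change.
        }
    }
}
```

With this shape, two nodes adding the same field with the same type both 
succeed (one writes, the other finds it already there), while a conflicting 
type surfaces as an exception on the losing update.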


> Managed Schema / Schemaless and SolrCloud concurrency issues
> ------------------------------------------------------------
>
>                 Key: SOLR-6137
>                 URL: https://issues.apache.org/jira/browse/SOLR-6137
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>            Reporter: Gregory Chanan
>
> This is a follow up to a message on the mailing list, linked here: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issues I'm running into are variants of one problem: schema changes are 
> not pushed to all shards/replicas synchronously.  So, for example, I can make 
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field, the document is routed to a core on 
> server2
> Then, there appears to be a race between when the document is processed by 
> the core on server2 and when the core on server2, via the 
> ZkIndexSchemaReader, gets the new schema.  If the document is processed 
> first, I get a 400 error because the field doesn't exist.  This is easily 
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends 
> out the document updates, but there is no guarantee that the other 
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple Schema API calls concurrently 
> can block; that is, one may get through and the other may loop indefinitely.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent 
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before 
> using them should work around this problem.  While not ideal, I think schema 
> field additions are rare enough in the Solr collection lifecycle that this is 
> not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate 
> distribution of documents triggering schema field addition could easily prove 
> problematic.  Maybe we need a schema update blocking mode, where after the ZK 
> schema node watch is triggered, all new request processing is halted until 
> the schema is finished downloading/parsing/swapping out? (Such a mode should 
> help Schema API users too.)
> {quote}
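
The "schema update blocking mode" suggested in the quote above could look 
roughly like the following sketch.  It is not Solr code: BlockingSchemaHolder, 
withSchema, and onSchemaChanged are hypothetical names, and the sketch assumes 
the watch callback can be handed a function that downloads and parses the new 
schema.  Request processing holds a read lock; the schema swap holds the write 
lock, so new requests wait until the swap has finished.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical holder that halts new request processing while the schema is swapped. */
final class BlockingSchemaHolder<S> {
    // Fair mode: once a schema swap is waiting for the write lock,
    // newly arriving requests queue behind it instead of overtaking it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private S schema;

    BlockingSchemaHolder(S initial) {
        this.schema = initial;
    }

    /** Runs one request against the current schema; many requests may run concurrently. */
    <R> R withSchema(Function<S, R> request) {
        lock.readLock().lock();
        try {
            return request.apply(schema);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Intended to be called when the schema watch fires; blocks requests until the swap is done. */
    void onSchemaChanged(Supplier<S> downloadAndParse) {
        lock.writeLock().lock(); // waits for in-flight requests to finish
        try {
            schema = downloadAndParse.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```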



--
This message was sent by Atlassian JIRA
(v6.2#6252)

