[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

Steve Rowe (JIRA) Wed, 02 Jul 2014 12:54:34 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050630#comment-14050630
 ]


Steve Rowe commented on SOLR-6137:
----------------------------------

bq.  it looks like the v4 patch was against an earlier version of the patch or 
something

Crap, re-reading the comments I see that [~gchanan]'s patch assumes that the 
patch on SOLR-6180 is applied first.  I'll start looking there now; I'm 
guessing that this is the source of the patch problems I noted above.

> Managed Schema / Schemaless and SolrCloud concurrency issues
> ------------------------------------------------------------
>
>                 Key: SOLR-6137
>                 URL: https://issues.apache.org/jira/browse/SOLR-6137
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>            Reporter: Gregory Chanan
>         Attachments: AddSchemaFieldsUpdateProcessorFactory.java.svnpatch.rej, 
> SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch, SOLR-6137v3.patch, 
> SOLR-6137v4.patch
>
>
> This is a follow up to a message on the mailing list, linked here: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issues I'm running into are all variants of one problem: schema changes 
> are not pushed to all shards/replicas synchronously.  So, for example, I can 
> make the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field; the document is routed to a core on 
> server2
> Then, there appears to be a race between when the document is processed by 
> the core on server2 and when the core on server2, via the 
> ZkIndexSchemaReader, gets the new schema.  If the document is processed 
> first, I get a 400 error because the field doesn't exist.  This is easily 
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends 
> out the document updates, but there is no guarantee that the other 
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple schema API calls concurrently 
> can block; that is, one may get through and the other may loop forever.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent 
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before 
> using them should work around this problem.  While not ideal, I think schema 
> field additions are rare enough in the Solr collection lifecycle that this is 
> not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate 
> distribution of documents triggering schema field addition could easily prove 
> problematic.  Maybe we need a schema update blocking mode, where after the ZK 
> schema node watch is triggered, all new request processing is halted until 
> the schema is finished downloading/parsing/swapping out? (Such a mode should 
> help Schema API users too.)
> {quote}
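
The "delay a couple of seconds" workaround quoted above could also be written as a bounded poll rather than a fixed sleep. The following is only a minimal sketch under stated assumptions: SchemaFieldWaiter and awaitField are hypothetical names, not part of Solr or of any patch on this issue, and the visibility check is left to the caller. In practice that check would have to be answered by the core that will process the document, which (as the description notes) may not be the core that accepted the schema change.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical client-side helper, not Solr code: wait until a schema change is visible. */
final class SchemaFieldWaiter {

    /**
     * Calls fieldVisible up to maxAttempts times, sleeping delayMillis after each
     * failed attempt except the last. Returns true as soon as the check passes,
     * and false if every attempt fails or the thread is interrupted.
     */
    static boolean awaitField(BooleanSupplier fieldVisible, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fieldVisible.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would add the field, call awaitField with a check of its choosing, and only send documents that use the new field once it returns true. This narrows the race window from the client side; it does not remove the race itself.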

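The "schema update blocking mode" floated in the quote above can be pictured as a read/write lock around the current schema: ordinary requests share the read lock, and a schema reload takes the write lock for the whole download/parse/swap. This is an illustrative sketch only, assuming nothing about how Solr actually implements or will implement it; SchemaGate, withSchema and swap are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical gate, not Solr code: requests read the schema, a reload blocks them. */
final class SchemaGate<S> {
    // Fair mode, so new readers queue behind a waiting writer instead of starving it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private S schema;

    SchemaGate(S initial) {
        this.schema = initial;
    }

    /** Runs a request against the current schema; blocks while a swap is in progress. */
    <R> R withSchema(Function<S, R> request) {
        lock.readLock().lock();
        try {
            return request.apply(schema);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Holds the write lock while the new schema is loaded and swapped in. */
    void swap(Supplier<S> loader) {
        lock.writeLock().lock();
        try {
            schema = loader.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In this picture the ZK watch callback would call swap, and every request handler would go through withSchema. Requests already in flight finish against the old schema; requests arriving after the watch fires wait for the new one. It only helps once the watch has fired on a given node, so by itself it does not close the window between the schema change on server1 and the watch firing on server2.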


--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
