[ https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050630#comment-14050630 ]
Steve Rowe commented on SOLR-6137:
----------------------------------

bq. it looks like the v4 patch was against an earlier version of the patch or something

Crap, re-reading the comments I see that [~gchanan]'s patch assumes that the patch on SOLR-6180 is applied first. I'll start looking there now; I'm guessing that this is the source of the patch problems I noted above.

> Managed Schema / Schemaless and SolrCloud concurrency issues
> ------------------------------------------------------------
>
>                 Key: SOLR-6137
>                 URL: https://issues.apache.org/jira/browse/SOLR-6137
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>            Reporter: Gregory Chanan
>         Attachments: AddSchemaFieldsUpdateProcessorFactory.java.svnpatch.rej,
> SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch, SOLR-6137v3.patch,
> SOLR-6137v4.patch
>
>
> This is a follow-up to a message on the mailing list, linked here:
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issue I'm running into is variants of the issue that schema changes are
> not pushed to all shards/replicas synchronously. So, for example, I can make
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field; the document is routed to a core on
> server2
> Then, there appears to be a race between when the document is processed by
> the core on server2 and when the core on server2, via the
> ZkIndexSchemaReader, gets the new schema. If the document is processed
> first, I get a 400 error because the field doesn't exist. This is easily
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends
> out the document updates, but there is no guarantee that the other
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple schema API calls concurrently
> can block; that is, one may get through and the other may loop infinitely.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent
> calls attempting to use the new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before
> using them should work around this problem. While not ideal, I think schema
> field additions are rare enough in the Solr collection lifecycle that this is
> not a huge problem.
> For schemaless users, the picture is worse, as you noted. Immediate
> distribution of documents triggering schema field addition could easily prove
> problematic. Maybe we need a schema update blocking mode, where after the ZK
> schema node watch is triggered, all new request processing is halted until
> the schema is finished downloading/parsing/swapping out? (Such a mode should
> help Schema API users too.)
> {quote}
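
For illustration only, the "schema update blocking mode" floated in the quoted description could be sketched roughly as below. This is not code from any of the attached patches and not an existing Solr API: the class name SchemaVersionGate, its methods, and the idea of tagging each schema with an increasing integer version are all assumptions made for the sketch. The point is just that a request which needs schema version N waits (with a timeout) until the local core has finished swapping in version N or later, instead of failing with a 400.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper (not part of Solr): lets request threads wait until
 * a given schema version has been swapped in on the local core.
 */
class SchemaVersionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition versionAdvanced = lock.newCondition();
    private int currentVersion;

    SchemaVersionGate(int initialVersion) {
        this.currentVersion = initialVersion;
    }

    /** Call once a new schema has been downloaded, parsed, and swapped in. */
    void publish(int newVersion) {
        lock.lock();
        try {
            // Ignore stale notifications; only move forward.
            if (newVersion > currentVersion) {
                currentVersion = newVersion;
                versionAdvanced.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until the local schema version is at least requiredVersion.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    boolean awaitVersion(int requiredVersion, long timeout, TimeUnit unit) {
        long nanosLeft = unit.toNanos(timeout);
        lock.lock();
        try {
            while (currentVersion < requiredVersion) {
                if (nanosLeft <= 0L) {
                    return false;
                }
                // awaitNanos returns an estimate of the time still remaining.
                nanosLeft = versionAdvanced.awaitNanos(nanosLeft);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    int currentVersion() {
        lock.lock();
        try {
            return currentVersion;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch, whatever reacts to the ZK schema node watch would call publish() after the swap, and an update request carrying the schema version it was built against would call awaitVersion() before processing the document. How the version would actually travel with the request is left open here.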