Thanks for the reply, Steve. I filed SOLR-6137.
Greg

On Wed, Jun 4, 2014 at 4:08 PM, Steve Rowe wrote:

> Hi Greg,
>
> Your understanding is correct, and I agree that this limits managed schema functionality.
>
> Under SolrCloud, all Solr nodes participating in a collection bound to a configset with a managed schema keep a watch on the corresponding schema ZK node. In my testing (on my laptop), when the managed schema is written to ZK, the other nodes are notified very quickly (single-digit milliseconds) and immediately download and start parsing the schema. Incoming requests are bound to a snapshot of the live schema at the time they arrive, so there is a window of time between initial posting to ZK and swapping out the schema after parsing. Different loads on, and/or different network latency between, ZK and each participating node can result in varying latencies before all nodes are in sync.
>
> For Schema API users, delaying a couple of seconds after adding fields before using them should work around this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem.
>
> For schemaless users, the picture is worse, as you noted. Immediate distribution of documents triggering schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out? Can you make an issue, Greg? (Such a mode should help Schema API users too.)
>
> Thanks,
> Steve
>
> On Jun 3, 2014, at 8:06 PM, Gregory Chanan wrote:
>
> > I'm trying to determine if the Managed Schema functionality works with SolrCloud, and AFAICT the integration seems pretty limited.
> >
> > The issues I'm running into are variants of the problem that schema changes are not pushed to all shards/replicas synchronously.
> > So, for example, I can make the following two requests:
> >
> > 1) add a field to the collection on server1 using the Schema API
> > 2) add a document with the new field; the document is routed to a core on server2
> >
> > Then, there appears to be a race between when the document is processed by the core on server2 and when the core on server2, via the ZkIndexSchemaReader, gets the new schema. If the document is processed first, I get a 400 error because the field doesn't exist. This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> >
> > I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain.
> >
> > Is my understanding correct? Is this expected?
> >
> > Thanks,
> > Greg
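The race Greg describes and the schema update blocking mode Steve proposes can be modeled in a few lines. The sketch below is a hypothetical Python model, not Solr code: `Node`, `on_watch_triggered`, `add_document`, and `SchemaError` are illustrative names, a `threading.Event` stands in for whatever gate Solr might use, and `time.sleep` stands in for the ZK download and schema parse. Each request is bound to the schema snapshot current when it starts, so without the gate a document using a just-added field is rejected (the model's equivalent of the 400); with the gate, new requests wait from the moment the watch fires until the new schema is swapped in.

```python
import threading
import time


class SchemaError(Exception):
    """Hypothetical stand-in for the HTTP 400 returned for an unknown field."""


class Node:
    """Hypothetical model of one core watching the schema ZK node (not Solr code)."""

    def __init__(self, fields, blocking=False):
        self._schema = frozenset(fields)  # live schema snapshot
        self._blocking = blocking
        # Set means "no schema update in flight"; cleared while one is pending.
        self._ready = threading.Event()
        self._ready.set()

    def on_watch_triggered(self, new_fields, parse_delay):
        """Called when the ZK watch fires; download/parse runs on another thread."""
        if self._blocking:
            self._ready.clear()  # halt new requests until the swap completes

        def download_parse_swap():
            time.sleep(parse_delay)  # simulated download + parse latency
            self._schema = frozenset(new_fields)  # swap in the new snapshot
            self._ready.set()  # release any requests waiting on the gate

        t = threading.Thread(target=download_parse_swap)
        t.start()
        return t

    def add_document(self, doc):
        if self._blocking:
            self._ready.wait()
        schema = self._schema  # request is bound to the snapshot at arrival
        unknown = set(doc) - schema
        if unknown:
            raise SchemaError(f"unknown field(s): {sorted(unknown)}")
        return "ok"


def demo(blocking):
    node = Node({"id"}, blocking=blocking)
    # The field is added elsewhere; this node's watch fires, then a document
    # using the new field arrives before parsing has finished.
    t = node.on_watch_triggered({"id", "new_field"}, parse_delay=0.3)
    try:
        result = node.add_document({"id": "1", "new_field": "x"})
    except SchemaError as e:
        result = f"400: {e}"
    t.join()
    return result


print(demo(blocking=False))  # 400: unknown field(s): ['new_field']
print(demo(blocking=True))   # ok
```

Two limitations remain even in this model: the gate only closes once the watch fires, so the (single-digit millisecond) gap between the ZK write and the notification is not covered, and overlapping schema updates would need a counter or version check rather than a single `Event`, since the first swap would otherwise reopen the gate while a second update is still pending.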
