[
https://issues.apache.org/jira/browse/SOLR-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143846#comment-14143846
]
Timothy Potter commented on SOLR-6249:
--------------------------------------
This mechanism is mainly a convenience so the client doesn't have to poll the
zk version from all replicas themselves. If a timeout occurs, then either A) one
or more of the replicas couldn't process the update, or B) one or more of the
replicas was just being really slow. If A, then the client app can't safely
proceed without resolving the root cause. I'm not sure what to do about this
without going down the path of having a distributed transaction that allows us
to roll back updates if any replicas fail.
If B, the client can wait longer, but then they would have to poll all the
replicas themselves, which forces clients to implement this same polling
solution on their side as well, so it's not very convenient.
One thing we could do to make B (and possibly even A) more convenient to deal
with is to use the solution I proposed in SOLR-6550 to pass back the URLs of
the replicas that timed out using the extended exception metadata. That at
least narrows the scope for the client, but it's still inconvenient.
Alternatively, async would work, but at some point doesn't the client have to
give up polling? Hence we're back to effectively having a timeout. I took this
ticket to mean that a client doesn't want to proceed with more updates until it
knows all cores have seen the current update, so async seems to just move the
problem out to the client.
I'm happy to implement the async approach, but from where I sit now, I think we
should build distributed two-phase-commit transaction support into managed
schema, as it will be useful going forward for managed config as well. That
way, clients can make a change and be certain it was either applied entirely or
not at all, and that their cluster remains in a consistent state. This would of
course only apply to schema and config changes, so I'm not talking about
distributed transactions for Solr in general.
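The two-phase idea above could look something like the following. This is a minimal generic sketch, not Solr code; `ConfigParticipant` is a hypothetical interface representing one replica, and the prepare/commit/rollback split is the standard 2PC shape, not an existing Solr API:

```java
import java.util.List;

/**
 * Minimal two-phase-commit sketch for an all-or-nothing config change:
 * every replica first validates and stages the change, and only if all
 * succeed is it committed; otherwise everything staged so far is rolled
 * back, leaving the cluster in a consistent state.
 */
class TwoPhaseConfigChange {

    /** Hypothetical interface for one replica taking part in the change. */
    public interface ConfigParticipant {
        boolean prepare(String change); // validate and stage, but don't apply
        void commit();                  // make the staged change live
        void rollback();                // discard the staged change
    }

    /** Returns true only if the change was applied on every participant. */
    public static boolean apply(List<ConfigParticipant> replicas, String change) {
        int prepared = 0;
        for (ConfigParticipant r : replicas) {
            if (r.prepare(change)) {
                prepared++;
            } else {
                // One replica refused: undo the ones already staged so the
                // change is all-or-nothing.
                for (int i = 0; i < prepared; i++) {
                    replicas.get(i).rollback();
                }
                return false;
            }
        }
        for (ConfigParticipant r : replicas) {
            r.commit();
        }
        return true;
    }
}
```

A real implementation would also have to handle a coordinator crash between the prepare and commit phases, which is where most of the complexity of 2PC lives; the sketch only shows the happy-path and clean-abort cases.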
> Schema API changes return success before all cores are updated
> --------------------------------------------------------------
>
> Key: SOLR-6249
> URL: https://issues.apache.org/jira/browse/SOLR-6249
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis, SolrCloud
> Reporter: Gregory Chanan
> Assignee: Timothy Potter
> Attachments: SOLR-6249.patch, SOLR-6249.patch
>
>
> See SOLR-6137 for more details.
> The basic issue is that Schema API changes return success when the first core
> is updated, but other cores asynchronously read the updated schema from
> ZooKeeper.
> So a client application could make a Schema API change and then index some
> documents based on the new schema that may fail on other nodes.
> Possible fixes:
> 1) Make the Schema API calls synchronous
> 2) Give the client some ability to track the state of the schema. They can
> already do this to a certain extent by checking the Schema API on all the
> replicas and verifying that the field has been added, though this is pretty
> cumbersome. Maybe it makes more sense to do this sort of thing on the
> collection level, i.e. Schema API changes return the zk version to the
> client. We add an API to return the current zk version. On a replica, if
> the zk version is >= the version the client has, the client knows that
> replica has at least seen the schema change. We could also provide an API to
> do the distribution and checking across the different replicas of the
> collection so that clients don't need to do that themselves.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)