[ 
https://issues.apache.org/jira/browse/SOLR-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446299#comment-16446299
 ] 

David Smiley commented on SOLR-12256:
-------------------------------------

Patch notes:
 * ZkStateReader: AliasesManager.update() now calls ZooKeeper.sync()
 * SetAliasPropCmd:
 ** eagerly call AliasesManager.update().  Alias props won't be set at high 
frequency, so I think this is OK.
 ** Improve efficiency by using the overloaded method of 
.applyModificationAndExportToZk that takes a Map instead of making 
modifications one at a time.
 * AliasIntegrationTest:
 ** comment out BadApple annotations.  I'm looking at these things.
 ** Minor inlining of needless UnaryOperator local vars
 * CreateRoutedAliasTest: comment out BadApple annotations.  I'm looking at 
these things.
 * TimeRoutedAliasUpdateProcessorTest: added more diagnostic logging and 
cleaned up some indentation and other minor stuff.

*I'm going to commit this right away and keep the issue open a bit to see the 
effects (hopefully no Jenkins failures).*

Furthermore, I looked at two separate TimeRoutedAliasUpdateProcessorTest 
failures reported by Jenkins.  I'm certain these failures have (almost) nothing 
to do with the above.

(1) Timed out creating the collection {{alias + "_2017-10-23"}}, which happens 
before any actual TRA stuff.  I looked at the logs carefully and I have no idea 
why it timed out.  It seems the collection was created (shards were being 
made), then there was a long pause of ~165 seconds, and then the timeout 
failure.  I'll keep an eye on this... I'm keeping the logs to compare.

(2) After the comment "manipulate the config" we configure the collection 
created before this step.  We use the V2 API.  But when the request got to 
Solr, the node receiving it didn't know about this collection and so it failed.  
Note that the V1 API will not immediately fail; it will internally call 
AliasesManager.update() and then do a retry.  Whether or not an alias is 
actually being referenced, this has the effect of giving the V1 API a little 
bit more time to see the collection or alias.  I'll file a separate issue about 
this.
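
The refresh-then-retry behavior described for the V1 API can be sketched 
roughly as follows.  This is only an illustration, not Solr's code: the class 
LocalAliasView and its methods are hypothetical names, a plain Map stands in 
for ZooKeeper, and refresh() merely plays the role of AliasesManager.update().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for one node's local, possibly stale, view of aliases. */
final class LocalAliasView {
    // Assumption: this map stands in for the authoritative state kept in ZK.
    private final Map<String, String> authoritative;
    private Map<String, String> cached = new HashMap<>();
    int refreshCount = 0;

    LocalAliasView(Map<String, String> authoritative) {
        this.authoritative = authoritative;
    }

    /** Plays the role of AliasesManager.update(): re-read the authoritative state. */
    void refresh() {
        cached = new HashMap<>(authoritative);
        refreshCount++;
    }

    /** Looks only at the local cache, so it may miss recent changes. */
    Optional<String> lookup(String alias) {
        return Optional.ofNullable(cached.get(alias));
    }

    /** Resolve from the cache; only on a miss, refresh once and retry. */
    String resolveWithRetry(String alias) {
        Optional<String> hit = lookup(alias);
        if (hit.isPresent()) {
            return hit.get();
        }
        refresh(); // paid only on the failure path, never on a cache hit
        return lookup(alias).orElseThrow(
            () -> new IllegalArgumentException("Unknown alias: " + alias));
    }
}
```

In this sketch a node that is "behind" gets one extra chance to catch up before 
reporting an error, while successful lookups never trigger a refresh.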

> Aliases and eventual consistency (should use sync())
> ----------------------------------------------------
>
>                 Key: SOLR-12256
>                 URL: https://issues.apache.org/jira/browse/SOLR-12256
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: SOLR-12256.patch
>
>
> ZkStateReader.AliasesManager.update() reads alias info from ZK into the 
> ZkStateReader.  This method is called in ~5 places (+2 for tests).  In at 
> least some of these places, the caller assumes that the alias info is 
> subsequently up to date when in fact this might not be so since ZK is allowed 
> to return a stale value.  ZooKeeper.sync() can be called to force an up to 
> date value.  As with sync(), AliasesManager.update() ought not to be called 
> aggressively/commonly, only in certain circumstances (e.g. _after_ failing to 
> resolve stuff that would otherwise return an error).
> And related to this eventual consistency issue, SetAliasPropCmd will throw an 
> exception if the alias doesn't exist.  Fair enough, but sometimes (as seen in 
> some tests), the node receiving the command to update Alias properties is 
> simply "behind"; it does not yet know about an alias that other nodes know 
> about.  I believe this is the cause of some failures in AliasIntegrationTest; 
> perhaps others.
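
The stale-read problem in the description above can be illustrated with a toy 
model.  This is a deliberately simplified sketch and not how ZooKeeper is 
implemented: LaggingReplicaModel and its methods are hypothetical, there is a 
single value instead of a znode tree, and sync() here is synchronous whereas 
the real ZooKeeper.sync() is asynchronous.

```java
/** Toy model: a follower that applies the leader's writes only when it syncs. */
final class LaggingReplicaModel {
    private String leaderValue;
    private String followerValue;

    LaggingReplicaModel(String initial) {
        this.leaderValue = initial;
        this.followerValue = initial;
    }

    /** A write is committed on the leader; the follower has not applied it yet. */
    void write(String value) {
        leaderValue = value;
    }

    /** Plain read served by the follower: may return a stale value. */
    String read() {
        return followerValue;
    }

    /** Plays the role of sync(): the follower catches up with the leader. */
    void sync() {
        followerValue = leaderValue;
    }

    /** Sync first, then read: observes every write committed before the sync. */
    String syncThenRead() {
        sync();
        return read();
    }
}
```

A caller that assumes read() is current can miss an alias that was just 
written; syncThenRead() closes that gap at the cost of an extra round trip, 
which is why it is best reserved for the failure path.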



