[
https://issues.apache.org/jira/browse/SOLR-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226414#comment-16226414
]
Shalin Shekhar Mangar commented on SOLR-9440:
---------------------------------------------
I found the root cause.
The bug is in {{ZkStateReader.removeCollectionStateWatcher}}, which only removes
the collection from the watch list, i.e. the collectionWatches map. Since ZK
does not have a way to remove a watch, the watch fires again when the collection
changes. There is code in StateWatcher's refreshAndWatch method that is supposed
to evict the cached DocCollection object from watchedCollectionStates if the
collection is no longer in the collectionWatches map. However, that code never
gets executed because StateWatcher's process method returns early if the
collection is not in the collectionWatches map. So a cached DocCollection
reference that is neither lazy nor watched is left behind.
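
To make the sequence easier to follow, here is a rough, self-contained sketch of the control flow as I understand it. This is not the actual ZkStateReader source: the class StaleWatchSketch, the String-valued state and the registerWatch helper are made up for illustration, and only the map and method names are borrowed from the description above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified model of the behaviour described above.
// It is NOT the real Solr code; a String stands in for DocCollection.
class StaleWatchSketch {
    // Stand-in for collectionWatches: collections that still have a watcher.
    final Set<String> collectionWatches = ConcurrentHashMap.newKeySet();
    // Stand-in for watchedCollectionStates: cached state per collection.
    final Map<String, String> watchedCollectionStates = new ConcurrentHashMap<>();

    // Illustrative helper: start watching a collection and cache its state.
    void registerWatch(String collection, String state) {
        collectionWatches.add(collection);
        watchedCollectionStates.put(collection, state);
    }

    // Models the bug: only the watch entry is removed, the cached state stays.
    void removeCollectionStateWatcher(String collection) {
        collectionWatches.remove(collection);
    }

    // Models StateWatcher.process: invoked when the (unremovable) watch fires.
    void process(String collection, String newState) {
        if (!collectionWatches.contains(collection)) {
            return; // early return, so refreshAndWatch is never reached
        }
        refreshAndWatch(collection, newState);
    }

    // Models StateWatcher.refreshAndWatch: would evict the stale entry,
    // but this branch is unreachable through process().
    void refreshAndWatch(String collection, String newState) {
        if (!collectionWatches.contains(collection)) {
            watchedCollectionStates.remove(collection);
            return;
        }
        watchedCollectionStates.put(collection, newState);
    }

    public static void main(String[] args) {
        StaleWatchSketch reader = new StaleWatchSketch();
        reader.registerWatch("collection1", "v1");
        reader.removeCollectionStateWatcher("collection1");
        reader.process("collection1", "v2"); // collection changes, watch fires
        // The cache still holds the old value and nothing will refresh it.
        System.out.println("cached state = "
                + reader.watchedCollectionStates.get("collection1")); // prints "cached state = v1"
    }
}
```

In this model the entry for collection1 stays in watchedCollectionStates with its old value after the change, which matches the stale state seen by waitForRecoveriesToFinish.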
> ZkStateReader on a client can cache collection state and never refresh it
> -------------------------------------------------------------------------
>
> Key: SOLR-9440
> URL: https://issues.apache.org/jira/browse/SOLR-9440
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Fix For: 6.7, 7.0, master (8.0)
>
>
> I saw this while writing a test case for SOLR-9438. The CloudSolrClient's
> ZkStateReader was somehow caching the collection1 collection, which was in
> stateFormat=2, such that the returned cluster state contained the collection
> state. However, this collection was neither watched nor lazy, so any call to
> waitForRecoveriesToFinish would see stale state and loop until timeout.