Ah I see. I assumed it would be written back.  I'm generally nervous about
the idea of having in-memory state that is not reflected in zookeeper for
an indeterminate period of time...
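To make the "write it back immediately" option concrete, here is a minimal sketch assuming a simplified, in-memory model of ZooKeeper. FakeZk, Node and readAndUpgrade are hypothetical names, not Solr or ZooKeeper APIs, and state.json is treated as a flat map of the collection's properties. The conditional write is modeled on ZooKeeper's version-checked setData(path, data, version): the upgrade is persisted only if state.json is unchanged since it was read.

```java
import java.util.HashMap;
import java.util.Map;

public class EagerConfigSetWriteBack {

    /** Hypothetical znode: parsed JSON data plus a data version, as ZooKeeper keeps per znode. */
    static final class Node {
        final Map<String, Object> data;
        final int version;

        Node(Map<String, Object> data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for ZooKeeper (not a real client API). */
    static final class FakeZk {
        private final Map<String, Node> nodes = new HashMap<>();

        Node get(String path) {
            return nodes.get(path);
        }

        void create(String path, Map<String, Object> data) {
            nodes.put(path, new Node(new HashMap<>(data), 0));
        }

        /** Conditional write modeled on ZooKeeper's setData(path, data, expectedVersion). */
        boolean setData(String path, Map<String, Object> data, int expectedVersion) {
            Node current = nodes.get(path);
            if (current == null || current.version != expectedVersion) {
                return false; // the node changed (or vanished) since it was read
            }
            nodes.put(path, new Node(new HashMap<>(data), expectedVersion + 1));
            return true;
        }
    }

    /**
     * Returns the collection's configSet. If state.json lacks "configSet", copies the
     * legacy "configName" from /collections/<COLNAME> and writes state.json back
     * immediately, but only if nobody modified it since it was read.
     */
    static String readAndUpgrade(FakeZk zk, String collection) {
        String collectionPath = "/collections/" + collection;
        String statePath = collectionPath + "/state.json";
        Node state = zk.get(statePath);
        Object configSet = state.data.get("configSet");
        if (configSet != null) {
            return (String) configSet; // already upgraded, nothing to write
        }
        String legacy = (String) zk.get(collectionPath).data.get("configName");
        Map<String, Object> upgraded = new HashMap<>(state.data);
        upgraded.put("configSet", legacy);
        // If this conditional write loses a race, a newer state.json already exists;
        // the next read re-checks it and upgrades it then if still needed.
        zk.setData(statePath, upgraded, state.version);
        return legacy;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("/collections/books", Map.of("configName", "_default"));
        zk.create("/collections/books/state.json", Map.of("replicationFactor", "2"));
        System.out.println(readAndUpgrade(zk, "books"));
        System.out.println(zk.get("/collections/books/state.json").data.get("configSet"));
    }
}
```

The version check is what keeps a background upgrade from clobbering a concurrent state change; a lost race is simply retried implicitly on the next read.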

On Mon, Apr 26, 2021 at 11:31 AM David Smiley <[email protected]> wrote:

> Gus: state.json is read on startup.  In my proposal, the in-memory JSON
> for it would be augmented with the configSet, but I proposed no new
> write-back to ZK.  So the collection would only be upgraded in ZK once
> there is some other reason to change its state (e.g. a replica state
> change).  Conceptually, writing back immediately makes sense and would let
> one reason that collections are upgraded right away, automatically, but
> I'm not yet sure how complicated guaranteeing this would be.  Nazerke and I
> should explore this more.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Apr 20, 2021 at 6:13 PM Gus Heck <[email protected]> wrote:
>
>> hmm, does state.json get read (and thus upgraded) when a node hosting the
>> collection (re)starts? Would this in effect be an upgrade on startup?
>>
>> On Tue, Apr 20, 2021 at 5:28 PM David Smiley <[email protected]> wrote:
>>
>>> In the following issue,
>>> https://issues.apache.org/jira/browse/SOLR-14341
>>> Nazerke (my colleague) is working on moving a collection's "configName"
>>> (configSet) into state.json where it should have been all along.  Better
>>> late than never.  This is targeting 9.0.  This email is largely about
>>> migration / backwards-compatibility.
>>>
>>> The current location of a collection's configSet name is read by
>>> ZkStateReader.readConfigSetName(collection) which reads JSON stored at the
>>> ZK path "/collections/<COLNAME>" which is the containing node for
>>> SolrCloud's information about the collection (i.e. it contains state.json
>>> etc.).  Example data: {"configName":"_default"}.  In case you didn't know,
>>> ZK intermediate nodes can contain data just like leaf nodes, unlike a file
>>> system.
>>>
>>> Instead, we want it retrievable by a new method,
>>> DocCollection.getConfigSet(), backed by state.json, which could have a
>>> new name-value pair at the top: "configSet".
>>>
>>> So how do we do this transition?  How about this: whenever SolrCloud
>>> reads state.json and detects the absence of configSet, it inserts it on
>>> the fly by reading the old location.  This incurs a performance overhead,
>>> but it's transient during an upgrade to Solr 9.  To ensure that all
>>> collections are upgraded (and thus stop incurring the penalty), we can
>>> provide a trivial bash script that lists all existing collections and loops
>>> over them, calling MODIFYCOLLECTION to set each configSet to whatever it is
>>> currently.  Creating/modifying a collection will ensure that the configSet
>>> name is stored in both the old and the new place.
>>>
>>> Then we stop writing to the old place in Solr 10.  Or maybe Solr 9
>>> doesn't write to the old location at all, provided that during a live
>>> upgrade you don't create or modify collections or their configSet
>>> associations, because that could confuse Solr 8 nodes?  If we go with
>>> this, a MODIFYCOLLECTION command could remove the old data if it's present.
>>>
>>> AFAICT, SolrJ CloudSolrClient doesn't really care about this matter,
>>> thankfully.
>>>
>>> WDYT folks?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
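The read-time fallback and dual-write in the quoted proposal can be sketched as follows, under stated assumptions: ZooKeeper is modeled as an in-memory map from znode path to parsed JSON, state.json is treated as a flat map of the collection's properties, and ConfigSetFallback, readState, setConfigSet, writeState, putRaw and rawData are hypothetical names, not Solr's. readLegacyConfigName only plays the role of ZkStateReader.readConfigSetName(collection).

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigSetFallback {

    /** Hypothetical stand-in for ZooKeeper: znode path -> parsed JSON object. */
    private final Map<String, Map<String, Object>> znodes = new HashMap<>();

    static String collectionPath(String collection) {
        return "/collections/" + collection;
    }

    static String statePath(String collection) {
        return collectionPath(collection) + "/state.json";
    }

    void putRaw(String path, Map<String, Object> data) {
        znodes.put(path, new HashMap<>(data)); // copy so later updates can mutate it
    }

    Map<String, Object> rawData(String path) {
        return znodes.get(path);
    }

    /** Old location, playing the role of ZkStateReader.readConfigSetName(collection). */
    String readLegacyConfigName(String collection) {
        Map<String, Object> data = znodes.get(collectionPath(collection));
        return data == null ? null : (String) data.get("configName");
    }

    /** Reads state.json into memory, filling in "configSet" on the fly if absent. No write-back. */
    Map<String, Object> readState(String collection) {
        Map<String, Object> stored = znodes.get(statePath(collection));
        if (stored == null) {
            throw new IllegalStateException("no state.json for " + collection);
        }
        Map<String, Object> state = new HashMap<>(stored);
        if (!state.containsKey("configSet")) {
            state.put("configSet", readLegacyConfigName(collection)); // extra lookup on every read until upgraded
        }
        return state;
    }

    /** Create/modify path: store the name in both the old and the new place. */
    void setConfigSet(String collection, String configSet) {
        znodes.computeIfAbsent(collectionPath(collection), k -> new HashMap<>()).put("configName", configSet);
        znodes.computeIfAbsent(statePath(collection), k -> new HashMap<>()).put("configSet", configSet);
    }

    /** Any later state change persists the (already augmented) in-memory state. */
    void writeState(String collection, Map<String, Object> state) {
        znodes.put(statePath(collection), new HashMap<>(state));
    }

    public static void main(String[] args) {
        ConfigSetFallback zk = new ConfigSetFallback();
        // A collection created before the change: configName only at the legacy location.
        zk.putRaw("/collections/books", Map.of("configName", "_default"));
        zk.putRaw("/collections/books/state.json", Map.of("replicationFactor", "2"));

        Map<String, Object> state = zk.readState("books");
        System.out.println(state.get("configSet"));                     // _default, filled in on read
        System.out.println(zk.rawData("/collections/books/state.json")); // {replicationFactor=2}, ZK untouched

        // Some later change to the collection's state: the augmented map is what gets written.
        state.put("replicationFactor", "3");
        zk.writeState("books", state);
        System.out.println(zk.rawData("/collections/books/state.json").get("configSet")); // _default
    }
}
```

As in David's follow-up, the filled-in value lives only in memory until some other change causes state.json to be written.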

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
