Ah I see. I assumed it would be written back. I'm generally nervous about the idea of having in-memory state that is not reflected in zookeeper for an indeterminate period of time...
On Mon, Apr 26, 2021 at 11:31 AM David Smiley <[email protected]> wrote:

> Gus: state.json is read on startup. In my proposal, the in-memory JSON
> for it would be augmented with the configSet but I proposed no new
> write-back to ZK. So if there is some reason to change the state (e.g.
> replica state change) then the collection would be upgraded. Conceptually,
> writing back immediately makes sense and would allow one to reason that the
> collections are updated right away automatically, but I'm not yet sure how
> complicated guaranteeing this would be. Nazerke and I should explore this
> more to see.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Apr 20, 2021 at 6:13 PM Gus Heck <[email protected]> wrote:
>
>> hmm, does state.json get read and (thus upgraded) when a node hosting the
>> collection (re)starts? Would this in effect be an upgrade on startup?
>>
>> On Tue, Apr 20, 2021 at 5:28 PM David Smiley <[email protected]> wrote:
>>
>>> In the following issue,
>>> https://issues.apache.org/jira/browse/SOLR-14341
>>> Nazerke (my colleague) is working on moving a collection's "configName"
>>> (configSet) into state.json where it should have been all along. Better
>>> late than never. This is targeting 9.0. This email is largely about
>>> migration / backwards-compatibility.
>>>
>>> The current location of a collection's configSet name is read by
>>> ZkStateReader.readConfigSetName(collection) which reads JSON stored at the
>>> ZK path "/collections/<COLNAME>" which is the containing node for
>>> SolrCloud's information about the collection (i.e. it contains state.json
>>> etc.). Example data: {"configName":"_default"}. In case you didn't know,
>>> ZK intermediate nodes can contain data just like leaf nodes, unlike a file
>>> system.
>>>
>>> Instead, we want it retrievable by a new method
>>> DocCollection.getConfigSet reflecting the storage of state.json which could
>>> have a new name-value pair at the top: "configSet".
>>>
>>> So how do we do this transition? How about this: Whenever SolrCloud
>>> reads state.json, it detects the absence of configSet and it inserts it on
>>> the fly, reading the old location. This will incur a performance overhead
>>> but it's transient during an upgrade to Solr 9. To ensure that all
>>> collections are upgraded (and thus stop incurring a penalty), we can
>>> provide a trivial bash script that reads all existing collections and loops
>>> over them to call MODIFYCOLLECTION to set the configSet to whatever it is
>>> currently. Creating/modifying a collection will ensure that the configSet
>>> name is stored in the old place and new place.
>>>
>>> Then we remove writing to the old place in Solr 10. Or maybe Solr 9
>>> doesn't write to the old location, provided that during a live upgrade you
>>> don't create or modify collections or associations with configSets because
>>> that could confuse Solr 8 nodes? If we go with this, a
>>> MODIFYCOLLECTION command could remove the old data if it's present.
>>>
>>> AFAICT, SolrJ CloudSolrClient doesn't really care about this matter,
>>> thankfully.
>>>
>>> WDYT folks?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
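
For anyone following along, here is a rough sketch of the read-time upgrade discussed in this thread. It is not Solr code. ZooKeeper is modeled as a plain dict of path to JSON string, the state.json layout is simplified, and the function names (read_collection_state, write_collection_state) and the also_write_legacy flag are hypothetical. It only illustrates that a read can fill in "configSet" in memory from the old location, while ZK stays unchanged until something writes the state back.

```python
import json

# Assumption: ZooKeeper is modeled as a dict {path: JSON string}.
# All function names here are hypothetical, not Solr APIs.

def state_path(collection):
    return f"/collections/{collection}/state.json"

def legacy_path(collection):
    # Old location: data stored on the collection's containing node.
    return f"/collections/{collection}"

def read_collection_state(zk, collection):
    """Return (state, upgraded_in_memory). Never writes to zk."""
    state = json.loads(zk[state_path(collection)])[collection]
    if "configSet" in state:
        return state, False
    # configSet is absent: fall back to the old location (the extra read
    # is the transient overhead mentioned in the proposal).
    legacy = json.loads(zk[legacy_path(collection)])
    return {"configSet": legacy["configName"], **state}, True

def write_collection_state(zk, collection, state, also_write_legacy=True):
    """Persist state; optionally keep the old location in sync."""
    zk[state_path(collection)] = json.dumps({collection: state})
    if also_write_legacy:
        zk[legacy_path(collection)] = json.dumps(
            {"configName": state["configSet"]})

zk = {
    "/collections/films": '{"configName":"_default"}',
    "/collections/films/state.json": '{"films": {"shards": {}}}',
}

state, upgraded = read_collection_state(zk, "films")
# At this point the in-memory state has configSet but ZK does not.
stored_before = json.loads(zk[state_path("films")])["films"]

# Any later state change (or an explicit modify) persists the upgrade.
write_collection_state(zk, "films", state)
state_again, upgraded_again = read_collection_state(zk, "films")
```

The gap between the first read and the first write is the window where in-memory state and ZK disagree. An immediate write-back after the upgrading read would close that window, at the cost of making reads trigger writes.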
