Wow Ilan! This is very comprehensive! Regarding your list of potential bugs, I can confirm that at least two of those are accurate and I've seen them in production. We found work-arounds and got distracted before creating JIRAs, and this was a great reminder of those war stories.
I think this would be a great addition to dev-docs/ and a link from the Overseer Javadoc to this file would be reasonable. I hope to have time to dive into this deeper next week.

Mike

On Thu, Apr 23, 2020 at 2:36 PM David Smiley <dsmi...@apache.org> wrote:

> Thanks Ilan!
>
> I especially love the lead section "Overseer: from queues to state" with
> the diagram. Indeed, this is the documentation I (we?) wished already
> existed. I'd like to try to ensure this part of the document is more
> tightly associated with our project for others to see.
>
> This is "developer documentation". Cassandra: I see you created
> solr/dev-docs/ and I suppose this would best belong there? Mark Miller had
> tried Confluence. Pros/cons there. I want to ensure readers of the code
> in Overseer (and maybe other key class or two) notice this dev
> documentation. Should I add a http link to the GitHub location of the dev
> doc markdown, or do you recommend something else?
>
> For the rest of the doc, there are problem call-outs (picture of a
> triangle hazard with an exclamation point) -- readers can't miss them. I
> hope those of us that know SolrCloud internals best can look at those
> points closer and maybe file JIRA issues. That isn't me, honestly.
>
> I've been thinking that some of these problems might best be fixed by
> larger architectural changes rather than incrementally fixing a design with
> substantial weaknesses (and tech-debt complexities). Noble/Ishan's
> SOLR-13951 <https://issues.apache.org/jira/browse/SOLR-13951> issue
> "Avoid replica state updates to state.json" will help a lot but there will
> be much more to be done to address Solr's over-reliance on the Overseer to
> accomplish collection/cluster management. This is something I want to
> contemplate more. I hope Curator recipes may be a source of inspiration,
> which I plan to review this weekend.
>
> (note: I work with Ilan)
> ~ David
>
> On Tue, Apr 21, 2020 at 5:06 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> Hello Solr devs,
>>
>> This is my first post here. I work at Salesforce in France, we're
>> adopting SolrCloud and we need it to scale more than it currently
>> does.
>>
>> I've looked at Overseer and documented my understanding. I'm sharing
>> the result, it might help others and is a way to get feedback (I might
>> have misunderstood some things) and/or collaboration on continuing
>> documenting the implementation. Basically I started writing the doc I
>> wanted to find.
>>
>> In the process, I believe I've identified what may be a few bugs
>> (there's a section listing them at the beginning). I've found these by
>> reading code (not running code), so take with a grain of salt.
>> I plan to file Jiras for those bugs that do seem real and are
>> important enough, and then also start working on some to help
>> fix/improve.
>>
>> https://docs.google.com/document/d/1KTHq3noZBVUQ7QNuBGEhujZ_duwTVpAsvN3Nz5anQUY/
>>
>> This is WIP. Please do not hesitate to provide feedback/leave comments.
>>
>> Thanks,
>> Ilan