Re: Focusing on single implementations of core logic

Justin Sweeney Wed, 21 Sep 2022 09:55:06 -0700

Hey all,

I'm a long-time Solr user/developer, but only recently joined the dev
mailing list for Solr, so it is a pleasure to interact with you all.

We, at FullStory, working with Ishan and Noble, sponsored the Per Replica
State implementation and are using it currently. We are running large
clusters in production with a high number of collections/cores and have
historically faced challenges with handling updates to state.json with
events like node restarts at that scale. The size of the state.json file
and the need to coordinate all operations through the overseer were not
working well with many collections across many nodes, which led us to
develop the Per Replica State model. There are definite improvements that
can be made in the PRS code; we've actually made quite a few improvements
on our Solr fork this year that we still need to upstream.

We've been loosely following the Distributed State Update concept, but
haven't spent much time to understand pros/cons of that vs PRS. We'd
definitely be interested in working with the community to share more about
PRS and understand other efforts with the goal of pushing towards a more
streamlined implementation. I'm not sure how the community has handled this
in the past. If there is a small group we want to put together for some
synchronous discussions, we could present on PRS and have a representative
share about the Distributed State Update concept. If we want something more
async, I can work with the team at FullStory to write up more detail on PRS
to share with the community and start to build some buy-in.

I agree that distributed state is complicated as is, so working towards
cleaner code here is important. I'm interested to hear how we can help
move forward.

Justin

On Wed, Sep 21, 2022 at 11:44 AM Houston Putman <hous...@apache.org> wrote:

> Hey everyone,
>
> We've seen some interesting developments over the last 2 years in the way
> that Solr state and distributed logic is handled. Notably we've seen the
> introduction of PerReplicaStates (PRS) and the Distributed State Updates
> (no overseer).
>
> I think for the health of our code and future maintainability, we should
> really look to decide on what implementations we want to use for State
> management and Distributed operations. Basically do we want to adopt or
> abandon PRS/Distributed State Updates. Note that these are separate
> concepts, so the decision on each will be separate.
>
> I bring this up because I see PRS a lot through the code and it feels like
> the code is too separate from the original way of managing state. There is
> a lot of "if (prsEnabled)" logic throughout the core, and it's very hard to
> understand how PRS actually works with this logic spread all over the
> place. If we want to move forward with PRS, then we hopefully would be able
> to consolidate the logic.
>
> I don't see the Distributed State Update logic nearly as much, but I
> imagine our code can only get cleaner with one implementation versus two.
>
> This is just my opinion, let me know what y'all think about making
> decisions or going forward with the status quo.
>
> - Houston
>
