Re: First class support for node roles

Gus Heck Tue, 02 Nov 2021 14:20:45 -0700

I think there are things not yet accounted for. Time I spent yesterday is
biting me today. Pls give a couple days.


On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <gerlowsk...@gmail.com>
wrote:

> Hey Ishan,
>
> I appreciate you writing up the SIP!  Here are some notes/questions I
> had as I was reading through your writeup and this mail thread.
> ("----" separators between thoughts, hopefully that helps.)
>
> ----
>
> I'll add my vote to what Jan, Gus, Ilan, and Houston already
> suggested: roles should default to "all-on".  I see the downsides
> you're worried about with that approach (esp. around 'overseer'), but
> they may be mitigatable, at least in part.
>
> > [mail thread] User wants this node Solr101 to be a dedicated overseer,
> but for that to happen, he/she would need to restart all the data nodes
> with -Dnode.roles=data
>
> Sure, if roles can only be specified at startup.  But that may be a
> self-imposed constraint.
>
> An API to change a node's roles would remove the need for a restart
> and make it easy for users to get the semantics they want.  You
> decided you want a dedicated overseer N nodes into your cluster
> deployment?  Deploy node 'N' with the 'overseer', and toggle the
> overseer role off on the remainder.
>
> Now, I understand that you don't want roles to change at runtime, but
> I haven't seen you get much into "why", beyond saying "it is very
> risky to have nodes change roles while they are up and running."  Can
> you expand a bit on the risks you're worried about?  If you're
> explicit about them here maybe someone can think of a clever way to
> address them?
>
> > Hence, if those nodes are "assumed to have all roles", then just by
> virtue of upgrading to this new version, new capabilities will be turned on
> for the entire cluster, whether or not the user opted for such a
> capability. This is totally undesirable.
>
> Obviously "roles" refer to much bigger chunks of functionality than
> usual, so in a sense defaulting roles on is scarier.  But in a sense
> you're describing something that's an inherent part of software
> releases.  Releases expose new features that are typically on by
> default.  A new default-on role in 9.1 might hurt a user, but there's
> no fundamental difference between that and a change to backups or
> replication or whatever in the same release.
>
> I don't mean to belittle the difference in scope - I get your concern.
> But IMO this is something to address with good release notes and
> documentation.  Designing for admins who don't do even cursory
> research before an upgrade ties both our hands behind our back as a
> project.
>
> ----
>
> > [SIP] Internal representation in ZK ... Implementation details like
> these can be fleshed out in the PR
>
> IMO this is important enough to flesh out as part of the SIP, at least
> in broad strokes.  It affects backcompat, SolrJ client design, etc.
>
> ----
>
> > [SIP] GET /api/cluster/roles?node=node1
>
> Woohoo - way to include a v2 API definition!
>
> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET
> /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the
> "get the roles this node has" functionality.  Though I leave that for
> your consideration.
>
> ----
>
> Looking forward to your responses and seeing the SIP progress!  It's a
> really cool, promising idea IMO.
>
> Best,
>
> Jason
>
> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
> <ichattopadhy...@gmail.com> wrote:
> >
> > Are there any unaddressed outstanding concerns that we should hold up
> the SIP for?
> >
> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <
> ichattopadhy...@gmail.com> wrote:
> >>>
> >>> >> Agree. However, I disagree with ideas where "query analysis" has a
> role of its own. Where would that lead us to? Separate roles for
> >>>
> >>> >> nodes that do "faceting" or "spell correction" etc.? But anyway,
> that is for discussion when we add future roles. This is beyond this SIP.
> >>
> >>
> >> > I am not asking you to implement every possible role of course :). As
> a note I know a company that is running an entire separate
> >> > cluster to offload and better serve highlighting on a subset of large
> docs, so YES I think there are people who may want such fine grained
> control.
> >>
> >> Cool, I think we can discuss adding any additional roles (for
> highlighting?) on a case by case basis at a later point.
> >>
> >>
> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>>
> >>> > Boiling it down the idea I'm proposing is that roles required for
> back compatibility get explicitly added on startup, if not by the user then
> by the code. This is more flexible than assuming that no role means every
> role, because then every new feature that has a role will end up on legacy
> clusters which are also not back compatible.
> >>>
> >>> +1, I totally agree. I even said so, when I said: "This is why I was
> advocating that 1) we assume the "data" as a default, 2) not assume
> overseer to be implicitly defined (because of the way overseer role is
> written today), 3) not assume any future roles to be true by default."
> >>>
> >>> So, basically, I'm proposing that the "roles required for back
> compatibility" (that should be explicitly added on startup) be just the
> ["data"] role, and not the "overseer" role (due to the way overseer role is
> currently defined, i.e. it is "preferred overseer").
> >>>
> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <gus.h...@gmail.com> wrote:
> >>>>
> >>>> Very sorry, I don't mean to sound offended. Frustrated yes, offended no
> :)... the most difficult thing about communication is the illusion it has
> occurred :)
> >>>>
> >>>> If you read back just a few emails you'll see where I talk about
> roles being applied on startup. Boiling it down the idea I'm proposing is
> that roles required for back compatibility get explicitly added on startup,
> if not by the user then by the code. This is more flexible than assuming
> that no role means every role, because then every new feature that has a
> role will end up on legacy clusters which are also not back compatible.
> >>>>
> >>>> There are points where I said all roles rather than back
> compatibility roles because I was thinking about back compatibility
> specifically, but you can't know that if I don't say that can you :).
> >>>>
> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>>>>
> >>>>> > If you read more closely, my way can provide full back
> compatibility. To say or imply it doesn't isn't helping. Perhaps you need
> to re-read?
> >>>>>
> >>>>> I understand e-mails are frustrating, and I'm trying my best. Please
> don't be offended, and kindly point me to the exact part you want me to
> re-read.
> >>>>>
> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <gus.h...@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> >    Positive - They denote the existence of a capability
> >>>>>>>
> >>>>>>> Agree, the SIP already reflects this.
> >>>>>>>
> >>>>>>> >   Absolute - Absence/Presence binary identification of a
> capability; no implications, no assumptions
> >>>>>>>
> >>>>>>> Disagree, we need backcompat handling on nodes running without any
> roles. There has to be an implicit assumption as to what roles those
> nodes are assumed to have. My proposal is that only the "data" role be assumed,
> but not the "overseer" role. For any future roles ("coordinator",
> "zookeeper" etc.), this decision as to what absence of any role implies
> should be left to the implementation of that future role. Documentation
> should clearly reflect these implicit assumptions.
> >>>>>>>
> >>>>>>
> >>>>>> If you read more closely, my way can provide full back
> compatibility. To say or imply it doesn't isn't helping. Perhaps you need
> to re-read?
> >>>>>>
> >>>>>>>
> >>>>>>> >    Focused - Do one thing per role
> >>>>>>>
> >>>>>>> Agree. However, I disagree with ideas where "query analysis" has a
> role of its own. Where would that lead us to? Separate roles for nodes that
> do "faceting" or "spell correction" etc.? But anyway, that is for
> discussion when we add future roles. This is beyond this SIP.
> >>>>>>>
> >>>>>>
> >>>>>> I am not asking you to implement every possible role of course :).
> As a note I know a company that is running an entire separate cluster to
> offload and better serve highlighting on a subset of large docs, so YES I
> think there are people who may want such fine grained control.
> >>>>>>
> >>>>>>>
> >>>>>>> >    Accessible - It should be dead simple to determine the
> members of a role, avoid parsing blobs of json, avoid calculating
> implications, avoid consulting other resources after listing nodes with the
> role
> >>>>>>>
> >>>>>>> Agree. I'm open to any implementation details that make it easy.
> There should be a reasonable API to return these node roles, with ability
> to filter by role or filter by node.
> >>>>>>>
> >>>>>>> >    Independent - One role should not require other roles to be
> present
> >>>>>>>
> >>>>>>> Do we need to have this hard and fast requirement upfront? There
> might be situations where this is desirable. I feel we can discuss on a
> case by case basis whenever a future role is added.
> >>>>>>>
> >>>>>>> >    Persistent - roles should not be lost across reboot
> >>>>>>>
> >>>>>>> Agree.
> >>>>>>>
> >>>>>>> >    Immutable - roles should not change while the node is running
> >>>>>>>
> >>>>>>> Agree
> >>>>>>>
> >>>>>>> >    Lively - A node with a capability may not be presently
> providing that capability.
> >>>>>>>
> >>>>>>> I don't understand, can you please elaborate?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Specifically imagine the case where there are 100 nodes:
> >>>>>> 1-100 ==> DATA
> >>>>>> 101-103 ==> OVERSEER
> >>>>>> 104-106 ==> ZOOKEEPER
> >>>>>>
> >>>>>> But you won't have 3 overseers... you'll want only one of those to
> be providing overseer functionality and the other two to be capable, but
> not providing (so that if the current overseer goes down a new one can be
> assigned).
> >>>>>>
> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108
> with that role, but you probably want to ensure that zookeepers require
> some sort of command for them to actually join the zookeeper cluster (i.e.
> /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to
> be up. But oh look I typoed 108... we want that to fail... how? because 18
> does not have the capability to become a zookeeper.
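
A minimal sketch of the check described above, written in Python purely for illustration (it is not Solr code). The registry `NODE_ROLES`, the function `validate_zk_add`, and the node names are hypothetical; the only idea taken from the thread is that a join command should fail for any node that was not started with the zookeeper role.

```python
# Hypothetical registry: node name -> roles the node was started with.
NODE_ROLES = {
    "node18": {"data"},
    "node107": {"zookeeper"},
    "node108": {"zookeeper"},
}


def validate_zk_add(nodes, node_roles):
    """Return the nodes to add, or raise if any lacks the 'zookeeper' role.

    Having the role only means the node is *capable*; it starts
    providing the function once this check passes and it is added.
    """
    incapable = [n for n in nodes if "zookeeper" not in node_roles.get(n, set())]
    if incapable:
        raise ValueError(f"nodes lack the zookeeper role: {incapable}")
    return list(nodes)
```

With this shape, a request naming node107 and node18 is rejected because node18 only has the data role, while node107 and node108 are accepted.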
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> > Ilan: A node not having node.roles defined should be assumed to
> have all roles. Not only data. I don't see a reason to special case this
> one or any role.
> >>>>>>>> > Gus: There should be no "assumptions" Nothing to figure out. A
> node has a role or not. For back compatibility reasons, all roles would be
> assumed on startup if none specified.
> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly
> those roles.
> >>>>>>>>
> >>>>>>>> Problem with this approach is mainly to do with backcompat.
> >>>>>>>>
> >>>>>>>> 1. Overseer backcompat:
> >>>>>>>> If we don't make any modifications to how overseer works and
> adopt this approach (as quoted), then imagine this situation:
> >>>>>>>>
> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
> >>>>>>>>
> >>>>>>>> User wants this node Solr101 to be a dedicated overseer, but for
> that to happen, he/she would need to restart all the data nodes with
> -Dnode.roles=data. This will cause unnecessary disruption to running
> clusters where a dedicated overseer is needed. Keep in mind, if a user
> needs a dedicated overseer, he's likely in an emergency situation and
> restarting the whole cluster might not be viable for him/her.
> >>>>>>>>
> >>>>>>>> 2. Future roles might not be compatible with this "assumed to
> have all roles" idea:
> >>>>>>>> Take the proposed "zookeeper" role for example. Today, regular
> nodes are not supposed to have embedded ZK running on them. By introducing
> this artificial limitation ("assumed to have all roles"), we constrain
> adoption of all future roles to necessarily require a full cluster restart.
> >>>>>>>>
> >>>>>>>> Keep in mind newer Solr versions can introduce new capabilities
> and roles. Imagine we have a role that is defined in a new Solr version
> (and there's functionality to go with that role), and user upgrades to that
> version. However, his/her nodes all were started with no node.roles param.
> Hence, if those nodes are "assumed to have all roles", then just by virtue
> of upgrading to this new version, new capabilities will be turned on for
> the entire cluster, whether or not the user opted for such a capability.
> This is totally undesirable.
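
To make the two defaulting policies under discussion concrete, here is a small sketch in Python (illustrative only, not Solr code). `KNOWN_ROLES`, `resolve_roles`, and the policy names "all" and "data-only" are assumptions made for this example; the thread only specifies the `node.roles` parameter and the two competing interpretations of its absence.

```python
# Hypothetical set of roles known to the running version.
KNOWN_ROLES = {"data", "overseer", "zookeeper"}


def resolve_roles(node_roles_param, default_policy):
    """Resolve a node's effective roles from its node.roles value.

    default_policy applies only when no roles were specified:
      "all"       -> no roles means every known role
      "data-only" -> no roles means just the "data" role
    """
    if node_roles_param:
        # An explicit list means exactly those roles under both policies.
        return {r.strip() for r in node_roles_param.split(",") if r.strip()}
    if default_policy == "all":
        return set(KNOWN_ROLES)
    if default_policy == "data-only":
        return {"data"}
    raise ValueError(f"unknown default policy: {default_policy}")
```

Under "all", adding a new entry to `KNOWN_ROLES` in a later version changes what every unconfigured node resolves to, which is the upgrade concern raised above; under "data-only" it does not.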
> >>>>>>>>
> >>>>>>>> > Gus: I actually don't want a coordinator to do more work, I
> would prefer small focused roles with names that accurately describe their
> function. In that light, COORDINATOR might be too nebulous. How about
> AGGREGATOR role? (what I was thinking of would better be called a
> QUERY_ANALYSIS role)
> >>>>>>>>
> >>>>>>>> If you want to do specific things like query analysis or query
> aggregation or bulk indexing etc, all of those can be done on COORDINATOR
> nodes (as is the case in Elasticsearch). Having tens of "small focused
> roles" defined as first class concepts would be confusing to the user. As a
> remedy to your situation where you want the coordinator role to also do
> query-analysis for shards, one possible solution is to send such a query to
> a coordinator node with a parameter like "coordinator.query_analysis=true",
> and then the coordinator, instead of blindly hitting remote shards, also
> does some extra work on behalf of the shards.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> > If we make collections role-aware for example (replicas of
> that collection can only be
> >>>>>>>>> > placed on nodes with a specific role, in addition to the other
> role based constraints),
> >>>>>>>>> > the set of roles should be user extensible and not fixed.
> >>>>>>>>> > If collections are not role aware, the constraints introduced
> by roles apply to all collections
> >>>>>>>>> > equally which might be insufficient if a user needs for
> example a heavily used collection to
> >>>>>>>>> > only be placed on more powerful nodes.
> >>>>>>>>>
> >>>>>>>>> I feel node roles and role-aware collections are orthogonal
> topics. What you describe above can be achieved by the autoscaling+replica
> placement framework where the placement plugins take the node roles as one
> of the inputs.
> >>>>>>>>>
> >>>>>>>>> > It does impact the design from early on: the set of roles need
> to be expandable by a user
> >>>>>>>>> > by creating a collection with new roles for example (consumed
> by placement plugins) and be
> >>>>>>>>> > able to start nodes with new (arbitrary) roles. Should such
> roles follow some naming syntax to
> >>>>>>>>> > differentiate them from built in roles? To be able to fail on
> typos on roles - that otherwise can be
> >>>>>>>>> > crippling and hard to debug. This implies in any case that the
> current design can't assume all
> >>>>>>>>> > roles are known at compile time or define them in a Java enum.
> >>>>>>>>>
> >>>>>>>>> I think this should be achieved by something different from
> roles. Something like node labels (user defined) which can then be used in
> a replica placement plugin to assign replicas. I see roles as more closely
> associated with kinds of functionality a node is designated for. Therefore,
> I feel that replica placements and user defined node labels are out of scope
> for this SIP. They can be added later in a separate SIP, without being at
> odds with this proposal.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <
> jan....@cominvent.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan Ginzburg <
> ilans...@gmail.com>:
> >>>>>>>>>> > A node not having node.roles defined should be assumed to
> have all roles. Not only data. I don't see a reason to special case this
> one or any role.
> >>>>>>>>>>
> >>>>>>>>>> +1, make it simple and transparent. No role == all roles.
> Explicit list of roles = exactly those roles.
> >>>>>>>>>>
> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is something
> handled as a feature of the role rather than via role designation?
> >>>>>>>>>>
> >>>>>>>>>> Yea, we always need an overseer, so that feature can decide to
> use its list of nodes as a preference if it so chooses.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Aside: I think it makes it easier if we always prefix Solr
> env.vars and sys.props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo.
> That way we can get away from having to have explicit code in bin/solr,
> bin/solr.cmd and SolrCLI to manage every single property. Instead we can
> parse all ENVs and Props with the solr prefix in our bootstrap code. And we
> can by convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9 and
> it would be the same thing...
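
A rough sketch of the prefix convention suggested above, in Python for illustration only (not Solr's bootstrap code). The function `env_to_props` and the exact mapping rule (lowercase the name, turn underscores into dots) are assumptions; the thread only states that `SOLR_NODE_ROLES=foo` and `-Dsolr.node.roles=foo` should be equivalent.

```python
def env_to_props(environ, prefix="SOLR_"):
    """Map prefixed environment variables to dotted property names.

    Assumed rule: SOLR_NODE_ROLES -> solr.node.roles. Variables
    without the prefix are ignored.
    """
    props = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            props[name.lower().replace("_", ".")] = value
    return props
```

Called with `os.environ`, this would pick up every prefixed variable without per-property handling in the launch scripts.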
> >>>>>>>>>>
> >>>>>>>>>> Jan
> >>>>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
> >>>>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> http://www.needhamsoftware.com (work)
> >>>>>> http://www.the111shift.com (play)
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://www.needhamsoftware.com (work)
> >>>> http://www.the111shift.com (play)
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)