Re: First class support for node roles

Ishan Chattopadhyaya Tue, 02 Nov 2021 08:21:49 -0700

Are there any unaddressed outstanding concerns that we should hold up the
SIP for?


On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <
ichattopadhy...@gmail.com> wrote:

> >> Agree. However, I disagree with ideas where "query analysis" has a role
> >> of its own. Where would that lead us to? Separate roles for nodes that
> >> do "faceting" or "spell correction" etc.? But anyway, that is for
> >> discussion when we add future roles. This is beyond this SIP.
>
> > I am not asking you to implement every possible role of course :). As a
> > note I know a company that is running an entire separate cluster to
> > offload and better serve highlighting on a subset of large docs, so YES I
> > think there are people who may want such fine grained control.
>
> Cool, I think we can discuss adding any additional roles (for
> highlighting?) on a case by case basis at a later point.
>
>
> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> > Boiling it down the idea I'm proposing is that roles required for back
>> compatibility get explicitly added on startup, if not by the user then by
>> the code. This is more flexible than assuming that no role means every
>> role, because then every new feature that has a role will end up on legacy
>> clusters which are also not back compatible.
>>
>> +1, I totally agree. I even said so, when I said: "This is why I was
>> advocating that 1) we assume the "data" as a default, 2) not assume
>> overseer to be implicitly defined (because of the way overseer role is
>> written today), 3) not assume any future roles to be true by default."
>>
>> So, basically, I'm proposing that the "roles required for back
>> compatibility" (that should be explicitly added on startup) be just the
>> ["data"] role, and not the "overseer" role (due to the way overseer role is
>> currently defined, i.e. it is "preferred overseer").
>>
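Ishan's proposal above can be sketched in Java. When a node has no roles, only "data" is added on startup, never "overseer". The resolved roles are then fixed for the node's lifetime. Only the `node.roles` property name comes from this thread; the class and method names are hypothetical, and this is not released Solr behavior.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: resolve a node's roles exactly once, at startup.
// "node.roles" is the system property used in this thread (-Dnode.roles=...);
// the "data"-only default is the proposal under discussion, not shipped behavior.
final class NodeRolesStartup {
    // Roles added on the user's behalf for back compatibility: "data" only, never "overseer".
    static final Set<String> BACKCOMPAT_DEFAULT = Set.of("data");

    // Returns an unmodifiable set, so roles cannot change while the node is running.
    static Set<String> resolveRoles(String nodeRolesProperty) {
        if (nodeRolesProperty == null) {
            return BACKCOMPAT_DEFAULT;
        }
        Set<String> explicit = Arrays.stream(nodeRolesProperty.split(","))
                .map(String::trim)
                .filter(role -> !role.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
        // An explicit list means exactly those roles; a blank value falls back to the default.
        return explicit.isEmpty() ? BACKCOMPAT_DEFAULT : explicit;
    }

    public static void main(String[] args) {
        // java -Dnode.roles=overseer ...  -> [overseer]
        // no -Dnode.roles at all          -> [data]
        System.out.println(resolveRoles(System.getProperty("node.roles")));
    }
}
```

Because the default is materialized as an explicit set rather than inferred at lookup time, a role introduced in a later version is never switched on implicitly.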
>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <gus.h...@gmail.com> wrote:
>>
>>> Very sorry, I don't mean to sound offended. Frustrated yes, offended no
>>> :)... The most difficult thing about communication is the illusion that it
>>> has occurred :)
>>>
>>> If you read back just a few emails you'll see where I talk about roles
>>> being applied on startup. Boiling it down the idea I'm proposing is that
>>> roles required for back compatibility get explicitly added on startup, if
>>> not by the user then by the code. This is more flexible than assuming that
>>> no role means every role, because then every new feature that has a role
>>> will end up on legacy clusters which are also not back compatible.
>>>
>>> There are points where I said all roles rather than back compatibility
>>> roles because I was thinking about back compatibility specifically, but you
>>> can't know that if I don't say so, can you :).
>>>
>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> > If you read more closely, my way can provide full back compatibility.
>>>> To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>
>>>> I understand e-mails are frustrating, and I'm trying my best. Please
>>>> don't be offended, and kindly point me to the exact part you want me to
>>>> re-read.
>>>>
>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <gus.h...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <
>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>
>>>>>> >    Positive - They denote the existence of a capability
>>>>>>
>>>>>> Agree, the SIP already reflects this.
>>>>>>
>>>>>> >   Absolute - Absence/Presence binary identification of a
>>>>>> capability; no implications, no assumptions
>>>>>>
>>>>>> Disagree, we need backcompat handling on nodes running without any
>>>>>> roles. There has to be an implicit assumption about which roles those
>>>>>> nodes have. My proposal is that only the "data" role be assumed, but not
>>>>>> the "overseer" role. For any future roles ("coordinator", "zookeeper"
>>>>>> etc.), the decision as to what the absence of any role implies should be
>>>>>> left to the implementation of that future role. Documentation should
>>>>>> state these implicit assumptions clearly.
>>>>>>
>>>>>>
>>>>> If you read more closely, my way can provide full back compatibility.
>>>>> To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>>
>>>>>
>>>>>> >    Focused - Do one thing per role
>>>>>>
>>>>>> Agree. However, I disagree with ideas where "query analysis" has a
>>>>>> role of its own. Where would that lead us to? Separate roles for nodes
>>>>>> that do "faceting" or "spell correction" etc.? But anyway, that is for
>>>>>> discussion when we add future roles. This is beyond this SIP.
>>>>>>
>>>>>>
>>>>> I am not asking you to implement every possible role of course :). As
>>>>> a note I know a company that is running an entire separate cluster to
>>>>> offload and better serve highlighting on a subset of large docs, so YES I
>>>>> think there are people who may want such fine grained control.
>>>>>
>>>>>
>>>>>> >    Accessible - It should be dead simple to determine the members
>>>>>> of a role, avoid parsing blobs of json, avoid calculating implications,
>>>>>> avoid consulting other resources after listing nodes with the role
>>>>>>
>>>>>> Agree. I'm open to any implementation details that make it easy.
>>>>>> There should be a reasonable API to return these node roles, with ability
>>>>>> to filter by role or filter by node.
>>>>>>
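The "Accessible" requirement and the API Ishan describes (list role members, filter by role or by node) can be sketched as a lookup over a node-to-roles map. The class, its methods and the in-memory map are hypothetical. A real implementation would read cluster state, which is not modeled here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory view of node -> roles; cluster state is not modeled.
final class NodeRoleRegistry {
    private final Map<String, Set<String>> rolesByNode;

    NodeRoleRegistry(Map<String, Set<String>> rolesByNode) {
        this.rolesByNode = Map.copyOf(rolesByNode);
    }

    // Filter by role: every node that has the given role, sorted for stable output.
    Set<String> nodesWithRole(String role) {
        Set<String> nodes = new TreeSet<>();
        rolesByNode.forEach((node, roles) -> {
            if (roles.contains(role)) {
                nodes.add(node);
            }
        });
        return nodes;
    }

    // Filter by node: the roles of one node (empty if the node is unknown).
    Set<String> rolesOf(String node) {
        return rolesByNode.getOrDefault(node, Set.of());
    }

    public static void main(String[] args) {
        NodeRoleRegistry registry = new NodeRoleRegistry(Map.of(
                "solr1", Set.of("data"),
                "solr2", Set.of("data"),
                "solr101", Set.of("overseer")));
        System.out.println(registry.nodesWithRole("data")); // [solr1, solr2]
        System.out.println(registry.rolesOf("solr101"));    // [overseer]
    }
}
```

Membership is a direct lookup. No JSON parsing or implied roles are involved, which is the property Gus asks for.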
>>>>>> >    Independent - One role should not require other roles to be
>>>>>> present
>>>>>>
>>>>>> Do we need to have this hard and fast requirement upfront? There
>>>>>> might be situations where this is desirable. I feel we can discuss on a
>>>>>> case by case basis whenever a future role is added.
>>>>>>
>>>>>> >    Persistent - roles should not be lost across reboot
>>>>>>
>>>>>> Agree.
>>>>>>
>>>>>> >    Immutable - roles should not change while the node is running
>>>>>>
>>>>>> Agree
>>>>>>
>>>>>> >    Lively - A node with a capability may not be presently providing
>>>>>> that capability.
>>>>>>
>>>>>> I don't understand, can you please elaborate?
>>>>>>
>>>>>
>>>>>
>>>>> Specifically imagine the case where there are 100 nodes:
>>>>> 1-100 ==> DATA
>>>>> 101-103 ==> OVERSEER
>>>>> 104-106 ==> ZOOKEEPER
>>>>>
>>>>> But you won't have 3 overseers... you'll want only one of those to be
>>>>> *providing* overseer functionality and the other two to be *capable*,
>>>>> but not providing (so that if the current overseer goes down a new one
>>>>> can be assigned).
>>>>>
>>>>> Then you decide you'd like 5 ZooKeepers. You start nodes 107-108 with
>>>>> that role, but you probably want to ensure that ZooKeepers require some
>>>>> sort of command for them to actually join the ZooKeeper cluster (e.g.
>>>>> /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to
>>>>> be up. But oh look, I typoed 108... we want that to fail... how? Because
>>>>> node18 does not have the *capability* to become a ZooKeeper.
>>>>>
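Gus's ZKADD example can be sketched as a pre-flight capability check. The request is validated against the nodes' roles before any node starts *providing* the function. `ZKADD` and the node names come from his message. The class, the method names, and the rule that the whole request is rejected if any node lacks the role are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "capable vs providing" for the ZKADD example.
final class ZkAddCheck {
    // Returns the requested nodes that lack the "zookeeper" capability.
    static List<String> missingCapability(Map<String, Set<String>> rolesByNode,
                                          List<String> requested) {
        List<String> invalid = new ArrayList<>();
        for (String node : requested) {
            if (!rolesByNode.getOrDefault(node, Set.of()).contains("zookeeper")) {
                invalid.add(node);
            }
        }
        return invalid;
    }

    // Rejects the whole request before any node starts providing the capability.
    static void zkAdd(Map<String, Set<String>> rolesByNode, List<String> requested) {
        List<String> invalid = missingCapability(rolesByNode, requested);
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException("Nodes lack the zookeeper role: " + invalid);
        }
        // Actually joining the ZooKeeper ensemble is out of scope for this sketch.
    }

    public static void main(String[] args) {
        Map<String, Set<String>> roles = Map.of(
                "node18", Set.of("data"),
                "node107", Set.of("zookeeper"),
                "node108", Set.of("zookeeper"));
        try {
            zkAdd(roles, List.of("node107", "node18")); // the typo from the example
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Nodes lack the zookeeper role: [node18]
        }
    }
}
```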
>>>>>
>>>>>>
>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>>
>>>>>>> > Ilan: A node not having node.roles defined should be assumed to
>>>>>>> have all roles. Not only data. I don't see a reason to special case this
>>>>>>> one or any role.
>>>>>>> > Gus: There should be no "assumptions". Nothing to figure out. A
>>>>>>> node has a role or not. For back compatibility reasons, all roles would
>>>>>>> be assumed on startup if none specified.
>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly those
>>>>>>> roles.
>>>>>>>
>>>>>>> The problem with this approach mainly has to do with backcompat.
>>>>>>>
>>>>>>> *1. Overseer backcompat:*
>>>>>>> If we don't make any modifications to how overseer works and adopt
>>>>>>> this approach (as quoted), then imagine this situation:
>>>>>>>
>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>>>>>
>>>>>>> User wants this node Solr101 to be a dedicated overseer, but for
>>>>>>> that to happen, he/she would need to restart all the data nodes with
>>>>>>> -Dnode.roles=data. This will cause unnecessary disruption to running
>>>>>>> clusters where a dedicated overseer is needed. Keep in mind, if a user
>>>>>>> needs a dedicated overseer, he's likely in an emergency situation and
>>>>>>> restarting the whole cluster might not be viable for him/her.
>>>>>>>
>>>>>>> *2. Future roles might not be compatible with this "assumed to have
>>>>>>> all roles" idea:*
>>>>>>> Take the proposed "zookeeper" role for example. Today, regular nodes
>>>>>>> are not supposed to have embedded ZK running on them. By introducing
>>>>>>> this artificial limitation ("assumed to have all roles"), we constrain
>>>>>>> adoption of all future roles to necessarily require a full cluster
>>>>>>> restart.
>>>>>>>
>>>>>>> Keep in mind newer Solr versions can introduce new capabilities and
>>>>>>> roles. Imagine we have a role that is defined in a new Solr version (and
>>>>>>> there's functionality to go with that role), and a user upgrades to that
>>>>>>> version. However, his/her nodes were all started with no node.roles
>>>>>>> param. Hence, if those nodes are "assumed to have all roles", then just
>>>>>>> by virtue of upgrading to this new version, new capabilities will be
>>>>>>> turned on for the entire cluster, whether or not the user opted for such
>>>>>>> a capability. This is totally undesirable.
>>>>>>>
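The upgrade hazard in point 2 can be made concrete by comparing the two defaults debated in this thread for a node started without node.roles. `knownRoles` stands for whatever roles a given Solr version defines. The names are hypothetical, and "zookeeper" is only the future role mentioned above.

```java
import java.util.Set;

// Hypothetical sketch contrasting the two defaults for a node without node.roles.
final class DefaultRolePolicy {
    // "No role == all roles": the node gets every role its Solr version knows about.
    static Set<String> allRolesDefault(Set<String> knownRoles) {
        return Set.copyOf(knownRoles);
    }

    // Explicit back-compat default: always just "data", whatever the version.
    static Set<String> explicitDefault() {
        return Set.of("data");
    }

    public static void main(String[] args) {
        // Roles known after upgrading to a version that introduces "zookeeper".
        Set<String> upgradedVersionRoles = Set.of("data", "overseer", "zookeeper");
        // Under "all roles", the upgrade alone turns the new role on.
        System.out.println(allRolesDefault(upgradedVersionRoles).contains("zookeeper")); // true
        System.out.println(explicitDefault().contains("zookeeper"));                      // false
    }
}
```

Under "all roles", the result depends on the version being run rather than on anything the operator configured. That dependency is the objection raised here.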
>>>>>>> > Gus: I actually don't want a coordinator to do more work, I would
>>>>>>> prefer small focused roles with names that accurately describe their
>>>>>>> function. In that light, COORDINATOR might be too nebulous. How about
>>>>>>> AGGREGATOR role? (what I was thinking of would better be called a
>>>>>>> QUERY_ANALYSIS role)
>>>>>>>
>>>>>>> If you want to do specific things like query analysis or query
>>>>>>> aggregation or bulk indexing etc., all of those can be done on
>>>>>>> COORDINATOR nodes (as is the case in Elasticsearch). Having tens of
>>>>>>> "small focused roles" defined as first class concepts would be confusing
>>>>>>> to the user. As a remedy to your situation where you want the
>>>>>>> coordinator role to also do query analysis for shards, one possible
>>>>>>> solution is to send such a query to a coordinator node with a parameter
>>>>>>> like "coordinator.query_analysis=true", and then the coordinator,
>>>>>>> instead of blindly hitting remote shards, also does some extra work on
>>>>>>> behalf of the shards.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
>>>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > If we make collections role-aware for example (replicas of that
>>>>>>>> > collection can only be placed on nodes with a specific role, in
>>>>>>>> > addition to the other role based constraints), the set of roles
>>>>>>>> > should be user extensible and not fixed.
>>>>>>>> > If collections are not role aware, the constraints introduced by
>>>>>>>> > roles apply to all collections equally which might be insufficient
>>>>>>>> > if a user needs for example a heavily used collection to only be
>>>>>>>> > placed on more powerful nodes.
>>>>>>>>
>>>>>>>> I feel node roles and role-aware collections are orthogonal topics.
>>>>>>>> What you describe above can be achieved by the autoscaling+replica
>>>>>>>> placement framework where the placement plugins take the node roles as
>>>>>>>> one of the inputs.
>>>>>>>>
>>>>>>>> > It does impact the design from early on: the set of roles needs to
>>>>>>>> > be expandable by a user by creating a collection with new roles for
>>>>>>>> > example (consumed by placement plugins) and be able to start nodes
>>>>>>>> > with new (arbitrary) roles. Should such roles follow some naming
>>>>>>>> > syntax to differentiate them from built in roles? To be able to fail
>>>>>>>> > on typos on roles - that otherwise can be crippling and hard to
>>>>>>>> > debug. This implies in any case that the current design can't assume
>>>>>>>> > all roles are known at compile time or define them in a Java enum.
>>>>>>>>
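Ilan's question about a naming syntax can be sketched as a validator. Built-in roles are matched exactly. User-defined roles must carry a marker, so a misspelled built-in role fails fast instead of silently becoming a new role. The built-in set and the `custom:` prefix are assumptions for illustration, not a decided design. Roles are plain strings rather than a Java enum, so unknown-at-compile-time roles remain possible.

```java
import java.util.Set;

// Hypothetical sketch: reject role names that are neither built-in nor explicitly custom.
// BUILT_IN and CUSTOM_PREFIX are illustrative assumptions.
final class RoleNameValidator {
    static final Set<String> BUILT_IN = Set.of("data", "overseer");
    static final String CUSTOM_PREFIX = "custom:";

    static void validate(String role) {
        if (BUILT_IN.contains(role)) {
            return; // a known built-in role
        }
        if (role.startsWith(CUSTOM_PREFIX) && role.length() > CUSTOM_PREFIX.length()) {
            return; // an explicitly marked user-defined role
        }
        throw new IllegalArgumentException("Unknown role '" + role + "': use a built-in role "
                + BUILT_IN + " or prefix custom roles with '" + CUSTOM_PREFIX + "'");
    }

    public static void main(String[] args) {
        validate("data");
        validate("custom:gpu");
        try {
            validate("dta"); // typo of "data"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```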
>>>>>>>> I think this should be achieved by something different from roles.
>>>>>>>> Something like node *labels* (user defined) which can then be used
>>>>>>>> in a replica placement plugin to assign replicas. I see roles as more
>>>>>>>> closely associated with kinds of functionality a node is designated
>>>>>>>> for. Therefore, I feel that replica placement and user defined node
>>>>>>>> labels are out of scope for this SIP. They can be added later in a
>>>>>>>> separate SIP, without being at odds with this proposal.
>>>>>>>>
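Ishan's alternative, user-defined node *labels* consumed by a placement plugin alongside the built-in roles, can be sketched as a candidate filter. It handles Ilan's case of placing a heavily used collection only on more powerful nodes. The label names, node names and this method are hypothetical; Solr's actual placement plugin API is not modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: candidate nodes for a replica must have the built-in "data"
// role AND carry a user-defined label required by the collection.
final class LabelAwarePlacement {
    static List<String> candidateNodes(Map<String, Set<String>> rolesByNode,
                                       Map<String, Set<String>> labelsByNode,
                                       String requiredLabel) {
        return rolesByNode.entrySet().stream()
                .filter(e -> e.getValue().contains("data")) // only nodes that host replicas
                .map(Map.Entry::getKey)
                .filter(node -> labelsByNode.getOrDefault(node, Set.of()).contains(requiredLabel))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> roles = Map.of(
                "n1", Set.of("data"), "n2", Set.of("data"), "n3", Set.of("overseer"));
        Map<String, Set<String>> labels = Map.of(
                "n1", Set.of("powerful"), "n3", Set.of("powerful"));
        System.out.println(candidateNodes(roles, labels, "powerful")); // [n1]
    }
}
```

Keeping labels separate from roles means the role set can stay small and fixed while placement stays user-extensible. This is the separation argued for above.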
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <jan....@cominvent.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>>>>>>> > A node not having node.roles defined should be assumed to have
>>>>>>>>> > all roles. Not only data. I don't see a reason to special case this
>>>>>>>>> > one or any role.
>>>>>>>>>
>>>>>>>>> +1, make it simple and transparent. No role == all roles. Explicit
>>>>>>>>> list of roles = exactly those roles.
>>>>>>>>>
>>>>>>>>> > (Gus) See my comment above, but maybe preference is something
>>>>>>>>> handled as a feature of the role rather than via role designation?
>>>>>>>>>
>>>>>>>>> Yea, we always need an overseer, so that feature can decide to use
>>>>>>>>> its list of nodes as a preference if it so chooses.
>>>>>>>>>
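Jan's point, that the overseer feature can treat its list of nodes as a preference because a cluster always needs an overseer, can be sketched as a selection rule. It also matches Gus's "capable vs providing" distinction, since exactly one node is chosen to provide the function. Among live nodes, one with the "overseer" role is chosen if available; otherwise any live node is used. The class name and the lexicographic tie-break are assumptions for this sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: "overseer" as a preference, with fallback to any live node.
final class OverseerPreference {
    static Optional<String> chooseOverseer(List<String> liveNodes,
                                           Map<String, Set<String>> rolesByNode) {
        // Prefer a live node that holds the overseer role (arbitrary tie-break: smallest name).
        Optional<String> preferred = liveNodes.stream()
                .filter(node -> rolesByNode.getOrDefault(node, Set.of()).contains("overseer"))
                .min(Comparator.naturalOrder());
        // Otherwise fall back to any live node, since an overseer is always required.
        return preferred.or(() -> liveNodes.stream().min(Comparator.naturalOrder()));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> roles = Map.of(
                "solr1", Set.of("data"),
                "solr101", Set.of("overseer"));
        System.out.println(chooseOverseer(List.of("solr1", "solr101"), roles)); // Optional[solr101]
        System.out.println(chooseOverseer(List.of("solr1"), roles));            // Optional[solr1]
    }
}
```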
>>>>>>>>>
>>>>>>>>> Aside: I think it makes it easier if we always prefix Solr env vars
>>>>>>>>> and sys props with "SOLR_" or "solr.", e.g. -Dsolr.node.roles=foo.
>>>>>>>>> That way we can get away from having to have explicit code in
>>>>>>>>> bin/solr, bin/solr.cmd and SolrCLI to manage every single property.
>>>>>>>>> Instead we can parse all ENVs and Props with the solr prefix in our
>>>>>>>>> bootstrap code. And we can by convention allow e.g. docker run -e
>>>>>>>>> SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
>>>>>>>>>
>>>>>>>>> Jan
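Jan's prefix convention can be sketched as a generic mapping in bootstrap code. Every `SOLR_*` environment variable becomes a `solr.*` system property, so `SOLR_NODE_ROLES` becomes `solr.node.roles`. The mapping rule, the class name, and letting an explicit `-D` flag win over the environment are assumptions for illustration, not existing Solr code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: derive solr.* system properties from SOLR_* environment variables.
final class SolrEnvBootstrap {
    // SOLR_NODE_ROLES -> solr.node.roles
    static String toPropertyName(String envName) {
        return envName.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    // Keeps only SOLR_-prefixed variables and renames them to property names.
    static Map<String, String> solrProperties(Map<String, String> env) {
        Map<String, String> props = new HashMap<>();
        env.forEach((name, value) -> {
            if (name.startsWith("SOLR_")) {
                props.put(toPropertyName(name), value);
            }
        });
        return props;
    }

    public static void main(String[] args) {
        // e.g. docker run -e SOLR_NODE_ROLES=foo solr:9
        solrProperties(System.getenv()).forEach((key, value) -> {
            // An explicit -D flag wins over the environment variable.
            if (System.getProperty(key) == null) {
                System.setProperty(key, value);
            }
        });
        System.out.println(System.getProperty("solr.node.roles"));
    }
}
```

One trade-off of this rule is that a property whose name itself contains an underscore cannot be expressed through an environment variable, because every `_` becomes a `.`.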
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>
>>>
>>
