Re: First class support for node roles

Ishan Chattopadhyaya Mon, 01 Nov 2021 10:02:25 -0700

>> Agree. However, I disagree with ideas where "query analysis" has a role
>> of its own. Where would that lead us to? Separate roles for nodes that do
>> "faceting" or "spell correction" etc.? But anyway, that is for discussion
>> when we add future roles. This is beyond this SIP.
>


> I am not asking you to implement every possible role of course :). As a
> note I know a company that is running an entire separate cluster to
> offload and better serve highlighting on a subset of large docs, so YES I
> think there are people who may want such fine grained control.

Cool, I think we can discuss adding any additional roles (for
highlighting?) on a case-by-case basis at a later point.


On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <
[email protected]> wrote:

> > Boiling it down the idea I'm proposing is that roles required for back
> compatibility get explicitly added on startup, if not by the user then by
> the code. This is more flexible than assuming that no role means every
> role, because then every new feature that has a role will end up on legacy
> clusters which are also not back compatible.
>
> +1, I totally agree. I said as much earlier: "This is why I was
> advocating that 1) we assume "data" as the default role, 2) not assume
> overseer to be implicitly defined (because of the way overseer role is
> written today), 3) not assume any future roles to be true by default."
>
> So, basically, I'm proposing that the "roles required for back
> compatibility" (that should be explicitly added on startup) be just the
> ["data"] role, and not the "overseer" role (due to the way overseer role is
> currently defined, i.e. it is "preferred overseer").
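A minimal sketch of the startup behaviour discussed above: if `node.roles` is absent, the code itself writes down the back-compat roles (only `data`, per this proposal), so every running node ends up with an explicit role list rather than an implied one. `StartupRoles` and `resolve` are hypothetical names, not Solr's actual implementation, and persisting the resolved set is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class StartupRoles {
    // Roles the code adds for nodes started without node.roles (this proposal: only "data").
    static final Set<String> BACKCOMPAT_DEFAULT = Set.of("data");

    /** @param nodeRolesProp raw value of -Dnode.roles, or null when the property is unset */
    static Set<String> resolve(String nodeRolesProp) {
        if (nodeRolesProp == null || nodeRolesProp.isBlank()) {
            // No roles given: add the back-compat roles explicitly instead of implying "all".
            return new LinkedHashSet<>(BACKCOMPAT_DEFAULT);
        }
        Set<String> roles = new LinkedHashSet<>();
        for (String r : nodeRolesProp.split(",")) {
            String trimmed = r.trim();
            if (!trimmed.isEmpty()) {
                roles.add(trimmed);
            }
        }
        return roles; // an explicit list means exactly those roles
    }

    public static void main(String[] args) {
        Set<String> roles = resolve(System.getProperty("node.roles"));
        // A real node would persist this explicit set (not shown) so it survives restarts.
        System.out.println("Resolved node roles: " + roles);
    }
}
```

Starting with `-Dnode.roles=data,overseer` resolves to exactly those two roles; omitting the flag resolves to `data` only, and the overseer role is never implied.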
>
> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <[email protected]> wrote:
>
>> Very sorry, I don't mean to sound offended. Frustrated yes, offended no :)...
>> The most difficult thing about communication is the illusion that it has
>> occurred :)
>>
>> If you read back just a few emails you'll see where I talk about roles
>> being applied on startup. Boiling it down the idea I'm proposing is that
>> roles required for back compatibility get explicitly added on startup, if
>> not by the user then by the code. This is more flexible than assuming that
>> no role means every role, because then every new feature that has a role
>> will end up on legacy clusters which are also not back compatible.
>>
>> There are points where I said "all roles" rather than "back compatibility
>> roles" because I was thinking about back compatibility specifically, but
>> you can't know that if I don't say it, can you :).
>>
>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <
>> [email protected]> wrote:
>>
>>> > If you read more closely, my way can provide full back compatibility.
>>> > To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>
>>> I understand e-mails are frustrating, and I'm trying my best. Please
>>> don't be offended, and kindly point me to the exact part you want me to
>>> re-read.
>>>
>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <
>>>> [email protected]> wrote:
>>>>
>>>>> >    Positive - They denote the existence of a capability
>>>>>
>>>>> Agree, the SIP already reflects this.
>>>>>
>>>>> >   Absolute - Absence/Presence binary identification of a capability;
>>>>> no implications, no assumptions
>>>>>
>>>>> Disagree, we need backcompat handling on nodes running without any
>>>>> roles. There has to be an implicit assumption as to which roles those
>>>>> nodes are assumed to have. My proposal is that only the "data" role be
>>>>> assumed, but not the "overseer" role. For any future roles ("coordinator",
>>>>> "zookeeper" etc.), the decision as to what the absence of any role implies
>>>>> should be left to the implementation of that future role. Documentation
>>>>> should clearly describe these implicit assumptions.
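One way to let each role own its back-compat decision is a per-role "implied when absent" flag that the role's implementation registers. This is a hypothetical sketch: the registry and its names are assumptions, and so are the `false` values for `coordinator` and `zookeeper`, since the thread leaves those decisions to the future roles themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RoleDefaults {
    // Hypothetical registry: role name -> whether a node started without node.roles
    // is assumed to have that role. Values follow the proposal in this thread.
    static final Map<String, Boolean> IMPLIED_WHEN_ABSENT = new LinkedHashMap<>();

    static {
        IMPLIED_WHEN_ABSENT.put("data", true);         // back-compat: legacy nodes hold data
        IMPLIED_WHEN_ABSENT.put("overseer", false);    // today's role means "preferred overseer"
        IMPLIED_WHEN_ABSENT.put("coordinator", false); // future role: assumed value
        IMPLIED_WHEN_ABSENT.put("zookeeper", false);   // future role: assumed value
    }

    /** Each role is consulted individually; roles nobody registered are never implied. */
    static boolean isImplied(String role) {
        return IMPLIED_WHEN_ABSENT.getOrDefault(role, false);
    }

    /** All roles assumed for a node that was started with no node.roles at all. */
    static Set<String> impliedRoles() {
        Set<String> implied = new TreeSet<>();
        for (Map.Entry<String, Boolean> e : IMPLIED_WHEN_ABSENT.entrySet()) {
            if (e.getValue()) {
                implied.add(e.getKey());
            }
        }
        return implied;
    }
}
```

The design choice here is that adding a new role never changes what legacy nodes are assumed to do unless that role explicitly registers `true`.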
>>>>>
>>>>>
>>>> If you read more closely, my way can provide full back compatibility.
>>>> To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>
>>>>
>>>>> >    Focused - Do one thing per role
>>>>>
>>>>> Agree. However, I disagree with ideas where "query analysis" has a
>>>>> role of its own. Where would that lead us to? Separate roles for nodes
>>>>> that do "faceting" or "spell correction" etc.? But anyway, that is for
>>>>> discussion when we add future roles. This is beyond this SIP.
>>>>>
>>>>>
>>>> I am not asking you to implement every possible role of course :). As a
>>>> note I know a company that is running an entire separate cluster to offload
>>>> and better serve highlighting on a subset of large docs, so YES I think
>>>> there are people who may want such fine grained control.
>>>>
>>>>
>>>>> >    Accessible - It should be dead simple to determine the members of
>>>>> a role: avoid parsing blobs of JSON, avoid calculating implications, avoid
>>>>> consulting other resources after listing nodes with the role
>>>>>
>>>>> Agree. I'm open to any implementation details that make it easy. There
>>>>> should be a reasonable API to return these node roles, with ability to
>>>>> filter by role or filter by node.
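A sketch of the kind of lookup described here, supporting both filters over a simple map from node to roles. `NodeRolesRegistry` and its methods are hypothetical; a real implementation would read from cluster state (for example ZooKeeper) rather than a local map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class NodeRolesRegistry {
    // node name -> roles it was started with (immutable once registered)
    private final Map<String, Set<String>> rolesByNode = new TreeMap<>();

    void register(String node, Set<String> roles) {
        rolesByNode.put(node, Collections.unmodifiableSet(new TreeSet<>(roles)));
    }

    /** Filter by node: the roles of one node, or an empty set if the node is unknown. */
    Set<String> rolesOf(String node) {
        return rolesByNode.getOrDefault(node, Set.of());
    }

    /** Filter by role: every node that has the role; no JSON parsing, no implication logic. */
    Set<String> nodesWithRole(String role) {
        Set<String> nodes = new TreeSet<>();
        rolesByNode.forEach((node, roles) -> {
            if (roles.contains(role)) {
                nodes.add(node);
            }
        });
        return nodes;
    }
}
```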
>>>>>
>>>>> >    Independent - One role should not require other roles to be
>>>>> present
>>>>>
>>>>> Do we need to have this as a hard and fast requirement upfront? There
>>>>> might be situations where one role depending on another is desirable. I
>>>>> feel we can discuss this on a case-by-case basis whenever a future role
>>>>> is added.
>>>>>
>>>>> >    Persistent - roles should not be lost across reboot
>>>>>
>>>>> Agree.
>>>>>
>>>>> >    Immutable - roles should not change while the node is running
>>>>>
>>>>> Agree.
>>>>>
>>>>> >    Lively - A node with a capability may not be presently providing
>>>>> that capability.
>>>>>
>>>>> I don't understand, can you please elaborate?
>>>>>
>>>>
>>>>
>>>> Specifically, imagine the case where there are 106 nodes:
>>>> 1-100 ==> DATA
>>>> 101-103 ==> OVERSEER
>>>> 104-106 ==> ZOOKEEPER
>>>>
>>>> But you won't have 3 overseers... you'll want only one of those to be
>>>> *providing* overseer functionality and the other two to be *capable*, but
>>>> not providing (so that if the current overseer goes down a new one can be
>>>> assigned).
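To illustrate the capable vs. providing distinction: role membership says which nodes *may* run the overseer, and a separate selection step decides which one *does*. This sketch uses "first live capable node in a fixed order" as a simplified stand-in for real leader election; `OverseerPick` and its inputs are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class OverseerPick {
    /**
     * Picks the single node that should *provide* the overseer function right now,
     * out of the nodes that are merely *capable* of it (have the "overseer" role).
     */
    static Optional<String> activeOverseer(List<String> nodesInOrder,
                                           Map<String, Set<String>> rolesByNode,
                                           Set<String> liveNodes) {
        return nodesInOrder.stream()
                .filter(n -> rolesByNode.getOrDefault(n, Set.of()).contains("overseer"))
                .filter(liveNodes::contains) // capable but down: skipped, not disqualified
                .findFirst();
    }
}
```

If the active node goes down, calling this again with the updated live set yields the next capable node, while the role assignments themselves never change.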
>>>>
>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108 with
>>>> that role, but you probably want to ensure that zookeepers require some
>>>> sort of command for them to actually join the zookeeper cluster (e.g.
>>>> /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to
>>>> be up. But oh look, I typoed 108... we want that to fail... how? Because
>>>> node18 does not have the *capability* to become a zookeeper.
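A sketch of how the hypothetical `ZKADD` action above could reject the typo: every requested node must be live and carry the `zookeeper` role. `ZkAddValidator` is an illustrative name only; no such Solr API is implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ZkAddValidator {
    /**
     * Returns the requested nodes that may NOT join the ZooKeeper ensemble:
     * those lacking the "zookeeper" capability or not currently live.
     * An empty result means the (hypothetical) ZKADD request may proceed.
     */
    static List<String> invalidNodes(List<String> requested,
                                     Map<String, Set<String>> rolesByNode,
                                     Set<String> liveNodes) {
        List<String> bad = new ArrayList<>();
        for (String node : requested) {
            boolean capable = rolesByNode.getOrDefault(node, Set.of()).contains("zookeeper");
            if (!capable || !liveNodes.contains(node)) {
                bad.add(node);
            }
        }
        return bad;
    }
}
```

With nodes 1-100 holding only `data`, a request for `node107,node18` returns `[node18]`, so the action can fail with a precise error instead of turning a data node into a ZooKeeper.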
>>>>
>>>>
>>>>>
>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> > Ilan: A node not having node.roles defined should be assumed to
>>>>>> have all roles. Not only data. I don't see a reason to special case this
>>>>>> one or any role.
>>>>>> > Gus: There should be no "assumptions". Nothing to figure out. A node
>>>>>> has a role or not. For back compatibility reasons, all roles would be
>>>>>> assumed on startup if none specified.
>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly those
>>>>>> roles.
>>>>>>
>>>>>> The problem with this approach is mainly to do with backcompat.
>>>>>>
>>>>>> *1. Overseer backcompat:*
>>>>>> If we adopt this approach (as quoted) without modifying how the
>>>>>> overseer works, then imagine this situation:
>>>>>>
>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>>>>
>>>>>> The user wants node Solr101 to be a dedicated overseer, but for that
>>>>>> to happen, they would need to restart all the data nodes with
>>>>>> -Dnode.roles=data. This will cause unnecessary disruption to running
>>>>>> clusters where a dedicated overseer is needed. Keep in mind, if a user
>>>>>> needs a dedicated overseer, they are likely in an emergency situation, and
>>>>>> restarting the whole cluster might not be viable for them.
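The Solr1-100 / Solr101 scenario can be checked mechanically by computing which nodes are overseer-capable under each of the two competing defaults. A hypothetical sketch: `noRoleMeansAll` toggles between "no role == all roles" and "no role == data", and the set of known roles is assumed to be just `data` and `overseer`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BackcompatScenario {
    // Roles known to the running version in this scenario (assumption).
    static final Set<String> ALL_ROLES = Set.of("data", "overseer");

    /** Nodes eligible to act as overseer under one of the two defaulting policies. */
    static Set<String> overseerCapable(Set<String> nodesWithoutRoles,
                                       Map<String, Set<String>> explicitRoles,
                                       boolean noRoleMeansAll) {
        Set<String> implied = noRoleMeansAll ? ALL_ROLES : Set.of("data");
        Set<String> capable = new TreeSet<>();
        if (implied.contains("overseer")) {
            capable.addAll(nodesWithoutRoles); // legacy nodes compete with Solr101
        }
        explicitRoles.forEach((node, roles) -> {
            if (roles.contains("overseer")) {
                capable.add(node); // an explicit list means exactly those roles
            }
        });
        return capable;
    }
}
```

Under "no role == all roles" all 101 nodes are overseer-capable, so Solr101 is not dedicated until the 100 data nodes are restarted with explicit roles; under "no role == data" only Solr101 qualifies, with no restarts.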
>>>>>>
>>>>>> *2. Future roles might not be compatible with this "assumed to have
>>>>>> all roles" idea:*
>>>>>> Take the proposed "zookeeper" role for example. Today, regular nodes
>>>>>> are not supposed to have embedded ZK running on them. By introducing this
>>>>>> artificial limitation ("assumed to have all roles"), we constrain adoption
>>>>>> of all future roles to necessarily require a full cluster restart.
>>>>>>
>>>>>> Keep in mind newer Solr versions can introduce new capabilities and
>>>>>> roles. Imagine we have a role that is defined in a new Solr version (and
>>>>>> there's functionality to go with that role), and a user upgrades to that
>>>>>> version. However, their nodes were all started with no node.roles param.
>>>>>> Hence, if those nodes are "assumed to have all roles", then just by
>>>>>> virtue of upgrading to this new version, new capabilities will be turned
>>>>>> on for the entire cluster, whether or not the user opted for such a
>>>>>> capability. This is totally undesirable.
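A small sketch of the upgrade effect: the roles a node without `node.roles` gains purely by upgrading, under each default. The role sets for the two versions are illustrative assumptions (here, a newer version that adds `zookeeper`), and `UpgradeImpact` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

class UpgradeImpact {
    /**
     * Roles that a node started without node.roles silently gains when upgraded
     * from a version knowing `rolesBefore` to one knowing `rolesAfter`.
     */
    static Set<String> silentlyEnabled(Set<String> rolesBefore, Set<String> rolesAfter,
                                       boolean noRoleMeansAll) {
        // "no role == all roles": the implied set grows with every release.
        // "no role == data": the implied set stays fixed across releases.
        Set<String> before = noRoleMeansAll ? rolesBefore : Set.of("data");
        Set<String> after = noRoleMeansAll ? rolesAfter : Set.of("data");
        Set<String> gained = new TreeSet<>(after);
        gained.removeAll(before);
        return gained;
    }
}
```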
>>>>>>
>>>>>> > Gus: I actually don't want a coordinator to do more work; I would
>>>>>> prefer small focused roles with names that accurately describe their
>>>>>> function. In that light, COORDINATOR might be too nebulous. How about an
>>>>>> AGGREGATOR role? (what I was thinking of would better be called a
>>>>>> QUERY_ANALYSIS role)
>>>>>>
>>>>>> If you want to do specific things like query analysis, query
>>>>>> aggregation or bulk indexing, all of those can be done on COORDINATOR
>>>>>> nodes (as is the case in Elasticsearch). Having tens of "small focused
>>>>>> roles" defined as first-class concepts would be confusing to the user. As
>>>>>> a remedy for your situation, where you want the coordinator role to also
>>>>>> do query analysis for shards, one possible solution is to send such a
>>>>>> query to a coordinator node with a parameter like
>>>>>> "coordinator.query_analysis=true"; the coordinator then, instead of
>>>>>> blindly hitting remote shards, also does some extra work on behalf of
>>>>>> the shards.
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> > If we make collections role-aware for example (replicas of that
>>>>>>> > collection can only be placed on nodes with a specific role, in
>>>>>>> > addition to the other role based constraints), the set of roles
>>>>>>> > should be user extensible and not fixed.
>>>>>>> > If collections are not role aware, the constraints introduced by
>>>>>>> > roles apply to all collections equally which might be insufficient
>>>>>>> > if a user needs for example a heavily used collection to only be
>>>>>>> > placed on more powerful nodes.
>>>>>>>
>>>>>>> I feel node roles and role-aware collections are orthogonal topics.
>>>>>>> What you describe above can be achieved by the autoscaling+replica
>>>>>>> placement framework where the placement plugins take the node roles as
>>>>>>> one of the inputs.
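A sketch of a placement step that takes node roles as one of its inputs: only `data` nodes are candidates, and the least-loaded ones win. This is not Solr's placement plugin API; the method, its inputs, and the use of existing replica count as the load measure are all hypothetical simplifications.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class RoleAwarePlacement {
    /**
     * Picks up to `count` nodes for new replicas. Node roles filter the candidates
     * (only "data" nodes); replica count ranks them, with ties broken by node name.
     */
    static List<String> chooseNodes(Map<String, Set<String>> rolesByNode,
                                    Map<String, Integer> replicaCounts,
                                    int count) {
        return rolesByNode.entrySet().stream()
                .filter(e -> e.getValue().contains("data"))
                .map(Map.Entry::getKey)
                .sorted(Comparator.comparingInt((String n) -> replicaCounts.getOrDefault(n, 0))
                        .thenComparing(Comparator.<String>naturalOrder()))
                .limit(count)
                .collect(Collectors.toList());
    }
}
```

Per-collection constraints (such as "only powerful nodes") would be extra filters in the same pipeline, which is why the two topics can stay separate.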
>>>>>>>
>>>>>>> > It does impact the design from early on: the set of roles needs to
>>>>>>> > be expandable by a user, by creating a collection with new roles for
>>>>>>> > example (consumed by placement plugins), and it must be possible to
>>>>>>> > start nodes with new (arbitrary) roles. Should such roles follow some
>>>>>>> > naming syntax to differentiate them from built-in roles, so that we
>>>>>>> > can fail on typos in role names, which otherwise can be crippling and
>>>>>>> > hard to debug? This implies in any case that the current design can't
>>>>>>> > assume all roles are known at compile time or define them in a Java
>>>>>>> > enum.
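A sketch of fail-fast role validation that keeps the built-in roles fixed while allowing open-ended user-defined roles, so no Java enum is required. The `custom.` prefix, the allowed-character pattern, and the built-in set are hypothetical conventions, not anything decided in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class RoleNameValidator {
    static final Set<String> BUILT_IN = Set.of("data", "overseer"); // assumed built-in roles
    static final String CUSTOM_PREFIX = "custom.";                  // hypothetical convention
    static final Pattern CUSTOM_NAME = Pattern.compile("[a-z0-9_-]+");

    /** Throws if any role is neither built in nor a well-formed custom role. */
    static void validate(List<String> roles) {
        List<String> unknown = new ArrayList<>();
        for (String role : roles) {
            boolean ok = BUILT_IN.contains(role)
                    || (role.startsWith(CUSTOM_PREFIX)
                        && CUSTOM_NAME.matcher(role.substring(CUSTOM_PREFIX.length())).matches());
            if (!ok) {
                unknown.add(role);
            }
        }
        if (!unknown.isEmpty()) {
            // Fail at startup with the offending names instead of silently ignoring them.
            throw new IllegalArgumentException("Unknown node roles: " + unknown
                    + "; custom roles must start with '" + CUSTOM_PREFIX + "'");
        }
    }
}
```

A typo such as `dat` fails immediately, while `custom.gpu` passes without any code change.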
>>>>>>>
>>>>>>> I think this should be achieved by something different from roles:
>>>>>>> something like node *labels* (user defined), which can then be used
>>>>>>> in a replica placement plugin to assign replicas. I see roles as more
>>>>>>> closely associated with the kinds of functionality a node is designated
>>>>>>> for. Therefore, I feel that replica placement and user-defined node
>>>>>>> labels are out of scope for this SIP. They can be added later in a
>>>>>>> separate SIP, without being at odds with this proposal.
>>>>>>>
>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <[email protected]> wrote:
>>>>>>>> > A node not having node.roles defined should be assumed to have
>>>>>>>> > all roles. Not only data. I don't see a reason to special case this
>>>>>>>> > one or any role.
>>>>>>>>
>>>>>>>> +1, make it simple and transparent. No role == all roles. Explicit
>>>>>>>> list of roles = exactly those roles.
>>>>>>>>
>>>>>>>> > (Gus) See my comment above, but maybe preference is something
>>>>>>>> > handled as a feature of the role rather than via role designation?
>>>>>>>>
>>>>>>>> Yeah, we always need an overseer, so that feature can decide to use
>>>>>>>> its list of nodes as a preference if it so chooses.
>>>>>>>>
>>>>>>>>
>>>>>>>> Aside: I think it makes it easier if we always prefix Solr env vars
>>>>>>>> and sys props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo.
>>>>>>>> That way we can get away from having to have explicit code in
>>>>>>>> bin/solr, bin/solr.cmd and SolrCLI to manage every single property.
>>>>>>>> Instead we can parse all env vars and props with the solr prefix in our
>>>>>>>> bootstrap code. And we can by convention allow e.g.
>>>>>>>> docker run -e SOLR_NODE_ROLES=foo solr:9 and it would be the same
>>>>>>>> thing...
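A sketch of the convention Jan describes: every `SOLR_`-prefixed environment variable becomes a `solr.*` property (`SOLR_NODE_ROLES` becomes `solr.node.roles`), with explicit `-D` properties taking precedence. The mapping rule (lowercase, `_` becomes `.`), the precedence order, and the class name are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class SolrEnvToProps {
    static final String ENV_PREFIX = "SOLR_";

    /** SOLR_NODE_ROLES -> solr.node.roles (assumed rule: lowercase, '_' becomes '.'). */
    static String toPropName(String envName) {
        String rest = envName.substring(ENV_PREFIX.length()).toLowerCase(Locale.ROOT);
        return "solr." + rest.replace('_', '.');
    }

    /** Collects SOLR_* env vars as solr.* properties; explicit solr.* sys props win. */
    static Map<String, String> merge(Map<String, String> env, Properties sysProps) {
        Map<String, String> out = new TreeMap<>();
        env.forEach((k, v) -> {
            if (k.startsWith(ENV_PREFIX) && k.length() > ENV_PREFIX.length()) {
                out.put(toPropName(k), v);
            }
        });
        for (String name : sysProps.stringPropertyNames()) {
            if (name.startsWith("solr.")) {
                out.put(name, sysProps.getProperty(name)); // overrides the env-derived value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(System.getenv(), System.getProperties()));
    }
}
```

One limitation of this naive mapping: property names that themselves contain underscores or camelCase cannot be expressed through the environment.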
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>
>>
>
