Re: First class support for node roles

Ishan Chattopadhyaya Mon, 01 Nov 2021 09:39:57 -0700

> If you read more closely, my way can provide full back compatibility. To
> say or imply it doesn't isn't helping. Perhaps you need to re-read?


I understand e-mails are frustrating, and I'm trying my best. Please don't
be offended, and kindly point me to the exact part you want me to re-read.

On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <gus.h...@gmail.com> wrote:

>
>
> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> >    Positive - They denote the existence of a capability
>>
>> Agree, the SIP already reflects this.
>>
>> >   Absolute - Absence/Presence binary identification of a capability; no
>> implications, no assumptions
>>
>> Disagree, we need backcompat handling on nodes running without any roles.
>> There has to be an implicit assumption as to what roles are those nodes
>> assumed to have. My proposal is that only the "data" role be assumed, but
>> not the "overseer" role. For any future roles ("coordinator", "zookeeper"
>> etc.), this decision as to what absence of any role implies should be left
>> to the implementation of that future role. Documentation should reflect
>> clearly about these implicit assumptions.
>>
>>
> If you read more closely, my way can provide full back compatibility. To
> say or imply it doesn't isn't helping. Perhaps you need to re-read?
>
>
>> >    Focused - Do one thing per role
>>
>> Agree. However, I disagree with ideas where "query analysis" has a role
>> of its own. Where would that lead us to? Separate roles for nodes that do
>> "faceting" or "spell correction" etc.? But anyway, that is for discussion
>> when we add future roles. This is beyond this SIP.
>>
>>
> I am not asking you to implement every possible role of course :). As a
> note I know a company that is running an entire separate cluster to offload
> and better serve highlighting on a subset of large docs, so YES I think
> there are people who may want such fine grained control.
>
>
>> >    Accessible - It should be dead simple to determine the members of a
>> role, avoid parsing blobs of json, avoid calculating implications, avoid
>> consulting other resources after listing nodes with the role
>>
>> Agree. I'm open to any implementation details that make it easy. There
>> should be a reasonable API to return these node roles, with ability to
>> filter by role or filter by node.
>>
>> >    Independent - One role should not require other roles to be present
>>
>> Do we need to have this hard and fast requirement upfront? There might be
>> situations where this is desirable. I feel we can discuss on a case by case
>> basis whenever a future role is added.
>>
>> >    Persistent - roles should not be lost across reboot
>>
>> Agree.
>>
>> >    Immutable - roles should not change while the node is running
>>
>> Agree
>>
>> >    Lively - A node with a capability may not be presently providing
>> that capability.
>>
>> I don't understand, can you please elaborate?
>>
>
>
> Specifically imagine the case where there are 100 nodes:
> 1-100 ==> DATA
> 101-103 ==> OVERSEER
> 104-106 ==> ZOOKEEPER
>
> But you won't have 3 overseers... you'll want only one of those to be
> *providing* overseer functionality and the other two to be *capable*, but not
> providing (so that if the current overseer goes down a new one can be
> assigned).
>
> Then you decide you'd like 5 Zookeepers. You start nodes 107-108 with that
> role, but you probably want to ensure that zookeepers require some sort of
> command for them to actually join the zookeeper cluster (i.e.
> /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to
> be up. But oh look I typoed 108... we want that to fail... how? because 18
> does not have the *capability* to become a zookeeper.
>
>
>>
>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> > Ilan: A node not having node.roles defined should be assumed to have
>>> all roles. Not only data. I don't see a reason to special case this one or
>>> any role.
>>> > Gus: There should be no "assumptions" Nothing to figure out. A node
>>> has a role or not. For back compatibility reasons, all roles would be
>>> assumed on startup if none specified.
>>> > Jan: No role == all roles. Explicit list of roles = exactly those
>>> roles.
>>>
>>> Problem with this approach is mainly to do with backcompat.
>>>
>>> *1. Overseer backcompat:*
>>> If we don't make any modifications to how overseer works and adopt this
>>> approach (as quoted), then imagine this situation:
>>>
>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>
>>> User wants this node Solr101 to be a dedicated overseer, but for that to
>>> happen, he/she would need to restart all the data nodes with
>>> -Dnode.roles=data. This will cause unnecessary disruption to running
>>> clusters where a dedicated overseer is needed. Keep in mind, if a user
>>> needs a dedicated overseer, he's likely in an emergency situation and
>>> restarting the whole cluster might not be viable for him/her.
>>>
>>> *2. Future roles might not be compatible with this "assumed to have all
>>> roles" idea:*
>>> Take the proposed "zookeeper" role for example. Today, regular nodes are
>>> not supposed to have embedded ZK running on them. By introducing this
>>> artificial limitation ("assumed to have all roles"), we constrain adoption
>>> of all future roles to necessarily require a full cluster restart.
>>>
>>> Keep in mind newer Solr versions can introduce new capabilities and
>>> roles. Imagine we have a role that is defined in a new Solr version (and
>>> there's functionality to go with that role), and user upgrades to that
>>> version. However, his/her nodes all were started with no node.roles param.
>>> Hence, if those nodes are "assumed to have all roles", then just by virtue
>>> of upgrading to this new version, new capabilities will be turned on for
>>> the entire cluster, whether or not the user opted for such a capability.
>>> This is totally undesirable.
>>>
>>> > Gus: I actually don't want a coordinator to do more work, I would
>>> prefer small focused roles with names that accurately describe their
>>> function. In that light, COORDINATOR might be too nebulous. How about
>>> AGGREGATOR role? (what I was thinking of would better be called a
>>> QUERY_ANALYSIS role)
>>>
>>> If you want to do specific things like query analysis or query
>>> aggregation or bulk indexing etc, all of those can be done on COORDINATOR
>>> nodes (as is the case in ElasticSearch). Having tens of "small focused
>>> roles" defined as first class concepts would be confusing to the user. As a
>>> remedy to your situation where you want the coordinator role to also do
>>> query-analysis for shards, one possible solution is to send such a query to
>>> a coordinator node with a parameter like "coordinator.query_analysis=true",
>>> and then the coordinator, instead of blindly hitting remote shards, also
>>> does some extra work on behalf of the shards.
>>>
>>>
>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> > If we make collections role-aware for example (replicas of that
>>>> collection can only be
>>>> > placed on nodes with a specific role, in addition to the other role
>>>> based constraints),
>>>> > the set of roles should be user extensible and not fixed.
>>>> > If collections are not role aware, the constraints introduced by
>>>> roles apply to all collections
>>>> > equally which might be insufficient if a user needs for example a
>>>> heavily used collection to
>>>> > only be placed on more powerful nodes.
>>>>
>>>> I feel node roles and role-aware collections are orthogonal topics.
>>>> What you describe above can be achieved by the autoscaling+replica
>>>> placement framework where the placement plugins take the node roles as one
>>>> of the inputs.
>>>>
>>>> > It does impact the design from early on: the set of roles need to be
>>>> expandable by a user
>>>> > by creating a collection with new roles for example (consumed by
>>>> placement plugins) and be
>>>> > able to start nodes with new (arbitrary) roles. Should such roles
>>>> follow some naming syntax to
>>>> > differentiate them from built in roles? To be able to fail on typos
>>>> on roles - that otherwise can be
>>>> > crippling and hard to debug. This implies in any case that the
>>>> current design can't assume all
>>>> > roles are known at compile time or define them in a Java enum.
>>>>
>>>> I think this should be achieved by something different from roles.
>>>> Something like node *labels* (user defined) which can then be used in
>>>> a replica placement plugin to assign replicas. I see roles as more closely
>>>> associated with kinds of functionality a node is designated for. Therefore,
>>>> I feel that replica placements and user defined node labels is out of scope
>>>> for this SIP. It can be added later in a separate SIP, without being at
>>>> odds with this proposal.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <jan....@cominvent.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan Ginzburg <ilans...@gmail.com>:
>>>>> > A node not having node.roles defined should be assumed to have all
>>>>> roles. Not only data. I don't see a reason to special case this one or any
>>>>> role.
>>>>>
>>>>> +1, make it simple and transparent. No role == all roles. Explicit
>>>>> list of roles = exactly those roles.
>>>>>
>>>>> > (Gus) See my comment above, but maybe preference is something
>>>>> handled as a feature of the role rather than via role designation?
>>>>>
>>>>> Yea, we always need an overseer, so that feature can decide to use its
>>>>> list of nodes as a preference if it so chooses.
>>>>>
>>>>>
>>>>> Aside: I think it makes it easier if we always prefix Solr env.vars
>>>>> and sys.props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo. That
>>>>> way we can get away from having to have explicit code in bin/solr,
>>>>> bin/solr.cmd and SolrCLI to manage every single property. Instead we can
>>>>> parse all ENVs and Props with the solr prefix in our bootstrap code. And
>>>>> we can by convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9
>>>>> and it would be the same thing...
>>>>>
>>>>> Jan
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
>>>>>
>>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
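To make the back-compat disagreement quoted above concrete, here is a minimal sketch (not Solr code) of the two defaulting policies under discussion for a node started without node.roles. The class, enum, and method names are hypothetical; only the node.roles property and the role names come from this thread.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

public class NodeRolesSketch {
  /** Roles named in this thread; a hypothetical enum, not an existing Solr API. */
  public enum Role { DATA, OVERSEER, ZOOKEEPER }

  /** Policy A (Ilan/Gus/Jan): no roles specified means all known roles. */
  public static Set<Role> resolveAllRolesDefault(String prop) {
    return isBlank(prop) ? EnumSet.allOf(Role.class) : parse(prop);
  }

  /** Policy B (the SIP as proposed): no roles specified means "data" only. */
  public static Set<Role> resolveDataOnlyDefault(String prop) {
    return isBlank(prop) ? EnumSet.of(Role.DATA) : parse(prop);
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  /** An explicit list means exactly those roles, under either policy. */
  private static Set<Role> parse(String prop) {
    Set<Role> roles = EnumSet.noneOf(Role.class);
    for (String name : prop.split(",")) {
      // valueOf throws IllegalArgumentException for an unknown name,
      // so a typo in a role fails at startup instead of being ignored.
      roles.add(Role.valueOf(name.trim().toUpperCase(Locale.ROOT)));
    }
    return roles;
  }

  public static void main(String[] args) {
    // A node started with no node.roles param, under each policy.
    System.out.println("Policy A: " + resolveAllRolesDefault(null));
    System.out.println("Policy B: " + resolveDataOnlyDefault(null));
    // A dedicated overseer, as in -Dnode.roles=overseer.
    System.out.println("Explicit: " + resolveDataOnlyDefault("overseer"));
  }
}
```

Under policy A, adding a value such as ZOOKEEPER to the enum in a later release changes what every unconfigured node resolves to, which is the upgrade concern raised in point 2 above. Under policy B the unconfigured result stays the same.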

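On Jan's aside about prefixes, a rough sketch of what bootstrap code might do if every SOLR_-prefixed environment variable were treated as the matching solr.-prefixed property. The class name and the mapping rule (lowercase, underscores to dots) are assumptions for illustration only; nothing here is an existing Solr API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SolrPrefixSketch {
  /**
   * Collects every environment variable starting with "SOLR_" and renames it
   * to a "solr."-style property key, e.g. SOLR_NODE_ROLES -> solr.node.roles.
   * Variables without the prefix are ignored.
   */
  public static Map<String, String> fromEnv(Map<String, String> env) {
    Map<String, String> props = new TreeMap<>();
    for (Map.Entry<String, String> e : env.entrySet()) {
      if (e.getKey().startsWith("SOLR_")) {
        String key = e.getKey().toLowerCase(Locale.ROOT).replace('_', '.');
        props.put(key, e.getValue());
      }
    }
    return props;
  }

  public static void main(String[] args) {
    // Equivalent in spirit to: docker run -e SOLR_NODE_ROLES=foo solr:9
    System.out.println(fromEnv(Map.of("SOLR_NODE_ROLES", "foo", "PATH", "/usr/bin")));
  }
}
```

With a rule like this, bin/solr, bin/solr.cmd and SolrCLI would not each need explicit handling for a new property such as node roles.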