Re: First class support for node roles

Ishan Chattopadhyaya Mon, 01 Nov 2021 09:48:43 -0700

> But I assume that a new feature in 9.x that introduces a new role can
> also decide for some alternative back-compat logic to support rolling
> restart if it is needed.


IMHO, having a per-feature enable/disable flag would be an ugly user experience.

Imagine telling users that, for a newly introduced "zookeeper" role, they
need to start nodes with:

-Dnode.roles=zookeeper and -Dembedded.zk=true

instead of

-Dnode.roles=zookeeper
(where the role itself enables the functionality needed for it).
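
To make the second form concrete, here is a minimal sketch (Python for
brevity; Solr itself is Java) of startup logic in which the role list alone
decides which features get enabled. Everything in it is hypothetical:
ROLE_FEATURES, the feature names, and both functions are illustrations of the
idea, not existing Solr code.

```python
# Hypothetical mapping from a role name to the features that role switches on.
ROLE_FEATURES = {
    "data": {"host_replicas"},
    "overseer": {"run_overseer"},
    "zookeeper": {"embedded_zk"},
}


def parse_roles(value):
    """Split a comma-separated node.roles value into validated role names."""
    roles = [r.strip().lower() for r in value.split(",") if r.strip()]
    unknown = [r for r in roles if r not in ROLE_FEATURES]
    if unknown:
        # Fail fast on typos instead of silently starting without the role.
        raise ValueError("unknown role(s): " + ", ".join(unknown))
    return roles


def enabled_features(value):
    """Return every feature implied by the given node.roles value."""
    features = set()
    for role in parse_roles(value):
        features |= ROLE_FEATURES[role]
    return features


print(enabled_features("zookeeper"))  # {'embedded_zk'}
```

With this shape, a new role only needs a new entry in the mapping; the user
never has to pass a second, feature-specific flag.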

On Mon, Nov 1, 2021 at 9:49 PM Jan Høydahl <[email protected]> wrote:

> I think it is safe to assume that small clusters, say 1-5 nodes, will most
> often want to have all features on all nodes as the cluster is too small to
> specialize, and then the default is perfect.
> For large clusters we should recommend explicitly specifying roles during
> the 9.0 upgrade. So if you have 100 nodes, you would likely have assigned
> the overseer role to a handful of nodes when upgrading to 9.0.
> And for every new feature in 9.x you will explicitly decide whether to use
> it and what nodes should have the role.
>
> But I assume that a new feature in 9.x that introduces a new role can also
> decide for some alternative back-compat logic to support rolling restart if
> it is needed.
>
> Jan
>
> On 1 Nov 2021, at 17:00, Ishan Chattopadhyaya <
> [email protected]> wrote:
>
> > Ilan: A node not having node.roles defined should be assumed to have all
> > roles. Not only data. I don't see a reason to special case this one or any
> > role.
> > Gus: There should be no "assumptions". Nothing to figure out. A node has
> > a role or not. For back compatibility reasons, all roles would be assumed
> > on startup if none specified.
> > Jan: No role == all roles. Explicit list of roles = exactly those roles.
>
> The problem with this approach is mainly to do with back-compat.
>
> *1. Overseer backcompat:*
> If we don't make any modifications to how overseer works and adopt this
> approach (as quoted), then imagine this situation:
>
> Solr1-100: No roles param (assumed to be "data,overseer").
> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>
> The user wants node Solr101 to be a dedicated overseer, but for that to
> happen, they would need to restart all the data nodes with
> -Dnode.roles=data. This will cause unnecessary disruption to running
> clusters where a dedicated overseer is needed. Keep in mind, if a user
> needs a dedicated overseer, they are likely in an emergency situation and
> restarting the whole cluster might not be viable for them.
>
> *2. Future roles might not be compatible with this "assumed to have all
> roles" idea:*
> Take the proposed "zookeeper" role for example. Today, regular nodes are
> not supposed to have embedded ZK running on them. By introducing this
> artificial limitation ("assumed to have all roles"), we constrain adoption
> of all future roles to necessarily require a full cluster restart.
>
> Keep in mind newer Solr versions can introduce new capabilities and roles.
> Imagine we have a role that is defined in a new Solr version (and there's
> functionality to go with that role), and the user upgrades to that version.
> However, their nodes were all started with no node.roles param. Hence, if
> those nodes are "assumed to have all roles", then just by virtue of
> upgrading to this new version, new capabilities will be turned on for the
> entire cluster, whether or not the user opted for such a capability. This
> is totally undesirable.
>
> > Gus: I actually don't want a coordinator to do more work, I would prefer
> > small focused roles with names that accurately describe their function. In
> > that light, COORDINATOR might be too nebulous. How about AGGREGATOR role?
> > (what I was thinking of would better be called a QUERY_ANALYSIS role)
>
> If you want to do specific things like query analysis or query aggregation
> or bulk indexing etc, all of those can be done on COORDINATOR nodes (as is
> the case in Elasticsearch). Having tens of "small focused roles"
> defined as first class concepts would be confusing to the user. As a remedy
> to your situation where you want the coordinator role to also do
> query-analysis for shards, one possible solution is to send such a query to
> a coordinator node with a parameter like "coordinator.query_analysis=true",
> and then the coordinator, instead of blindly hitting remote shards, also
> does some extra work on behalf of the shards.
>
>
> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <
> [email protected]> wrote:
>
>> > If we make collections role-aware for example (replicas of that
>> > collection can only be placed on nodes with a specific role, in
>> > addition to the other role based constraints), the set of roles
>> > should be user extensible and not fixed.
>> > If collections are not role aware, the constraints introduced by
>> > roles apply to all collections equally which might be insufficient
>> > if a user needs for example a heavily used collection to only be
>> > placed on more powerful nodes.
>>
>> I feel node roles and role-aware collections are orthogonal topics. What
>> you describe above can be achieved by the autoscaling+replica placement
>> framework where the placement plugins take the node roles as one of the
>> inputs.
>>
>> > It does impact the design from early on: the set of roles need to be
>> > expandable by a user by creating a collection with new roles for
>> > example (consumed by placement plugins) and be able to start nodes
>> > with new (arbitrary) roles. Should such roles follow some naming
>> > syntax to differentiate them from built in roles? To be able to fail
>> > on typos on roles - that otherwise can be crippling and hard to
>> > debug. This implies in any case that the current design can't assume
>> > all roles are known at compile time or define them in a Java enum.
>>
>> I think this should be achieved by something different from roles.
>> Something like node *labels* (user defined) which can then be used in a
>> replica placement plugin to assign replicas. I see roles as more closely
>> associated with kinds of functionality a node is designated for. Therefore,
>> I feel that replica placements and user-defined node labels are out of scope
>> for this SIP. They can be added later in a separate SIP, without being at
>> odds with this proposal.
>>
>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <[email protected]> wrote:
>>
>>>
>>>
>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <[email protected]> wrote:
>>> > A node not having node.roles defined should be assumed to have all
>>> > roles. Not only data. I don't see a reason to special case this one or any
>>> > role.
>>>
>>> +1, make it simple and transparent. No role == all roles. Explicit list
>>> of roles = exactly those roles.
>>>
>>> > (Gus) See my comment above, but maybe preference is something handled
>>> > as a feature of the role rather than via role designation?
>>>
>>> Yea, we always need an overseer, so that feature can decide to use its
>>> list of nodes as a preference if it so chooses.
>>>
>>>
>>> Aside: I think it makes it easier if we always prefix Solr env.vars and
>>> sys.props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo. That way we
>>> can get away from having to have explicit code in bin/solr, bin/solr.cmd
>>> and SolrCLI to manage every single property. Instead we can parse all ENVs
>>> and Props with the solr prefix in our bootstrap code. And we can by
>>> convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9 and it would
>>> be the same thing...
>>>
>>> Jan
>
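
On Jan's aside about prefixes (quoted above): a minimal sketch of what
prefix-based bootstrap parsing could look like, again in Python for brevity.
The function name solr_settings, the SOLR_FOO_BAR -> foo.bar name mapping, and
the rule that system properties override environment variables are all my
assumptions for illustration, not anything that has been decided.

```python
def solr_settings(env, props):
    """Collect every SOLR_* env var and solr.* property into one dict.

    Assumed mapping: SOLR_NODE_ROLES and solr.node.roles both become the
    key "node.roles". Assumed precedence: properties override env vars.
    """
    settings = {}
    for name, value in env.items():
        if name.startswith("SOLR_"):
            key = name[len("SOLR_"):].lower().replace("_", ".")
            settings[key] = value
    for name, value in props.items():
        if name.startswith("solr."):
            settings[name[len("solr."):]] = value
    return settings


print(solr_settings({"SOLR_NODE_ROLES": "foo"}, {}))  # {'node.roles': 'foo'}
```

The point is only that no per-property code is needed: anything carrying the
prefix is picked up by the same loop.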
