I recommend the following format for the role spec:

roles=<role-name>:<role-value>

Multiple <role-name>:<role-value> pairs are comma-separated. Each role has
an enumerated set of allowed values and a default value.


   - role name: *data*
      - values: [*on*, *off*]
      - default: *on*
   - role name: *overseer*
      - values: [*allowed*, *disallowed*, *preferred*]
      - default: *allowed*
   - role name: *coordinator*
      - values: [*on*, *off*]
      - default: *off*


Examples:

roles=data:on,overseer:allowed
  (Redundant, because it uses only default values. A node started without
  any roles value behaves the same way.)
roles=data:off,overseer:preferred
  (Do not allow data; join the overseer election at the head of the queue.)
roles=coordinator:on,data:on
  (Act as coordinator but also allow data; same as roles=coordinator:on.)
roles=coordinator:on,data:off
  (Act as coordinator; disallow data.)
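
To make the proposed format concrete, here is a minimal sketch of how such a
spec could be parsed and validated. It is only an illustration, not Solr code:
the names ROLE_SPECS and parse_roles are hypothetical, and it takes the default
for data to be "on", as the first example implies.

```python
# Hypothetical sketch of the proposed role spec; nothing here is a Solr API.
# Each role maps to (allowed values, default value).
ROLE_SPECS = {
    "data": (("on", "off"), "on"),  # default assumed to be "on"
    "overseer": (("allowed", "disallowed", "preferred"), "allowed"),
    "coordinator": (("on", "off"), "off"),
}


def parse_roles(spec=""):
    """Parse 'roles=name:value,name:value' into a dict with defaults filled in."""
    # Start from the defaults so that unspecified roles keep their default value.
    roles = {name: default for name, (_, default) in ROLE_SPECS.items()}
    spec = spec.strip()
    if spec.startswith("roles="):
        spec = spec[len("roles="):]
    if not spec:
        return roles
    for pair in spec.split(","):
        name, sep, value = pair.strip().partition(":")
        if not sep:
            raise ValueError(f"expected <role-name>:<role-value>, got {pair!r}")
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role {name!r}")
        allowed, _ = ROLE_SPECS[name]
        if value not in allowed:
            raise ValueError(f"role {name!r} does not accept value {value!r}")
        roles[name] = value
    return roles
```

With this sketch, parse_roles("roles=coordinator:on") and
parse_roles("roles=coordinator:on,data:on") return the same mapping, and an
empty spec returns only the defaults.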


On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> If we go with no negative node roles and overseer node role is not strict
> (i.e. it’s a "preferred overseer"), then one would need to define a second
> node role "no_overseer" to explicitly exclude a node from ever becoming
> overseer (which I think is a useful feature until we switch the cluster
> default to not using the overseer), plus the implementation of these two
> node roles will obviously be coupled (and what if a node has both defined?).
>
> I prefer strict node roles.
> Maybe we could have node roles with [optional] parameters to let the node
> role implementation decide ?
> The overseer node role for example could have one of 3 values defined for
> each node: “preferred” (default, equivalent to the existing overseer role),
> "accepted" (equivalent to currently not defining the overseer role) and
> "no_way" (does not exist today).
>
> This could be useful in other contexts. A node role “data” could be “fast”
> or “slow” depending on type of local persistent storage for example…
>
> Ilan
>
> On Fri 3 Dec 2021 at 16:10, Gus Heck <gus.h...@gmail.com> wrote:
>
>> I really don't think we should have types of roles. Not negative/positive
>> and not strict/non-strict. You have a role or you don't. What that means is
>> up to the code implementing the role.
>>
>> Roles should be free to configure a preference order (binary, or n-ary or
>> whatever, strict or loose), prohibit behavior, or enable behavior. In this
>> SIP I feel we should focus on How to identify what node has what role, How
>> to designate what roles a node has via config/params, and the API's for
>> interacting with roles.
>>
>> We should for example be able to support roles such as
>>
>> PREFERRED_OVERSEER
>> DATA
>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
>>
>> Details about role implementation should probably be discussed in a
>> thread about that role.  Obviously we should think about the name carefully
>> to leave options open should we want to enhance things later so maybe
>>
>> OVERSEER_PREF  or just  OVERSEER
>>
>> would be better since it merely reads that the node implements some sort
>> of preference or config regarding overseer... but all this can be decided
>> on a per role basis
>>
>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <noble.p...@gmail.com> wrote:
>>
>>> Negative roles have a place
>>>
>>> Example is overseer
>>>
>>> There are 3 possible choices for that role
>>>
>>> a) preferred: always be in front of the election queue
>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>> nodes are available
>>> c) off: never become an overseer
>>>
>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>> implement 'c'.
>>>
>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>
>>>> Negative roles add a lot of complexity, I would really want to stay
>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>> push this decision out, but it also seems like the sort of thing we should
>>>> consider at the start.
>>>>
>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <noble.p...@gmail.com> wrote:
>>>>
>>>>> Yes. Negative roles are not a bad idea. If I start a node for machine
>>>>> learning purposes, I wouldn't want that node to ever participate in the
>>>>> overseer election.
>>>>>
>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>>>
>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>> to have negative roles.
>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>> definitely never run for various reasons. And in case these
>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>
>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <
>>>>>> houstonput...@gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>> trip themselves up by default, but the option is there for people to 
>>>>>> tinker
>>>>>> and have an iron grip over their cluster.
>>>>>> >>
>>>>>> >>
>>>>>> >> +1 to sensible defaults so users don't trip themselves. The option
>>>>>> to tinker for tighter grip can be tackled later, either on a per role 
>>>>>> basis
>>>>>> or as a generic concept later.
>>>>>> >
>>>>>> >
>>>>>> > +1 - Can definitely be added later if we so desire, not needed for
>>>>>> this SIP
>>>>>> >
>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gus.h...@gmail.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> I think the key  is to let the roles have full control of the
>>>>>> implications of having/not having that role. No need for even a
>>>>>> strict/loose designation. The question of do you have the role is yes/no
>>>>>> with no logic to guess if the role is implied or not, The question of 
>>>>>> will
>>>>>> it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>>>> >>>
>>>>>> >>> Once you figure out who has a role (or not) what that means is up
>>>>>> to the role code.
>>>>>> >>>
>>>>>> >>> Corollary: we don't have to change the way overseer works in this
>>>>>> SIP. We can rework it or not as we see fit separately.
>>>>>> >>
>>>>>> >>
>>>>>> >> +1
>>>>>> >>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Only thing we need to do is find a wording that makes the above
>>>>>> clear on first read through the SIP :)
>>>>>> >>>
>>>>>> >>> -Gus
>>>>>> >>>
>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <
>>>>>> houstonput...@gmail.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>> all of our existing OVERSEER candidates are down. When at least one of 
>>>>>> them
>>>>>> is up, the overseer will go there, and that is good and expected. But 
>>>>>> what
>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>> unrelated, untagged node. I disagree with this implementation choice. 
>>>>>> This
>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> I'm very strongly in favor of not letting users design a system
>>>>>> in which the cluster can be "live" without an overseer. I understand that
>>>>>> the overseer can be taxing to the cluster, but honestly what is the point
>>>>>> of having an untaxed cluster that doesn't have an overseer? I can see
>>>>>> arguments for the other roles to be stricter about this, but there are 
>>>>>> also
>>>>>> a lot of users who wouldn't want those to be strict either (like "query"
>>>>>> nodes).
>>>>>> >>>>
>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer
>>>>>> role node HAS to be selected to become overseer, it will try to migrate 
>>>>>> the
>>>>>> overseer job to a node with the overseer role whenever one becomes live.
>>>>>> >>>>
>>>>>> >>>> So maybe we don't have special rules per role, but instead roles
>>>>>> can either be defined as "Strict" or "Loose" (better names likely exist),
>>>>>> and the roles come with a default (Overseer -> Loose, Data -> Strict, 
>>>>>> Query
>>>>>> -> Loose, etc.). And it is up to each role to define how to behave when
>>>>>> running in LOOSE mode and a non-role node is used then a role node comes
>>>>>> online (like the overseer example given above).
>>>>>> >>>>
>>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot
>>>>>> trip themselves up by default, but the option is there for people to 
>>>>>> tinker
>>>>>> and have an iron grip over their cluster.
>>>>>> >>>>
>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com>
>>>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Noble wrote:
>>>>>> >>>>> > We are not modifying the way the "overseer role" works today.
>>>>>> We are just changing the definition and standardizing the configuration &
>>>>>> discoverability
>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER
>>>>>> role (which currently stands for preferred overseer). We can take a stab 
>>>>>> at
>>>>>> refactoring it later.
>>>>>> >>>>>
>>>>>> >>>>> Grouping these two comments together, since I think they are
>>>>>> saying the same thing. I think this is part of my confusion. We have an 
>>>>>> old
>>>>>> system that doesn't work the way we want the new system to work. There 
>>>>>> may
>>>>>> be people already using the old system. What path do we offer for folks
>>>>>> using the old system to migrate to the new system? What happens if 
>>>>>> somebody
>>>>>> accidentally tries to use both systems at the same time?
>>>>>> >>>>>
>>>>>> >>>>> Ishan wrote:
>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER
>>>>>> role] are live, Solr guarantees that one of those nodes becomes the
>>>>>> overseer.", I meant to somewhat capture the current behaviour as the
>>>>>> OVERSEER role performs today. Do you see any inconsistency with this
>>>>>> statement vs. what it does today?
>>>>>> >>>>>
>>>>>> >>>>> This doesn't really address my concern around what happens if
>>>>>> all of our existing OVERSEER candidates are down. When at least one of 
>>>>>> them
>>>>>> is up, the overseer will go there, and that is good and expected. But 
>>>>>> what
>>>>>> happens if all of the overseer eligible nodes are down. Your comment, and
>>>>>> the old system, would imply that the overseer election goes to some other
>>>>>> unrelated, untagged node. I disagree with this implementation choice. 
>>>>>> This
>>>>>> sounds like something role specific to determine, but I would like to see
>>>>>> us be more strict about it. I don't want cores leaking out of my data
>>>>>> roles, I don't want query processing to leak out of my "query" nodes or
>>>>>> whatever. Overseer shouldn't be special in this regard.
>>>>>> >>>>>
>>>>>> >>>>> Noble wrote:
>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in
>>>>>> the following request?
>>>>>> >>>>>
>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's
>>>>>> leave it as is.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <
>>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com>
>>>>>> wrote:
>>>>>> >>>>>>>
>>>>>> >>>>>>> Replying to the top post in this thread because there has
>>>>>> been a lot of discussion and I don't want to look like I'm continuing any
>>>>>> of those particular threads.
>>>>>> >>>>>>>
>>>>>> >>>>>>> I finally had time to sit down and think about this with the
>>>>>> attention it deserves and am generally happy with how the conversation 
>>>>>> has
>>>>>> shaped the current proposal.
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: I think using system properties to define node roles is
>>>>>> fine and I like that data is the default role when not defined. I think 
>>>>>> it
>>>>>> is important to hold on to the guarantee that an active overseer will 
>>>>>> land
>>>>>> on an overseer node role.
>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for
>>>>>> folks using the current OVERSEER role. I am not sure that something can 
>>>>>> be
>>>>>> done automatically since they need to now specify new properties at
>>>>>> startup. Maybe we need to include loud warnings or support both 
>>>>>> approaches
>>>>>> for a time?
>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer
>>>>>> nodes fail, then it is implied the overseer will go to one of the data
>>>>>> nodes. The specific wording in the SIP - "When one or more such nodes are
>>>>>> live, Solr guarantees that one of those nodes become the overseer." 
>>>>>> implies
>>>>>> to me that failover could go from overseer1 to overseer2 to overseerN to
>>>>>> random node. I feel like we need to have some recording that there were
>>>>>> dedicated overseer nodes and stop the cascading failure instead of 
>>>>>> churning
>>>>>> through our data nodes.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope
>>>>>> of "coordinator" roles from a split query/indexing standpoint. I 
>>>>>> understand
>>>>>> that these are used as examples, but would like stronger language that 
>>>>>> new
>>>>>> roles should also go through their own SIP discussions.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node
>>>>>> liveness in two different places now. We have the live nodes and we have
>>>>>> the node roles stored in two different places in zookeeper and it feels
>>>>>> like this would lead to race conditions or split brain or other hard to
>>>>>> diagnose bugs when those two lists don't agree with each other. This also
>>>>>> feels like it contradicts the "single source of truth" idea later stated 
>>>>>> in
>>>>>> the proposal. I see Gus's arguments for decoupling these and am not
>>>>>> strongly opposed, I just get a lurking feeling about it. Even if we don't
>>>>>> do this, I would like this called out explicitly in the alternative
>>>>>> approaches section as something that we considered and rejected, with
>>>>>> details why,
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional
>>>>>> call out here that all operations are GET because nodes cannot be changed
>>>>>> at runtime.
>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous
>>>>>> OVERSEER preference role?
>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of
>>>>>> available roles for a cluster. I _think_ this could be based on the 
>>>>>> version
>>>>>> that the cluster is running? Would be useful to be able to interrogate a
>>>>>> cluster in the future... we're seeing OOM issues on queries, can we add
>>>>>> some query nodes? When were they introduced? I don't know what path this
>>>>>> API should exist at.
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP
>>>>>> document. Not sure if there's a better path that we could go for.
>>>>>> >>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which
>>>>>> parts are string literals and which parts are meant to be substituted by
>>>>>> the operator? GET /api/cluster/roles/data would become GET
>>>>>> /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1
>>>>>> should be GET /api/cluster/roles/${nodename} dropping the intermediate
>>>>>> "nodes"
>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that
>>>>>> intermediate "nodes" node.
>>>>>> >>>>>>>
>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions?
>>>>>> Maybe this requirement is too fundamental to the operation of a cluster 
>>>>>> and
>>>>>> everybody would have to be able to do it.
>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to
>>>>>> treat roles? Implementation detail that the servers will figure out? Or
>>>>>> strict guidance where the client needs to check where specific roles are
>>>>>> before sending any further communication to the server?
>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that
>>>>>> it can't fulfil? An overseer node gets a query or an update. A data node
>>>>>> gets a collection creation request. Do they forward it on to an 
>>>>>> appropriate
>>>>>> node, or do they reject it? Should this be configurable? If not, then it
>>>>>> seems like lazy or poorly configured clients will defeat this isolation
>>>>>> system quite easily.
>>>>>> >>>>>>>
>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when
>>>>>> roles are added mean? I thought we established that they are not dynamic.
>>>>>> >>>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> Thanks,
>>>>>> >>>>>>> Mike
>>>>>> >>>>>>>
>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <
>>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Hi,
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>> >>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> We also wish to add first class support for Query nodes that
>>>>>> are used to process user queries by forwarding to data nodes,
>>>>>> merging/aggregating them and presenting to users. This concept exists as
>>>>>> first class citizens in most other search engines. This is a chance for
>>>>>> Solr to catch up.
>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Regards,
>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>> >>> http://www.the111shift.com (play)
>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
-----------------------------------------------------
Noble Paul
