Re: First class support for node roles

Jan Høydahl Mon, 06 Dec 2021 01:01:02 -0800

Are we making a non-issue into a configuration mess?

The overseer's job is diminishing with every version, and we should not fool 
ourselves into believing that a stray overseer will kill an upgrade, and 
therefore complicate the whole role system. Also, we should not put so much 
emphasis on "nodes without roles defined" as if that should be a common way of 
starting nodes in a huge cluster. In huge clusters users should be explicit 
about roles on every single node.


So my proposal stands:
- Roles are binary (optional role config can be added, but not part of the role)
- Nodes started without explicit roles get ALL roles, interpreted as ALLOW
- Nodes started with explicit roles get exactly those roles, interpreted as ALLOW

I.e. any future role will be ALLOWED on nodes started without explicit roles. 
Which means any future "ui" or "zk" role or whatever will be ALLOWED to run on 
those nodes if that feature is enabled. We have to distinguish between a role 
which ALLOWS a feature to run and the feature itself, which is enabled e.g. by 
creating a collection (selects nodes from data role), configuring zookeeper 
(selects nodes from zk role) etc.
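
A minimal sketch of these rules, just to make the semantics concrete. The names `KNOWN_ROLES`, `effective_roles` and `eligible_nodes` are made up for illustration and are not part of Solr; "ui" and "zk" stand in for hypothetical future roles.

```python
# Hypothetical role names; "ui" and "zk" stand in for future roles.
KNOWN_ROLES = frozenset({"data", "overseer", "ui", "zk"})


def effective_roles(explicit_roles=None):
    """Return the set of roles a node ALLOWS.

    No explicit roles -> every known role is allowed.
    Explicit roles    -> exactly those roles are allowed.
    """
    if explicit_roles is None:
        return set(KNOWN_ROLES)
    unknown = set(explicit_roles) - KNOWN_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return set(explicit_roles)


def eligible_nodes(cluster, role):
    """Nodes a feature may pick from: those that ALLOW the given role."""
    return sorted(name for name, explicit in cluster.items()
                  if role in effective_roles(explicit))


cluster = {
    "node1": None,                # started without roles: allows everything
    "node2": ["data"],            # explicit: data only
    "node3": ["overseer", "ui"],  # explicit: overseer and ui
}
print(eligible_nodes(cluster, "ui"))    # nodes a UI feature could run on
print(eligible_nodes(cluster, "data"))  # nodes a new collection could use
```

The role only ALLOWS; nothing runs until the feature itself is enabled and selects from the eligible nodes.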

In large clusters where you specify roles explicitly, you will need to carve 
out what nodes should run new roles (such as UI), and start/restart those nodes 
with that role before enabling the new feature.

Jan

> On 6 Dec 2021, at 09:12, Ilan Ginzburg <[email protected]> wrote:
> 
> Noble got my intention correctly.
> 
> I think role specific code should only have to deal with the various 
> configuration options for the role. When configuration was binary (role 
> defined or not), then the default is one of the two values, but even then we 
> saw that for data we wanted default true and for Overseer default false.
> 
> If we introduce non boolean roles (i.e. role parameters), absence of a role 
> has to be mapped to one of the values (otherwise it acts as yet another value 
> - confusing). Role config in absence of explicit role definition for a node 
> has to be defined for each role (data by default on, Overseer by default 
> allowed... ) in some way.
> Using a string separate from the role definitions (Ishan) makes it too easy 
> to have roles for which the default configuration is unknown.
> 
> Ilan
> 
> 
> 
> On Mon, 6 Dec 2021 at 08:58, Ishan Chattopadhyaya <[email protected] 
> <mailto:[email protected]>> wrote:
> Role specific configurations can go into /node_roles/${rolename} znode, and 
> that is outside the scope of this SIP. The concept of role specific modes (e.g. 
> allowed, preferred for overseer) is a welcome addition to the original proposal 
> to model the overseer functionality properly without any confusion to the user. 
> On top of that, nodes that don't have any roles defined can be assumed to have 
> the default roles (data:on, overseer:allowed). 
> 
> Isn't that simple and generic at the same time? Why overcomplicate everything 
> all over again?
> 
> On Mon, 6 Dec, 2021, 12:54 pm Gus Heck, <[email protected] 
> <mailto:[email protected]>> wrote:
> So I think we're losing sight of the original concept of "default" and 
> conflating it with role configuration.
> 
> When we started talking about "default roles" the idea was "default" was a 
> flag that indicated if the role was active on a Solr Node where no roles had 
> been specified. Plain and simple. Full stop.
> 
> Secondarily, any given role might or might not have some configuration 
> associated with it. Optionally, a role that accepts configuration may define 
> default configuration values, but this has nothing to do with "default role".
> 
> Default should be an intrinsic binary property of the role as a whole (not 
> specific to a cluster or a node). 
> 
> There are 3 levels to think about:
> 1. Intrinsic attributes of the role as a whole (Example --> default: yes/no)
> 2. Configurable attributes for the role across the cluster (Example --> Strict: 
>    yes/no) (concept mentioned previously affecting how presence of a role is 
>    interpreted by role related code)
> 3. Configurable attributes for the node that relate to the role (Example --> 
>    Election_priority_adjust: integer) (hypothetical way of influencing who gets 
>    elected first in a more fine grained fashion)
> 
> Maybe use the following terminology?
> 1. Role Intrinsic Property
> 2. Role Cluster Config
> 3. Role Node Config
> 
> We almost certainly have to determine what Role Intrinsic Properties we want 
> to support as these are likely to be coded into the role implementation 
> directly, and implementors of roles should specify these. (I'm not presently 
> seeing need for more than "default".)
> 
> For the config levels, I think we mostly want to identify where that 
> information can be communicated and stored. The Role Cluster Config level is 
> tricky since there's no "cluster" until you start the first "Node" ... so a bit 
> of a chicken/egg there. The Role Node Config, however, seems to make sense as a 
> file that gets read and then reflected in zk as appropriate during node 
> startup (config that specified the local directory for something would not 
> need to show up in zk of course, just stuff that another node/overseer/query 
> router/whatever might need to know).
> 
> Definitely let's reword anything that involves the phrase "Two Defaults" 
> since by definition only one value can be the "default" value (I suppose 
> theoretically you could have a mapping of defaults conditional on some other 
> value but that's definitely the opposite of simple). 
> 
> -Gus
> 
> On Mon, Dec 6, 2021 at 12:36 AM Ishan Chattopadhyaya 
> <[email protected] <mailto:[email protected]>> wrote:
> I think I understand Ilan's motivation for two defaults. Here's a summary of 
> what I understand of Ilan's proposal, and a follow-up proposal that achieves a 
> similar effect with less perceived complexity for the user.
> 
> Ilan's proposal (as I understand it):
> 
> 1. Every role to have two defaults. Example:
> data: {modes: [on, off], default1: on, default2: on}
> overseer: {modes: [allowed, disallowed, preferred], default1: preferred, 
> default2: disallowed}
> ui: {modes: [on, off], default1: on, default2: on}
> 
> 2. Here, default1 is for lazy users for whom "-Dnode.roles=<rolename>" will 
> be interpreted as "-Dnode.roles=<rolename>:<default1 of rolename>".
> 3. Here, default2 for any role role1 is for users who either (a) never 
> specified any roles for a node, or (b) specified other roles, but not role1. 
> In both cases, the behaviour of that node would implicitly assume 
> "role1:<default2 of role1>".
> 
> My alternate proposal:
> 1. There are no role specific defaults. Example:
> data: {modes: [on, off]}
> overseer: {modes: [allowed, disallowed, preferred]}
> ui: {modes: [on, off]}
> 
> 2. There is a node specific default roles string if no -Dnode.roles was 
> specified. Example:
> "data:on, overseer:allowed" (Today's system)
> "data:on, overseer:allowed, ui:on" (When a future role, say "ui" is 
> introduced)
> 
> 3. If a node was started with explicitly specified roles, that node will have 
> exactly those roles (in the specified modes) and nothing else (no assumptions 
> about other non-specified roles, i.e. those roles not specified will not run).
> 
> Benefits of my proposal:
> 1. Easier to understand for users.
> 2. Here's a scenario where the user will be happier with my proposal than with 
> Ilan's proposal:
>    * 10 nodes with -Droles=data:on,overseer:allowed. (Regular data nodes)
>    * 2 nodes with -Droles=overseer:preferred. (Two dedicated overseer nodes)
>    * User upgrades from Solr 9.0 to 9.1, where "ui" role has been introduced. 
> Developers of "ui" role want it to be available for most users.
>         - In Ilan's proposal, the developer chooses this in 9.1: ui: {modes: 
> [on, off], default1: on, default2: on}. Now, user upgrading will see that UI 
> is running on his two overseer nodes, and he's confused (because he 
> explicitly specified what he wants)
>         - In my proposal, the developer chooses ui: {modes: [on, off]}; 
> default roles for those users who don't specify roles: "data:on, 
> overseer:allowed, ui:on". Now, there are no surprises of implicit default. 
> Users who don't use roles at all will get this functionality turned on, just 
> as the developer wanted. Users who use roles will have to explicitly append 
> "ui:on" to their roles string on their nodes during the upgrade (this tip 
> will come from the upgrade notes).
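
The 9.1 upgrade scenario above can be sketched as follows. This is only an illustration: the function names are hypothetical, roles are modelled as plain dicts, and the default values are the ones quoted in the two proposals above.

```python
def roles_with_per_role_defaults(explicit, default2):
    """Two-defaults model: any role not mentioned falls back to its default2."""
    resolved = dict(default2)
    resolved.update(explicit)
    return resolved


def roles_with_default_string(explicit, default_roles):
    """Default-string model: defaults apply only when no roles were given."""
    return dict(explicit) if explicit else dict(default_roles)


# default2 values as summarised for the two-defaults proposal in 9.1.
default2 = {"data": "on", "overseer": "disallowed", "ui": "on"}
# Default roles string for nodes started without any roles in 9.1.
default_roles = {"data": "on", "overseer": "allowed", "ui": "on"}

overseer_node = {"overseer": "preferred"}  # dedicated overseer node
no_roles_node = {}                         # node started without roles

two_defaults = roles_with_per_role_defaults(overseer_node, default2)
default_string = roles_with_default_string(overseer_node, default_roles)

print("ui" in two_defaults)    # ui is implied on the dedicated overseer node
print("ui" in default_string)  # ui is absent unless explicitly added
print(roles_with_default_string(no_roles_node, default_roles))
```

In the second model a role that is absent from the resulting dict simply does not run on that node.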
> 
> What do you think, Ilan/Noble/Mike/Gus/Houston?
> 
> On Mon, Dec 6, 2021 at 8:10 AM Noble Paul <[email protected] 
> <mailto:[email protected]>> wrote:
> Ilan was asking what the overseer role should be in the following 
> situations:
> 
> a) role=overseer,data:on
> b) role=overseer: preferred,data:on
> c) role=data:on
> 
> I'm saying (a) shouldn't be valid. Only (b) and (c) are valid.
> 
> 
> 
> 
> 
> 
> On Mon, Dec 6, 2021, 12:44 PM Mike Drob <[email protected] 
> <mailto:[email protected]>> wrote:
> Ilan,
> 
> Can you provide a more detailed concrete example? I’m having a lot of trouble 
> understanding what you are proposing, beyond that it is somehow 
> contraindicated with what Ishan/Noble suggest.
> 
> Apologies for my failure to understand.
> 
> Thanks,
> Mike
> 
> On Sun, Dec 5, 2021 at 5:21 PM Ilan Ginzburg <[email protected] 
> <mailto:[email protected]>> wrote:
> If we go with optional role params, we need two defaults:
> 1. the param value to use when the role is specified without a parameter, and
> 2. the param value to use for the role on a node for which the role is
> not specified at all.
> 
> I don't know how to sensibly name these defaults, but the actual
> values would be:
> overseer: default1=preferred, default2=allowed
> data: default1=on, default2=on
> coordinator: default1=on, default2=off
> 
> If we do not allow specifying a role without a parameter, then
> default1 does not exist and the example Noble posted earlier covers
> us. But simple roles will be easier to use without parameters (and the
> transition from existing overseer role would be trivial).
> 
> On Sun, Dec 5, 2021 at 7:17 AM Ishan Chattopadhyaya
> <[email protected] <mailto:[email protected]>> wrote:
> >
> > I'm +1 on this. It "looks" complicated at first, but simplifies all 
> > headaches going forward.
> >
> > On Sun, Dec 5, 2021 at 11:46 AM Noble Paul <[email protected] 
> > <mailto:[email protected]>> wrote:
> >>
> >> I shall update the SIP proposal if we have a consensus on this 
> >> configuration
> >>
> >> On Sun, Dec 5, 2021 at 4:58 PM Noble Paul <[email protected] 
> >> <mailto:[email protected]>> wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Dec 5, 2021 at 4:47 PM Gus Heck <[email protected] 
> >>> <mailto:[email protected]>> wrote:
> >>>>
> >>>> I like this in that it's an example of how the overseer might be 
> >>>> extended without creating a new role :)
> >>>>
> >>>> Not entirely sure if I'm for or against an enum implementation here, but 
> >>>> it makes me a bit nervous. Enums with complexity can quickly get into 
> >>>> difficulty for unit tests (especially if one wanted to write a mock 
> >>>> object based test, something I think we maybe should use a bit more than 
> >>>> we do).
> >>>>
> >>>>
> >>>>
> >>>> I would tend to think of a class to represent and collect role related 
> >>>> functionality, one that perhaps has methods that receive the request, or 
> >>>> other key objects and thus could be tested without standing up an entire 
> >>>> server. (Not against also having them exercised in a few integrated 
> >>>> tests, but the more we can avoid interleaving logic directly within 
> >>>> DispatchFilter and HttpSolrCall etc. the better.)
> >>>>
> >>>>
> >>>> So I guess I'm somewhat biased against any enum with more than a couple 
> >>>> properties, and definitely don't want to wind up hanging lots of methods 
> >>>> off of one. Better to use them to consume a configuration value and then 
> >>>> instantiate a class that really holds the logic and data. I like them 
> >>>> for constraining values and easy string value conversion but the more 
> >>>> they look like classes the more I'd rather have a class.
> >>>
> >>>
> >>>  I just meant it is a set of values. Please let us not discuss the actual 
> >>> impl here. We should stick to discussing the high level design here and 
> >>> specifics should be dealt with in a PR
> >>>>
> >>>>
> >>>> -Gus
> >>>>
> >>>> On Sat, Dec 4, 2021 at 10:37 PM Noble Paul <[email protected] 
> >>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>> I recommend the following format for the role spec
> >>>>>
> >>>>> roles=<role-name>:<role-value>
> >>>>>
> >>>>> each role will have an enum of allowed values and a default value
> >>>>>
> >>>>> role name: data
> >>>>>
> >>>>> values: [on, off]
> >>>>> default: on
> >>>>>
> >>>>> role name: overseer
> >>>>>
> >>>>> values: [allowed, disallowed, preferred]
> >>>>> default : allowed
> >>>>>
> >>>>> role name: coordinator
> >>>>>
> >>>>> values : [on, off]
> >>>>> default: off
> >>>>>
> >>>>>
> >>>>> examples
> >>>>> roles=data:on,overseer:allowed (This is redundant because it uses all 
> >>>>> the default values. If a node is started without any roles value this 
> >>>>> is the default behavior)
> >>>>> roles=data:off,overseer:preferred (do not allow data, join overseer 
> >>>>> election at head)
> >>>>> roles=coordinator:on,data:on (role as coordinator, but allow data, it's 
> >>>>> the same as roles=coordinator:on)
> >>>>> roles=coordinator:on,data:off (role as coordinator, disallow data)
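
A rough sketch of how the `roles=<role-name>:<role-value>` format above could be resolved. `ROLE_SPECS` and `parse_roles` are hypothetical names, not Solr code, and the sketch assumes the default for data is "on" (the examples treat `data:on` as the default behavior).

```python
# Allowed values and default per role, as listed above.
# Assumption: the default for "data" is "on".
ROLE_SPECS = {
    "data": (("on", "off"), "on"),
    "overseer": (("allowed", "disallowed", "preferred"), "allowed"),
    "coordinator": (("on", "off"), "off"),
}


def parse_roles(spec=""):
    """Parse 'name:value,name:value' into a dict holding every role's value."""
    # Start from the per-role defaults, then apply what the spec overrides.
    resolved = {name: default for name, (_, default) in ROLE_SPECS.items()}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = (part.strip() for part in item.partition(":"))
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role: {name!r}")
        if not sep or value not in ROLE_SPECS[name][0]:
            raise ValueError(f"invalid value for role {name!r}: {value!r}")
        resolved[name] = value
    return resolved


print(parse_roles("data:off,overseer:preferred"))
# Redundant values resolve to the same result as leaving them out.
print(parse_roles("coordinator:on") == parse_roles("coordinator:on,data:on"))
print(parse_roles("") == parse_roles("data:on,overseer:allowed"))
```

In this sketch a role given without a value (for example `roles=overseer`) is rejected; whether that should instead map to a default is a separate design question.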
> >>>>>
> >>>>>
> >>>>> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <[email protected] 
> >>>>> <mailto:[email protected]>> wrote:
> >>>>>>
> >>>>>> If we go with no negative node roles and overseer node role is not 
> >>>>>> strict (i.e. it’s a "preferred overseer"), then one would need to 
> >>>>>> define a second node role "no_overseer" to explicitly exclude a node 
> >>>>>> from ever becoming overseer (which I think is a useful feature until 
> >>>>>> we switch the cluster default to not using the overseer), plus the 
> >>>>>> implementation of these two node roles will obviously be coupled (and 
> >>>>>> what if a node has both defined?).
> >>>>>>
> >>>>>> I prefer strict node roles.
> >>>>>> Maybe we could have node roles with [optional] parameters to let the 
> >>>>>> node role implementation decide ?
> >>>>>> The overseer node role for example could have one of 3 values defined 
> >>>>>> for each node: “preferred” (default, equivalent to the existing 
> >>>>>> overseer role), "accepted" (equivalent to currently not defining the 
> >>>>>> overseer role) and "no_way" (does not exist today).
> >>>>>>
> >>>>>> This could be useful in other contexts. A node role “data” could be 
> >>>>>> “fast” or “slow” depending on type of local persistent storage for 
> >>>>>> example…
> >>>>>>
> >>>>>> Ilan
> >>>>>>
> >>>>>> On Fri 3 Dec 2021 at 16:10, Gus Heck <[email protected] 
> >>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>
> >>>>>>> I really don't think we should have types of roles. Not 
> >>>>>>> negative/positive and not strict/non-strict. You have a role or you 
> >>>>>>> don't. What that means is up to the code implementing the role.
> >>>>>>>
> >>>>>>> Roles should be free to configure a preference order (binary, or 
> >>>>>>> n-ary or whatever, strict or loose), prohibit behavior, or enable 
> >>>>>>> behavior. In this SIP I feel we should focus on How to identify what 
> >>>>>>> node has what role, How to designate what roles a node has via 
> >>>>>>> config/params, and the API's for interacting with roles.
> >>>>>>>
> >>>>>>> We should for example be able to support roles such as
> >>>>>>>
> >>>>>>> PREFERRED_OVERSEER
> >>>>>>> DATA
> >>>>>>> NO_ROUTED_ALIAS  (just an example, not something I mean to suggest)
> >>>>>>>
> >>>>>>> Details about role implementation should probably be discussed in a 
> >>>>>>> thread about that role.  Obviously we should think about the name 
> >>>>>>> carefully to leave options open should we want to enhance things 
> >>>>>>> later so maybe
> >>>>>>>
> >>>>>>> OVERSEER_PREF  or just  OVERSEER
> >>>>>>>
> >>>>>>> would be better since it merely reads that the node implements some 
> >>>>>>> sort of preference or config regarding overseer... but all this can 
> >>>>>>> be decided on a per role basis
> >>>>>>>
> >>>>>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <[email protected] 
> >>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>
> >>>>>>>> Negative roles have a place
> >>>>>>>>
> >>>>>>>> Example is overseer
> >>>>>>>>
> >>>>>>>> There are 3 possible choices for that role
> >>>>>>>>
> >>>>>>>> a) preferred: always be in front of the election queue
> >>>>>>>> b) on: not preferred, but can be an overseer if no preferred 
> >>>>>>>> overseer nodes are available
> >>>>>>>> c) off: never become an overseer
> >>>>>>>>
> >>>>>>>> Today we only have options 'a' and 'b'. In a future ticket, we may 
> >>>>>>>> implement 'c'.
> >>>>>>>>
> >>>>>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <[email protected] 
> >>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>
> >>>>>>>>> Negative roles add a lot of complexity, I would really want to stay 
> >>>>>>>>> away from them. That’s why I want strict roles up front. It’s maybe 
> >>>>>>>>> ok to push this decision out, but it also seems like the sort of 
> >>>>>>>>> thing we should consider at the start.
> >>>>>>>>>
> >>>>>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <[email protected] 
> >>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. Negative roles are not a bad idea. If I start a node for 
> >>>>>>>>>> machine learning purposes, I wouldn't want that node to ever 
> >>>>>>>>>> participate in overseer election
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <[email protected] 
> >>>>>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If we have non strict roles (like overseer), then it does make 
> >>>>>>>>>>> sense
> >>>>>>>>>>> to have negative roles.
> >>>>>>>>>>> That way I can define which are the two nodes that I'd prefer the
> >>>>>>>>>>> overseer to run on, and a few other nodes on which it should
> >>>>>>>>>>> definitely never run for various reasons. And in case these
> >>>>>>>>>>> "!overseer" are the only nodes left in the cluster, let the 
> >>>>>>>>>>> cluster
> >>>>>>>>>>> fail the same way it would if there were no data nodes available.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman 
> >>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> With the Strict/Loose option and sensible defaults, users 
> >>>>>>>>>>> >>> cannot trip themselves up by default, but the option is there 
> >>>>>>>>>>> >>> for people to tinker and have an iron grip over their cluster.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The 
> >>>>>>>>>>> >> option to tinker for tighter grip can be tackled later, either 
> >>>>>>>>>>> >> on a per role basis or as a generic concept later.
> >>>>>>>>>>> >
> >>>>>>>>>>> >
> >>>>>>>>>>> > +1 - Can definitely be added later if we so desire, not needed 
> >>>>>>>>>>> > for this SIP
> >>>>>>>>>>> >
> >>>>>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya 
> >>>>>>>>>>> > <[email protected] <mailto:[email protected]>> 
> >>>>>>>>>>> > wrote:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <[email protected] 
> >>>>>>>>>>> >> <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> I think the key  is to let the roles have full control of the 
> >>>>>>>>>>> >>> implications of having/not having that role. No need for even 
> >>>>>>>>>>> >>> a strict/loose designation. The question of do you have the 
> >>>>>>>>>>> >>> role is yes/no with no logic to guess if the role is implied 
> >>>>>>>>>>> >>> or not. The question of will it come up with the role is 
> >>>>>>>>>>> >>> "have_explicit ? use_explicit : use_defaults".
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Once you figure out who has a role (or not) what that means 
> >>>>>>>>>>> >>> is up to the role code.
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Corollary: we don't have to change the way overseer works in 
> >>>>>>>>>>> >>> this SIP. We can rework it or not as we see fit separately.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> +1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> Only thing we need to do is find a wording that makes the 
> >>>>>>>>>>> >>> above clear on first read through the SIP :)
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> -Gus
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman 
> >>>>>>>>>>> >>> <[email protected] <mailto:[email protected]>> 
> >>>>>>>>>>> >>> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what happens 
> >>>>>>>>>>> >>>>> if all of our existing OVERSEER candidates are down. When 
> >>>>>>>>>>> >>>>> at least one of them is up, the overseer will go there, and 
> >>>>>>>>>>> >>>>> that is good and expected. But what happens if all of the 
> >>>>>>>>>>> >>>>> overseer eligible nodes are down? Your comment, and the old 
> >>>>>>>>>>> >>>>> system, would imply that the overseer election goes to some 
> >>>>>>>>>>> >>>>> other unrelated, untagged node. I disagree with this 
> >>>>>>>>>>> >>>>> implementation choice. This sounds like something role 
> >>>>>>>>>>> >>>>> specific to determine, but I would like to see us be more 
> >>>>>>>>>>> >>>>> strict about it. I don't want cores leaking out of my data 
> >>>>>>>>>>> >>>>> roles, I don't want query processing to leak out of my 
> >>>>>>>>>>> >>>>> "query" nodes or whatever. Overseer shouldn't be special in 
> >>>>>>>>>>> >>>>> this regard.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> I'm very strongly in favor of not letting users design a 
> >>>>>>>>>>> >>>> system in which the cluster can be "live" without an 
> >>>>>>>>>>> >>>> overseer. I understand that the overseer can be taxing to 
> >>>>>>>>>>> >>>> the cluster, but honestly what is the point of having an 
> >>>>>>>>>>> >>>> untaxed cluster that doesn't have an overseer? I can see 
> >>>>>>>>>>> >>>> arguments for the other roles to be stricter about this, but 
> >>>>>>>>>>> >>>> there are also a lot of users who wouldn't want those to be 
> >>>>>>>>>>> >>>> strict either (like "query" nodes).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> Maybe we just put in stronger guarantees that if a 
> >>>>>>>>>>> >>>> non-overseer role node HAS to be selected to become 
> >>>>>>>>>>> >>>> overseer, it will try to migrate the overseer job to a node 
> >>>>>>>>>>> >>>> with the overseer role whenever one becomes live.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> So maybe we don't have special rules per role, but instead 
> >>>>>>>>>>> >>>> roles can either be defined as "Strict" or "Loose" (better 
> >>>>>>>>>>> >>>> names likely exist), and the roles come with a default 
> >>>>>>>>>>> >>>> (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). 
> >>>>>>>>>>> >>>> And it is up to each role to define how to behave when 
> >>>>>>>>>>> >>>> running in LOOSE mode and a non-role node is used, then a 
> >>>>>>>>>>> >>>> role node comes online (like the overseer example given 
> >>>>>>>>>>> >>>> above).
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users 
> >>>>>>>>>>> >>>> cannot trip themselves up by default, but the option is 
> >>>>>>>>>>> >>>> there for people to tinker and have an iron grip over their 
> >>>>>>>>>>> >>>> cluster.
> >>>>>>>>>>> >>>>
> >>>>>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <[email protected] 
> >>>>>>>>>>> >>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > We are not modifying the way the "overseer role" works 
> >>>>>>>>>>> >>>>> > today. We are just changing the definition and 
> >>>>>>>>>>> >>>>> > standardizing the configuration & discoverability
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER 
> >>>>>>>>>>> >>>>> > role (which currently stands for preferred overseer). We 
> >>>>>>>>>>> >>>>> > can take a stab at refactoring it later.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Grouping these two comments together, since I think they 
> >>>>>>>>>>> >>>>> are saying the same thing. I think this is part of my 
> >>>>>>>>>>> >>>>> confusion. We have an old system that doesn't work the way 
> >>>>>>>>>>> >>>>> we want the new system to work. There may be people already 
> >>>>>>>>>>> >>>>> using the old system. What path do we offer for folks using 
> >>>>>>>>>>> >>>>> the old system to migrate to the new system? What happens 
> >>>>>>>>>>> >>>>> if somebody accidentally tries to use both systems at the 
> >>>>>>>>>>> >>>>> same time?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Ishan wrote:
> >>>>>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER 
> >>>>>>>>>>> >>>>> > role] are live, Solr guarantees that one of those nodes 
> >>>>>>>>>>> >>>>> > becomes the overseer.", I meant to somewhat capture the 
> >>>>>>>>>>> >>>>> > current behaviour as the OVERSEER role performs today. Do 
> >>>>>>>>>>> >>>>> > you see any inconsistency with this statement vs. what it 
> >>>>>>>>>>> >>>>> > does today?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> This doesn't really address my concern around what happens 
> >>>>>>>>>>> >>>>> if all of our existing OVERSEER candidates are down. When 
> >>>>>>>>>>> >>>>> at least one of them is up, the overseer will go there, and 
> >>>>>>>>>>> >>>>> that is good and expected. But what happens if all of the 
> >>>>>>>>>>> >>>>> overseer eligible nodes are down? Your comment, and the old 
> >>>>>>>>>>> >>>>> system, would imply that the overseer election goes to some 
> >>>>>>>>>>> >>>>> other unrelated, untagged node. I disagree with this 
> >>>>>>>>>>> >>>>> implementation choice. This sounds like something role 
> >>>>>>>>>>> >>>>> specific to determine, but I would like to see us be more 
> >>>>>>>>>>> >>>>> strict about it. I don't want cores leaking out of my data 
> >>>>>>>>>>> >>>>> roles, I don't want query processing to leak out of my 
> >>>>>>>>>>> >>>>> "query" nodes or whatever. Overseer shouldn't be special in 
> >>>>>>>>>>> >>>>> this regard.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> Noble wrote:
> >>>>>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node 
> >>>>>>>>>>> >>>>> > in the following request?
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. 
> >>>>>>>>>>> >>>>> Let's leave it as is.
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>>
> >>>>>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya 
> >>>>>>>>>>> >>>>> <[email protected] 
> >>>>>>>>>>> >>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob 
> >>>>>>>>>>> >>>>>> <[email protected] <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Replying to the top post in this thread because there has 
> >>>>>>>>>>> >>>>>>> been a lot of discussion and I don't want to look like 
> >>>>>>>>>>> >>>>>>> I'm continuing any of those particular threads.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> I finally had time to sit down and think about this with 
> >>>>>>>>>>> >>>>>>> the attention it deserves and am generally happy with how 
> >>>>>>>>>>> >>>>>>> the conversation has shaped the current proposal.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: I think using system properties to define node 
> >>>>>>>>>>> >>>>>>> roles is fine and I like that data is the default role 
> >>>>>>>>>>> >>>>>>> when not defined. I think it is important to hold on to 
> >>>>>>>>>>> >>>>>>> the guarantee that an active overseer will land on an 
> >>>>>>>>>>> >>>>>>> overseer node role.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for 
> >>>>>>>>>>> >>>>>>> folks using the current OVERSEER role. I am not sure that 
> >>>>>>>>>>> >>>>>>> something can be done automatically since they need to 
> >>>>>>>>>>> >>>>>>> now specify new properties at startup. Maybe we need to 
> >>>>>>>>>>> >>>>>>> include loud warnings or support both approaches for a 
> >>>>>>>>>>> >>>>>>> time?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer 
> >>>>>>>>>>> >>>>>>> nodes fail, then it is implied the overseer will go to 
> >>>>>>>>>>> >>>>>>> one of the data nodes. The specific wording in the SIP - 
> >>>>>>>>>>> >>>>>>> "When one or more such nodes are live, Solr guarantees 
> >>>>>>>>>>> >>>>>>> that one of those nodes become the overseer." implies to 
> >>>>>>>>>>> >>>>>>> me that failover could go from overseer1 to overseer2 to 
> >>>>>>>>>>> >>>>>>> overseerN to random node. I feel like we need to have 
> >>>>>>>>>>> >>>>>>> some recording that there were dedicated overseer nodes 
> >>>>>>>>>>> >>>>>>> and stop the cascading failure instead of churning 
> >>>>>>>>>>> >>>>>>> through our data nodes.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed 
> >>>>>>>>>>> >>>>>>> scope of "coordinator" roles from a split query/indexing 
> >>>>>>>>>>> >>>>>>> standpoint. I understand that these are used as examples, 
> >>>>>>>>>>> >>>>>>> but would like stronger language that new roles should 
> >>>>>>>>>>> >>>>>>> also go through their own SIP discussions.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node 
> >>>>>>>>>>> >>>>>>> liveness in two different places now. We have the live 
> >>>>>>>>>>> >>>>>>> nodes and we have the node roles stored in two different 
> >>>>>>>>>>> >>>>>>> places in zookeeper and it feels like this would lead to 
> >>>>>>>>>>> >>>>>>> race conditions or split brain or other hard to diagnose 
> >>>>>>>>>>> >>>>>>> bugs when those two lists don't agree with each other. 
> >>>>>>>>>>> >>>>>>> This also feels like it contradicts the "single source of 
> >>>>>>>>>>> >>>>>>> truth" idea later stated in the proposal. I see Gus's 
> >>>>>>>>>>> >>>>>>> arguments for decoupling these and am not strongly 
> >>>>>>>>>>> >>>>>>> opposed, I just get a lurking feeling about it. Even if 
> >>>>>>>>>>> >>>>>>> we don't do this, I would like this called out explicitly 
> >>>>>>>>>>> >>>>>>> in the alternative approaches section as something that 
> >>>>>>>>>>> >>>>>>> we considered and rejected, with details why.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an 
> >>>>>>>>>>> >>>>>>> additional call out here that all operations are GET 
> >>>>>>>>>>> >>>>>>> because nodes cannot be changed at runtime.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous 
> >>>>>>>>>>> >>>>>>> OVERSEER preference role?
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of 
> >>>>>>>>>>> >>>>>>> available roles for a cluster. I _think_ this could be 
> >>>>>>>>>>> >>>>>>> based on the version that the cluster is running? Would 
> >>>>>>>>>>> >>>>>>> be useful to be able to interrogate a cluster in the 
> >>>>>>>>>>> >>>>>>> future... we're seeing OOM issues on queries, can we add 
> >>>>>>>>>>> >>>>>>> some query nodes? When were they introduced? I don't know 
> >>>>>>>>>>> >>>>>>> what path this API should exist at.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the 
> >>>>>>>>>>> >>>>>> SIP document. Not sure if there's a better path that we 
> >>>>>>>>>>> >>>>>> could go for.
> >>>>>>>>>>> >>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which 
> >>>>>>>>>>> >>>>>>> parts are string literals and which parts are meant to be 
> >>>>>>>>>>> >>>>>>> substituted by the operator? GET /api/cluster/roles/data 
> >>>>>>>>>>> >>>>>>> would become GET /api/cluster/roles/${rolename} in our 
> >>>>>>>>>>> >>>>>>> SIP/documentation.
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: I think GET 
> >>>>>>>>>>> >>>>>>> /api/cluster/roles/nodes/node1 should be GET 
> >>>>>>>>>>> >>>>>>> /api/cluster/roles/${nodename} dropping the intermediate 
> >>>>>>>>>>> >>>>>>> "nodes"
> >>>>>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that 
> >>>>>>>>>>> >>>>>>> intermediate "nodes" node.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some 
> >>>>>>>>>>> >>>>>>> permissions? Maybe this requirement is too fundamental to 
> >>>>>>>>>>> >>>>>>> the operation of a cluster and everybody would have to be 
> >>>>>>>>>>> >>>>>>> able to do it.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) 
> >>>>>>>>>>> >>>>>>> to treat roles? Implementation detail that the servers 
> >>>>>>>>>>> >>>>>>> will figure out? Or strict guidance where the client 
> >>>>>>>>>>> >>>>>>> needs to check where specific roles are before sending 
> >>>>>>>>>>> >>>>>>> any further communication to the server?
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request 
> >>>>>>>>>>> >>>>>>> that it can't fulfil? An overseer node gets a query or an 
> >>>>>>>>>>> >>>>>>> update. A data node gets a collection creation request. 
> >>>>>>>>>>> >>>>>>> Do they forward it on to an appropriate node, or do they 
> >>>>>>>>>>> >>>>>>> reject it? Should this be configurable? If not, then it 
> >>>>>>>>>>> >>>>>>> seems like lazy or poorly configured clients will defeat 
> >>>>>>>>>>> >>>>>>> this isolation system quite easily.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
> >>>>>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave 
> >>>>>>>>>>> >>>>>>> when roles are added mean? I thought we established that 
> >>>>>>>>>>> >>>>>>> they are not dynamic.
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> Thanks,
> >>>>>>>>>>> >>>>>>> Mike
> >>>>>>>>>>> >>>>>>>
> >>>>>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya 
> >>>>>>>>>>> >>>>>>> <[email protected] 
> >>>>>>>>>>> >>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Hi,
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
> >>>>>>>>>>> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes 
> >>>>>>>>>>> >>>>>>>> that are used to process user queries by forwarding to 
> >>>>>>>>>>> >>>>>>>> data nodes, merging/aggregating them and presenting to 
> >>>>>>>>>>> >>>>>>>> users. This concept exists as first class citizens in 
> >>>>>>>>>>> >>>>>>>> most other search engines. This is a chance for Solr to 
> >>>>>>>>>>> >>>>>>>> catch up.
> >>>>>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
> >>>>>>>>>>> >>>>>>>>
> >>>>>>>>>>> >>>>>>>> Regards,
> >>>>>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>>
> >>>>>>>>>>> >>> --
> >>>>>>>>>>> >>> http://www.needhamsoftware.com 
> >>>>>>>>>>> >>> <http://www.needhamsoftware.com/> (work)
> >>>>>>>>>>> >>> http://www.the111shift.com <http://www.the111shift.com/> 
> >>>>>>>>>>> >>> (play)
> >>>>>>>>>>>
> >>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: [email protected] 
> >>>>>>>>>>> <mailto:[email protected]>
> >>>>>>>>>>> For additional commands, e-mail: [email protected] 
> >>>>>>>>>>> <mailto:[email protected]>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> 
> >>>>>>> (work)
> >>>>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -----------------------------------------------------
> >>>>> Noble Paul
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
> >>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
> >>>
> >>>
> >>>
> >>> --
> >>> -----------------------------------------------------
> >>> Noble Paul
> >>
> >>
> >>
> >> --
> >> -----------------------------------------------------
> >> Noble Paul
> 
> 
> 
> 
> -- 
> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
> http://www.the111shift.com <http://www.the111shift.com/> (play)
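
For reference, the path templates discussed in the quoted review can be sketched as below. This is only an illustration of which parts are string literals and which are substituted by the operator. The helper names and the `keep_nodes_segment` flag are hypothetical and not part of Solr or the SIP. The paths themselves are the ones quoted in the thread.

```python
from urllib.parse import quote

# Literal prefix shared by all role endpoints mentioned in the thread.
BASE = "/api/cluster/roles"


def supported_roles_path() -> str:
    """GET path for the list of roles the cluster supports."""
    return f"{BASE}/supported"


def role_path(rolename: str) -> str:
    """GET /api/cluster/roles/${rolename}; rolename is operator-supplied."""
    return f"{BASE}/{quote(rolename, safe='')}"


def node_path(nodename: str, keep_nodes_segment: bool = True) -> str:
    """GET path for one node's roles.

    keep_nodes_segment=True gives the form with the intermediate "nodes"
    segment; False gives the shorter form requested in the review.
    (Hypothetical flag, used here only to compare the two shapes.)
    """
    node = quote(nodename, safe="")
    if keep_nodes_segment:
        return f"{BASE}/nodes/{node}"
    return f"{BASE}/{node}"


if __name__ == "__main__":
    print(supported_roles_path())
    print(role_path("data"))
    print(node_path("node1"))
    print(node_path("node1", keep_nodes_segment=False))
```

One thing the sketch makes visible: without the intermediate "nodes" segment, the role path and the node path have the same shape, so a server would need some other way to tell a role name from a node name.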
