typo
On Sun, Dec 5, 2021 at 2:37 PM Noble Paul <noble.p...@gmail.com> wrote:

> I recommend the following format for the role spec
>
> roles=<role-name>:<role-value>
>
> each role will have an enum of allowed values and a default value
>
>
>    - role name: *data*
>       - values: [*on*, *off*]
>       - default: *allowed*
>
>       - default: *on*
>    - role name: *overseer*
>       - values: [*allowed*, *disallowed*, *preferred*]
>       - default: *allowed*
>    - role name: *coordinator*
>       - values: [*on*, *off*]
>       - default: *off*
>
>
> examples
> roles=data:on,overseer:allowed (This is redundant because it uses all the
> default values. If a node is started without any roles value this is the
> default behavior)
> roles=data:off,overseer:preferred (do not allow data, join overseer
> election at head)
> roles=coordinator:on,data:on (role as coordinator, but allow data, it's
> same as roles=coordinator:on)
> roles=coordinator:on,data:off (role as coordinator, disallow data)
>
>
> On Sun, Dec 5, 2021 at 11:01 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> If we go with no negative node roles and overseer node role is not strict
>> (i.e. it’s a "preferred overseer"), then one would need to define a second
>> node role "no_overseer" to explicitly exclude a node from ever becoming
>> overseer (which I think is a useful feature until we switch the cluster
>> default to not using the overseer), plus the implementation of these two
>> node roles will obviously be coupled (and what if a node has both defined?).
>>
>> I prefer strict node roles.
>> Maybe we could have node roles with [optional] parameters to let the node
>> role implementation decide?
>> The overseer node role for example could have one of 3 values defined for
>> each node: “preferred” (default, equivalent to the existing overseer role),
>> "accepted" (equivalent to currently not defining the overseer role) and
>> "no_way" (does not exist today).
>>
>> This could be useful in other contexts. A node role “data” could be
>> “fast” or “slow” depending on type of local persistent storage for example…
>>
>> Ilan
>>
>> On Fri 3 Dec 2021 at 16:10, Gus Heck <gus.h...@gmail.com> wrote:
>>
>>> I really don't think we should have types of roles. Not
>>> negative/positive and not strict/non-strict. You have a role or you don't.
>>> What that means is up to the code implementing the role.
>>>
>>> Roles should be free to configure a preference order (binary, or n-ary
>>> or whatever, strict or loose), prohibit behavior, or enable behavior. In
>>> this SIP I feel we should focus on How to identify what node has what role,
>>> How to designate what roles a node has via config/params, and the API's for
>>> interacting with roles.
>>>
>>> We should for example be able to support roles such as
>>>
>>> PREFERRED_OVERSEER
>>> DATA
>>> NO_ROUTED_ALIAS (just an example, not something I mean to suggest)
>>>
>>> Details about role implementation should probably be discussed in a
>>> thread about that role. Obviously we should think about the name carefully
>>> to leave options open should we want to enhance things later so maybe
>>>
>>> OVERSEER_PREF or just OVERSEER
>>>
>>> would be better since it merely reads that the node implements some
>>> sort of preference or config regarding overseer... but all this can be
>>> decided on a per role basis
>>>
>>> On Thu, Dec 2, 2021 at 11:44 PM Noble Paul <noble.p...@gmail.com> wrote:
>>>
>>>> Negative roles have a place
>>>>
>>>> Example is overseer
>>>>
>>>> There are 3 possible choices for that role
>>>>
>>>> a) preferred: always be in front of the election queue
>>>> b) on: not preferred, but can be an overseer if no preferred overseer
>>>> nodes are available
>>>> c) off: never become an overseer
>>>>
>>>> Today we only have options 'a' and 'b'. In a future ticket, we may
>>>> implement C
>>>>
>>>> On Fri, Dec 3, 2021, 11:59 AM Mike Drob <md...@mdrob.com> wrote:
>>>>
>>>>> Negative roles add a lot of complexity, I would really want to stay
>>>>> away from them. That’s why I want strict roles up front. It’s maybe ok to
>>>>> push this decision out, but it also seems like the sort of thing we should
>>>>> consider at the start.
>>>>>
>>>>> On Thu, Dec 2, 2021 at 5:52 PM Noble Paul <noble.p...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes. Negative roles is not a bad idea. If I start a node for
>>>>>> machine learning purposes, I wouldn't want that node to ever participate
>>>>>> in overseer election
>>>>>>
>>>>>> On Fri, Dec 3, 2021, 6:50 AM Ilan Ginzburg <ilans...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If we have non strict roles (like overseer), then it does make sense
>>>>>>> to have negative roles.
>>>>>>> That way I can define which are the two nodes that I'd prefer the
>>>>>>> overseer to run on, and a few other nodes on which it should
>>>>>>> definitely never run for various reasons. And in case these
>>>>>>> "!overseer" are the only nodes left in the cluster, let the cluster
>>>>>>> fail the same way it would if there were no data nodes available.
>>>>>>>
>>>>>>> On Thu, Dec 2, 2021 at 5:11 PM Houston Putman <houstonput...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1 to sensible defaults so users don't trip themselves. The option to tinker for tighter grip can be tackled later, either on a per role basis or as a generic concept later.
>>>>>>> >
>>>>>>> >
>>>>>>> > +1 - Can definitely be added later if we so desire, not needed for this SIP
>>>>>>> >
>>>>>>> > On Wed, Dec 1, 2021 at 9:14 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Dec 2, 2021 at 1:31 AM Gus Heck <gus.h...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> I think the key is to let the roles have full control of the implications of having/not having that role. No need for even a strict/loose designation. The question of do you have the role is yes/no with no logic to guess if the role is implied or not, The question of will it come up with the role is "have_explicit ? use_defaults : use_defaults.
>>>>>>> >>>
>>>>>>> >>> Once you figure out who has a role (or not) what that means is up to the role code.
>>>>>>> >>>
>>>>>>> >>> Corollary: we don't have to change the way overseer works in this SIP. We can rework it or not as we see fit separately.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +1
>>>>>>> >>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Only thing we need to do is find a wording that makes the above clear on first read through the SIP :)
>>>>>>> >>>
>>>>>>> >>> -Gus
>>>>>>> >>>
>>>>>>> >>> On Wed, Dec 1, 2021 at 2:50 PM Houston Putman <houstonput...@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> I'm very strongly in favor of not letting users design a system in which the cluster can be "live" without an overseer. I understand that the overseer can be taxing to the cluster, but honestly what is the point of having an untaxed cluster that doesn't have an overseer? I can see arguments for the other roles to be stricter about this, but there are also a lot of users who wouldn't want those to be strict either (like "query" nodes).
>>>>>>> >>>>
>>>>>>> >>>> Maybe we just put in stronger guarantees that if a non-overseer role node HAS to be selected to become overseer, it will try to migrate the overseer job to a node with the overseer role whenever one becomes live.
>>>>>>> >>>>
>>>>>>> >>>> So maybe we don't have special rules per role, but instead roles can either be defined as "Strict" or "Loose" (better names likely exist), and the roles come with a default (Overseer -> Loose, Data -> Strict, Query -> Loose, etc.). And it is up to each role to define how to behave when running in LOOSE mode and a non-role node is used then a role node comes online (like the overseer example given above).
>>>>>>> >>>>
>>>>>>> >>>> With the Strict/Loose option and sensible defaults, users cannot trip themselves up by default, but the option is there for people to tinker and have an iron grip over their cluster.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Dec 1, 2021 at 2:24 PM Mike Drob <md...@mdrob.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > We are not modifying the way the "overseer role" works today. We are just changing the definition and standardizing the configuration & discoverability
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > As of this SIP, we're not planning to modify the OVERSEER role (which currently stands for preferred overseer). We can take a stab at refactoring it later.
>>>>>>> >>>>>
>>>>>>> >>>>> Grouping these two comments together, since I think they are saying the same thing. I think this is part of my confusion. We have an old system that doesn't work the way we want the new system to work. There may be people already using the old system. What path do we offer for folks using the old system to migrate to the new system? What happens if somebody accidentally tries to use both systems at the same time?
>>>>>>> >>>>>
>>>>>>> >>>>> Ishan wrote:
>>>>>>> >>>>> > When I wrote "When one or more such nodes [with OVERSEER role] are live, Solr guarantees that one of those nodes becomes the overseer.", I meant to somewhat capture the current behaviour as the OVERSEER role performs today. Do you see any inconsistency with this statement vs. what it does today?
>>>>>>> >>>>>
>>>>>>> >>>>> This doesn't really address my concern around what happens if all of our existing OVERSEER candidates are down. When at least one of them is up, the overseer will go there, and that is good and expected. But what happens if all of the overseer eligible nodes are down. Your comment, and the old system, would imply that the overseer election goes to some other unrelated, untagged node. I disagree with this implementation choice. This sounds like something role specific to determine, but I would like to see us be more strict about it. I don't want cores leaking out of my data roles, I don't want query processing to leak out of my "query" nodes or whatever. Overseer shouldn't be special in this regard.
>>>>>>> >>>>>
>>>>>>> >>>>> Noble wrote:
>>>>>>> >>>>> > If we do that how do we know if xyz is a role or a node in the following request?
>>>>>>> >>>>>
>>>>>>> >>>>> You're absolutely correct, thanks for pointing this out. Let's leave it as is.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Tue, Nov 30, 2021 at 2:21 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Tue, Nov 30, 2021 at 12:53 AM Mike Drob <md...@mdrob.com> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Replying to the top post in this thread because there has been a lot of discussion and I don't want to look like I'm continuing any of those particular threads.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> I finally had time to sit down and think about this with the attention it deserves and am generally happy with how the conversation has shaped the current proposal.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: I think using system properties to define node roles is fine and I like that data is the default role when not defined. I think it is important to hold on to the guarantee that an active overseer will land on an overseer node role.
>>>>>>> >>>>>>> CHANGE REQUEST: I would like to see a migration path for folks using the current OVERSEER role. I am not sure that something can be done automatically since they need to now specify new properties at startup. Maybe we need to include loud warnings or support both approaches for a time?
>>>>>>> >>>>>>> CHANGE REQUEST: I do not like that if all of the overseer nodes fail, then it is implied the overseer will go to one of the data nodes. The specific wording in the SIP - "When one or more such nodes are live, Solr guarantees that one of those nodes become the overseer." implies to me that failover could go from overseer1 to overseer2 to overseerN to random node. I feel like we need to have some recording that there were dedicated overseer nodes and stop the cascading failure instead of churning through our data nodes.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I am slightly confused by the proposed scope of "coordinator" roles from a split query/indexing standpoint. I understand that these are used as examples, but would like stronger language that new roles should also go through their own SIP discussions.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: I do not like that we are storing node liveness in two different places now. We have the live nodes and we have the node roles stored in two different places in zookeeper and it feels like this would lead to race conditions or split brain or other hard to diagnose bugs when those two lists don't agree with each other. This also feels like it contradicts the "single source of truth" idea later stated in the proposal. I see Gus's arguments for decoupling these and am not strongly opposed, I just get a lurking feeling about it. Even if we don't do this, I would like this called out explicitly in the alternative approaches section as something that we considered and rejected, with details why,
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: The API looks pretty clear. I would like an additional call out here that all operations are GET because nodes cannot be changed at runtime.
>>>>>>> >>>>>>> CLARIFICATION: How does this interact with the previous OVERSEER preference role?
>>>>>>> >>>>>>> CHANGE REQUEST: An additional API to get the list of available roles for a cluster. I _think_ this could be based on the version that the cluster is running? Would be useful to be able to interrogate a cluster in the future... we're seeing OOM issues on queries, can we add some query nodes? When were they introduced? I don't know what path this API should exist at.
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Added a GET /api/cluster/roles/supported API, updated the SIP document. Not sure if there's a better path that we could go for.
>>>>>>> >>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Can we list the APIs to clearly show which parts are string literals and which parts are meant to be substituted by the operator? GET /api/cluster/roles/data would become GET /api/cluster/roles/${rolename} in our SIP/documentation.
>>>>>>> >>>>>>> CHANGE REQUEST: I think GET /api/cluster/roles/nodes/node1 should be GET /api/cluster/roles/${nodename} dropping the intermediate "nodes"
>>>>>>> >>>>>>> CHANGE REQUEST: The ZK structure also might not need that intermediate "nodes" node.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> CLARIFICATION: Should listing roles require some permissions? Maybe this requirement is too fundamental to the operation of a cluster and everybody would have to be able to do it.
>>>>>>> >>>>>>> CLARIFICATION: How do we expect SolrJ (and other clients) to treat roles? Implementation detail that the servers will figure out? Or strict guidance where the client needs to check where specific roles are before sending any further communication to the server?
>>>>>>> >>>>>>> CLARIFICATION: What happens when a node gets a request that it can't fulfil? An overseer node gets a query or an update. A data node gets a collection creation request. Do they forward it on to an appropriate node, or do they reject it? Should this be configurable? If not, then it seems like lazy or poorly configured clients will defeat this isolation system quite easily.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> GOOD: Testing the API is very important, yes.
>>>>>>> >>>>>>> CLARIFICATION: What does testing for how nodes behave when roles are added mean? I thought we established that they are not dynamic.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Thanks,
>>>>>>> >>>>>>> Mike
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 2:17 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Hi,
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>> >>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> We also wish to add first class support for Query nodes that are used to process user queries by forwarding to data nodes, merging/aggregating them and presenting to users. This concept exists as first class citizens in most other search engines. This is a chance for Solr to catch up.
>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Regards,
>>>>>>> >>>>>>>> Ishan / Noble / Hitesh
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> http://www.needhamsoftware.com (work)
>>>>>>> >>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
>>>>>>>
>>>>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> -----------------------------------------------------
> Noble Paul
>

--
-----------------------------------------------------
Noble Paul
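
For readers following the role spec proposed at the top of this thread (roles=<role-name>:<role-value>, each role with an enum of allowed values and a default), here is a minimal sketch of how such a value might be parsed. It is an illustration only, written in Python, and is not Solr's implementation: the names ROLE_SPECS and parse_roles are hypothetical, and the allowed values and defaults are simply the ones listed in the proposal.

```python
# Hypothetical sketch of the proposed role spec; not Solr's actual code.
# Allowed values and defaults are taken from the proposal in this thread.
ROLE_SPECS = {
    "data": (("on", "off"), "on"),
    "overseer": (("allowed", "disallowed", "preferred"), "allowed"),
    "coordinator": (("on", "off"), "off"),
}


def parse_roles(spec: str = "") -> dict:
    """Parse the value of a roles=... setting, e.g. "data:off,overseer:preferred".

    Roles that are not mentioned keep their default value. Unknown role
    names or values raise ValueError.
    """
    # Start from the defaults so an empty spec yields the default behavior.
    roles = {name: default for name, (_, default) in ROLE_SPECS.items()}

    for entry in (part.strip() for part in spec.split(",")):
        if not entry:
            continue  # tolerate an empty spec or stray commas
        name, sep, value = entry.partition(":")
        if not sep:
            raise ValueError(f"expected <role-name>:<role-value>, got {entry!r}")
        if name not in ROLE_SPECS:
            raise ValueError(f"unknown role {name!r}")
        allowed, _default = ROLE_SPECS[name]
        if value not in allowed:
            raise ValueError(f"role {name!r} does not accept value {value!r}")
        roles[name] = value

    return roles


if __name__ == "__main__":
    print(parse_roles("data:off,overseer:preferred"))
```

Under these assumptions, an empty spec and roles=data:on,overseer:allowed give the same result, and roles=coordinator:on is equivalent to roles=coordinator:on,data:on, which matches the examples in the proposal.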