I think there are things not yet accounted for. Time I spent yesterday is biting me today. Please give me a couple of days.
On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <gerlowsk...@gmail.com> wrote:

> Hey Ishan,
>
> I appreciate you writing up the SIP! Here's some notes/questions I
> had as I was reading through your writeup and this mail thread.
> ("----" separators between thoughts, hopefully that helps.)
>
> ----
>
> I'll add my vote to what Jan, Gus, Ilan, and Houston already
> suggested: roles should default to "all-on". I see the downsides
> you're worried about with that approach (esp. around 'overseer'), but
> they may be mitigatable, at least in part.
>
> > [mail thread] User wants this node Solr101 to be a dedicated overseer,
> > but for that to happen, he/she would need to restart all the data nodes
> > with -Dnode.roles=data
>
> Sure, if roles can only be specified at startup. But that may be a
> self-imposed constraint.
>
> An API to change a node's roles would remove the need for a restart
> and make it easy for users to affect the semantics they want. You
> decided you want a dedicated overseer N nodes into your cluster
> deployment? Deploy node 'N' with the 'overseer', and toggle the
> overseer role off on the remainder.
>
> Now, I understand that you don't want roles to change at runtime, but
> I haven't seen you get much into "why", beyond saying "it is very
> risky to have nodes change roles while they are up and running." Can
> you expand a bit on the risks you're worried about? If you're
> explicit about them here maybe someone can think of a clever way to
> address them?
>
> > Hence, if those nodes are "assumed to have all roles", then just by
> > virtue of upgrading to this new version, new capabilities will be turned on
> > for the entire cluster, whether or not the user opted for such a
> > capability. This is totally undesirable.
>
> Obviously "roles" refer to much bigger chunks of functionality than
> usual, so in a sense defaulting roles on is scarier. But in a sense
> you're describing something that's an inherent part of software
> releases.
> Releases expose new features that are typically on by
> default. A new default-on role in 9.1 might hurt a user, but there's
> no fundamental difference between that and a change to backups or
> replication or whatever in the same release.
>
> I don't mean to belittle the difference in scope - I get your concern.
> But IMO this is something to address with good release notes and
> documentation. Designing for admins who don't do even cursory
> research before an upgrade ties both our hands behind our back as a
> project.
>
> ----
>
> > [SIP] Internal representation in ZK ... Implementation details like
> > these can be fleshed out in the PR
>
> IMO this is important enough to flesh out as part of the SIP, at least
> in broad strokes. It affects backcompat, SolrJ client design, etc.
>
> ----
>
> > [SIP] GET /api/cluster/roles?node=node1
>
> Woohoo - way to include a v2 API definition!
>
> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET
> /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the
> "get the roles this node has" functionality. Though I leave that for
> your consideration.
>
> ----
>
> Looking forward to your responses and seeing the SIP progress! It's a
> really cool, promising idea IMO.
>
> Best,
>
> Jason
>
> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
> <ichattopadhy...@gmail.com> wrote:
> >
> > Are there any unaddressed outstanding concerns that we should hold up
> > the SIP for?
> >
> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:
> >>>
> >>> >> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us to? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
> >>
> >> > I am not asking you to implement every possible role of course :).
> >> > As a note I know a company that is running an entire separate cluster to offload and better serve highlighting on a subset of large docs, so YES I think there are people who may want such fine grained control.
> >>
> >> Cool, I think we can discuss adding any additional roles (for highlighting?) on a case by case basis at a later point.
> >>
> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> >>>
> >>> > Boiling it down the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code. This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up on legacy clusters which are also not back compatible.
> >>>
> >>> +1, I totally agree. I even said so, when I said: "This is why I was advocating that 1) we assume the "data" as a default, 2) not assume overseer to be implicitly defined (because of the way overseer role is written today), 3) not assume any future roles to be true by default."
> >>>
> >>> So, basically, I'm proposing that the "roles required for back compatibility" (that should be explicitly added on startup) be just the ["data"] role, and not the "overseer" role (due to the way overseer role is currently defined, i.e. it is "preferred overseer").
> >>>
> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <gus.h...@gmail.com> wrote:
> >>>>
> >>>> Very sorry don't mean to sound offended, Frustrated yes offended no :)... the most difficult thing about communication is the illusion it has occurred :)
> >>>>
> >>>> If you read back just a few emails you'll see where I talk about roles being applied on startup. Boiling it down the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code.
> >>>> This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up on legacy clusters which are also not back compatible.
> >>>>
> >>>> There are points where I said all roles rather than back compatibility roles because I was thinking about back compatibility specifically, but you can't know that if I don't say that can you :).
> >>>>
> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> >>>>>
> >>>>> > If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
> >>>>>
> >>>>> I understand e-mails are frustrating, and I'm trying my best. Please don't be offended, and kindly point me to the exact part you want me to re-read.
> >>>>>
> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <gus.h...@gmail.com> wrote:
> >>>>>>
> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> > Positive - They denote the existence of a capability
> >>>>>>>
> >>>>>>> Agree, the SIP already reflects this.
> >>>>>>>
> >>>>>>> > Absolute - Absence/Presence binary identification of a capability; no implications, no assumptions
> >>>>>>>
> >>>>>>> Disagree, we need backcompat handling on nodes running without any roles. There has to be an implicit assumption as to what roles are those nodes assumed to have. My proposal is that only the "data" role be assumed, but not the "overseer" role. For any future roles ("coordinator", "zookeeper" etc.), this decision as to what absence of any role implies should be left to the implementation of that future role. Documentation should reflect clearly about these implicit assumptions.
> >>>>>>>
> >>>>>>
> >>>>>> If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
> >>>>>>
> >>>>>>>
> >>>>>>> > Focused - Do one thing per role
> >>>>>>>
> >>>>>>> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us to? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
> >>>>>>>
> >>>>>>
> >>>>>> I am not asking you to implement every possible role of course :). As a note I know a company that is running an entire separate cluster to offload and better serve highlighting on a subset of large docs, so YES I think there are people who may want such fine grained control.
> >>>>>>
> >>>>>>>
> >>>>>>> > Accessible - It should be dead simple to determine the members of a role, avoid parsing blobs of json, avoid calculating implications, avoid consulting other resources after listing nodes with the role
> >>>>>>>
> >>>>>>> Agree. I'm open to any implementation details that make it easy. There should be a reasonable API to return these node roles, with ability to filter by role or filter by node.
> >>>>>>>
> >>>>>>> > Independent - One role should not require other roles to be present
> >>>>>>>
> >>>>>>> Do we need to have this hard and fast requirement upfront? There might be situations where this is desirable. I feel we can discuss on a case by case basis whenever a future role is added.
> >>>>>>>
> >>>>>>> > Persistent - roles should not be lost across reboot
> >>>>>>>
> >>>>>>> Agree.
> >>>>>>>
> >>>>>>> > Immutable - roles should not change while the node is running
> >>>>>>>
> >>>>>>> Agree
> >>>>>>>
> >>>>>>> > Lively - A node with a capability may not be presently providing that capability.
> >>>>>>>
> >>>>>>> I don't understand, can you please elaborate?
> >>>>>>
> >>>>>> Specifically imagine the case where there are 100 nodes:
> >>>>>> 1-100 ==> DATA
> >>>>>> 101-103 ==> OVERSEER
> >>>>>> 104-106 ==> ZOOKEEPER
> >>>>>>
> >>>>>> But you won't have 3 overseers... you'll want only one of those to be providing overseer functionality and the other two to be capable, but not providing (so that if the current overseer goes down a new one can be assigned).
> >>>>>>
> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108 with that role, but you probably want to ensure that zookeepers require some sort of command for them to actually join the zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to be up. But oh look I typoed 108... we want that to fail... how? because 18 does not have the capability to become a zookeeper.
> >>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> > Ilan: A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
> >>>>>>>> > Gus: There should be no "assumptions" Nothing to figure out. A node has a role or not. For back compatibility reasons, all roles would be assumed on startup if none specified.
> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly those roles.
> >>>>>>>>
> >>>>>>>> Problem with this approach is mainly to do with backcompat.
> >>>>>>>>
> >>>>>>>> 1. Overseer backcompat:
> >>>>>>>> If we don't make any modifications to how overseer works and adopt this approach (as quoted), then imagine this situation:
> >>>>>>>>
> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
> >>>>>>>>
> >>>>>>>> User wants this node Solr101 to be a dedicated overseer, but for that to happen, he/she would need to restart all the data nodes with -Dnode.roles=data. This will cause unnecessary disruption to running clusters where a dedicated overseer is needed. Keep in mind, if a user needs a dedicated overseer, he's likely in an emergency situation and restarting the whole cluster might not be viable for him/her.
> >>>>>>>>
> >>>>>>>> 2. Future roles might not be compatible with this "assumed to have all roles" idea:
> >>>>>>>> Take the proposed "zookeeper" role for example. Today, regular nodes are not supposed to have embedded ZK running on them. By introducing this artificial limitation ("assumed to have all roles"), we constrain adoption of all future roles to necessarily require a full cluster restart.
> >>>>>>>>
> >>>>>>>> Keep in mind newer Solr versions can introduce new capabilities and roles. Imagine we have a role that is defined in a new Solr version (and there's functionality to go with that role), and user upgrades to that version. However, his/her nodes all were started with no node.roles param. Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability. This is totally undesirable.
> >>>>>>>>
> >>>>>>>> > Gus: I actually don't want a coordinator to do more work, I would prefer small focused roles with names that accurately describe their function. In that light, COORDINATOR might be too nebulous. How about AGREGATOR role?
> >>>>>>>> > (what I was thinking of would better be called a QUERY_ANALYSIS role)
> >>>>>>>>
> >>>>>>>> If you want to do specific things like query analysis or query aggregation or bulk indexing etc, all of those can be done on COORDINATOR nodes (as is the case in ElasticSearch). Having tens of "small focused roles" defined as first class concepts would be confusing to the user. As a remedy to your situation where you want the coordinator role to also do query-analysis for shards, one possible solution is to send such a query to a coordinator node with a parameter like "coordinator.query_analysis=true", and then the coordinator, instead of blindly hitting remote shards, also does some extra work on behalf of the shards.
> >>>>>>>>
> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> > If we make collections role-aware for example (replicas of that collection can only be
> >>>>>>>>> > placed on nodes with a specific role, in addition to the other role based constraints),
> >>>>>>>>> > the set of roles should be user extensible and not fixed.
> >>>>>>>>> > If collections are not role aware, the constraints introduced by roles apply to all collections
> >>>>>>>>> > equally which might be insufficient if a user needs for example a heavily used collection to
> >>>>>>>>> > only be placed on more powerful nodes.
> >>>>>>>>>
> >>>>>>>>> I feel node roles and role-aware collections are orthogonal topics. What you describe above can be achieved by the autoscaling+replica placement framework where the placement plugins take the node roles as one of the inputs.
> >>>>>>>>>
> >>>>>>>>> > It does impact the design from early on: the set of roles need to be expandable by a user
> >>>>>>>>> > by creating a collection with new roles for example (consumed by placement plugins) and be
> >>>>>>>>> > able to start nodes with new (arbitrary) roles.
> >>>>>>>>> > Should such roles follow some naming syntax to
> >>>>>>>>> > differentiate them from built in roles? To be able to fail on typos on roles - that otherwise can be
> >>>>>>>>> > crippling and hard to debug. This implies in any case that the current design can't assume all
> >>>>>>>>> > roles are known at compile time or define them in a Java enum.
> >>>>>>>>>
> >>>>>>>>> I think this should be achieved by something different from roles. Something like node labels (user defined) which can then be used in a replica placement plugin to assign replicas. I see roles as more closely associated with kinds of functionality a node is designated for. Therefore, I feel that replica placements and user defined node labels are out of scope for this SIP. They can be added later in a separate SIP, without being at odds with this proposal.
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <jan....@cominvent.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <ilans...@gmail.com> wrote:
> >>>>>>>>>> > A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
> >>>>>>>>>>
> >>>>>>>>>> +1, make it simple and transparent. No role == all roles. Explicit list of roles = exactly those roles.
> >>>>>>>>>>
> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is something handled as a feature of the role rather than via role designation?
> >>>>>>>>>>
> >>>>>>>>>> Yea, we always need an overseer, so that feature can decide to use its list of nodes as a preference if it so chooses.
> >>>>>>>>>>
> >>>>>>>>>> Aside: I think it makes it easier if we always prefix Solr env.vars and sys.props with "SOLR_" or "solr.", i.e. -Dsolr.node.roles=foo.
> >>>>>>>>>> That way we can get away from having to have explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every single property. Instead we can parse all ENVs and Props with the solr prefix in our bootstrap code. And we can by convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
> >>>>>>>>>>
> >>>>>>>>>> Jan
> >>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
> >>>>>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> http://www.needhamsoftware.com (work)
> >>>>>> http://www.the111shift.com (play)
> >>>>
> >>>> --
> >>>> http://www.needhamsoftware.com (work)
> >>>> http://www.the111shift.com (play)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
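
For reference, the two defaulting rules debated in this thread ("no role == all roles" versus "only the back-compat roles are assumed") can be compared with a small sketch. This is illustrative Python, not Solr code: the function names, the `KNOWN_ROLES` set, and the choice to reject unknown role names are all assumptions made for the example, and "coordinator" appears only as a stand-in for a possible future role.

```python
from typing import Optional, Set

# Hypothetical role sets for illustration only. "data" and "overseer" are the
# roles discussed in the thread; "coordinator" stands in for a future role.
KNOWN_ROLES = frozenset({"data", "overseer", "coordinator"})
BACKCOMPAT_ROLES = frozenset({"data"})


def parse_roles(raw: Optional[str]) -> Optional[Set[str]]:
    """Parse a node.roles-style value such as "data,overseer".

    Returns None when the property was not set at all, so callers can
    tell "unset" apart from an explicit list.
    """
    if raw is None or not raw.strip():
        return None
    roles = {part.strip().lower() for part in raw.split(",") if part.strip()}
    unknown = roles - KNOWN_ROLES
    if unknown:
        # Assumption: typos should fail fast rather than be silently ignored.
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    return roles


def effective_roles_all_on(raw: Optional[str]) -> Set[str]:
    """Rule A: no role == all roles; explicit list == exactly those roles."""
    roles = parse_roles(raw)
    return set(KNOWN_ROLES) if roles is None else roles


def effective_roles_backcompat(raw: Optional[str]) -> Set[str]:
    """Rule B: an unset property implies only the back-compat roles."""
    roles = parse_roles(raw)
    return set(BACKCOMPAT_ROLES) if roles is None else roles


def overseer_eligible(cluster: dict, rule) -> list:
    """Names of nodes whose effective roles include "overseer"."""
    return [name for name, raw in cluster.items() if "overseer" in rule(raw)]


# The Solr1-100 / Solr101 scenario from the thread.
cluster = {f"Solr{i}": None for i in range(1, 101)}
cluster["Solr101"] = "overseer"
```

With this cluster, `overseer_eligible(cluster, effective_roles_all_on)` returns all 101 nodes, while `overseer_eligible(cluster, effective_roles_backcompat)` returns only `Solr101`, which is the difference the restart argument turns on.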