On Wed, Nov 3, 2021 at 8:52 AM Timothy Potter <thelabd...@gmail.com> wrote:
>
> I'm just not convinced this feature is even needed and the SIP is not
> convincing that "There is no proper alternative today."
>
> 1) Just b/c Elastic and Vespa have a concept of node roles, doesn't
> mean Solr needs this. Also, some of Elastic's roles overlap with
> concepts Solr already has in a different form, i.e. data_hot sounds
> like NRT and data_warm sounds a lot like our Pull Replica Type

This feature was not built because ES has it. It is built for a specific
purpose.

> 2) You can achieve the "coordinator" role with auto-scaling rules
> pre-9.x and with the AffinityPlacementPlugin (heck, it even has a node
> type built in:
> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP).
> Simply build your replica placement rules such that no replicas land
> on "coordinator" nodes. And you can route queries using node.sysprop
> already using shards.preference.

The objective is somewhat different. Replica placement is just one small
part of it. When a collection has thousands of shards and there are
hundreds of collections, a distributed query becomes a very
resource-intensive operation. Our data nodes often run out of memory. We
want to ensure that certain nodes have the special capability to process
requests to any collection or shard without hosting those collections or
shards.

> 3) Dedicated overseer role? I thought we were removing the overseer?!?
> Also, we already have the ability to run the overseer on specific
> nodes w/o a new framework, so this doesn't really convince me we need
> a new framework.
>
> 4) We will indeed need to decide which nodes host embedded Zookeeper's
> but I'd argue that solution hasn't been designed entirely and we
> probably don't need a formal node role framework to determine which
> nodes host embedded ZKs. Moreover, embedded ZK seems more like a small
> cluster thing and anyone running a large cluster will probably have a
> dedicated ZK ensemble as they do today. The node role thing seems like
> it's intended for large clusters and my gut says few will use embedded
> ZK for large clusters.
>
> 5) You can also achieve a lot of "node role" functionality in query
> routing using the shards.preference parameter.

Routing is not a problem for us.

> At the very least, the SIP needs to list specific use cases that
> require this feature that are not achievable with the current features
> before getting bogged down in the impl. details.

Sure Tim, we can document our concrete use cases.

>
> Tim
>
> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <gus.h...@gmail.com> wrote:
> >
> > I think there are things not yet accounted for. Time I spent yesterday is
> > biting me today. Pls give a couple days.
> >
> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <gerlowsk...@gmail.com>
> > wrote:
> >>
> >> Hey Ishan,
> >>
> >> I appreciate you writing up the SIP! Here's some notes/questions I
> >> had as I was reading through your writeup and this mail thread.
> >> ("----" separators between thoughts, hopefully that helps.)
> >>
> >> ----
> >>
> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
> >> suggested: roles should default to "all-on". I see the downsides
> >> you're worried about with that approach (esp. around 'overseer'), but
> >> they may be mitigatable, at least in part.
> >>
> >> > [mail thread] User wants this node Solr101 to be a dedicated overseer,
> >> > but for that to happen, he/she would need to restart all the data nodes
> >> > with -Dnode.roles=data
> >>
> >> Sure, if roles can only be specified at startup. But that may be a
> >> self-imposed constraint.
> >>
> >> An API to change a node's roles would remove the need for a restart
> >> and make it easy for users to affect the semantics they want. You
> >> decided you want a dedicated overseer N nodes into your cluster
> >> deployment? Deploy node 'N' with the 'overseer', and toggle the
> >> overseer role off on the remainder.
> >>
> >> Now, I understand that you don't want roles to change at runtime, but
> >> I haven't seen you get much into "why", beyond saying "it is very
> >> risky to have nodes change roles while they are up and running." Can
> >> you expand a bit on the risks you're worried about?
> >> If you're explicit about them here maybe someone can think of a clever
> >> way to address them?
> >>
> >> > Hence, if those nodes are "assumed to have all roles", then just by
> >> > virtue of upgrading to this new version, new capabilities will be turned
> >> > on for the entire cluster, whether or not the user opted for such a
> >> > capability. This is totally undesirable.
> >>
> >> Obviously "roles" refer to much bigger chunks of functionality than
> >> usual, so in a sense defaulting roles on is scarier. But in a sense
> >> you're describing something that's an inherent part of software
> >> releases. Releases expose new features that are typically on by
> >> default. A new default-on role in 9.1 might hurt a user, but there's
> >> no fundamental difference between that and a change to backups or
> >> replication or whatever in the same release.
> >>
> >> I don't mean to belittle the difference in scope - I get your concern.
> >> But IMO this is something to address with good release notes and
> >> documentation. Designing for admins who don't do even cursory
> >> research before an upgrade ties both our hands behind our back as a
> >> project.
> >>
> >> ----
> >>
> >> > [SIP] Internal representation in ZK ... Implementation details like
> >> > these can be fleshed out in the PR
> >>
> >> IMO this is important enough to flesh out as part of the SIP, at least
> >> in broad strokes. It affects backcompat, SolrJ client design, etc.
> >>
> >> ----
> >>
> >> > [SIP] GET /api/cluster/roles?node=node1
> >>
> >> Woohoo - way to include a v2 API definition!
> >>
> >> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET
> >> /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the
> >> "get the roles this node has" functionality. Though I leave that for
> >> your consideration.
> >>
> >> ----
> >>
> >> Looking forward to your responses and seeing the SIP progress! It's a
> >> really cool, promising idea IMO.
> >>
> >> Best,
> >>
> >> Jason
> >>
> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
> >> <ichattopadhy...@gmail.com> wrote:
> >> >
> >> > Are there any unaddressed outstanding concerns that we should hold up
> >> > the SIP for?
> >> >
> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya,
> >> > <ichattopadhy...@gmail.com> wrote:
> >> >>>
> >> >>> >> Agree. However, I disagree with ideas where "query analysis" has a
> >> >>> >> role of its own. Where would that lead us to? Separate roles for
> >> >>> >> nodes that do "faceting" or "spell correction" etc.? But anyway,
> >> >>> >> that is for discussion when we add future roles. This is beyond
> >> >>> >> this SIP.
> >> >>
> >> >> > I am not asking you to implement every possible role of course :). As
> >> >> > a note I know a company that is running an entire separate
> >> >> > cluster to offload and better serve highlighting on a subset of large
> >> >> > docs, so YES I think there are people who may want such fine grained
> >> >> > control.
> >> >>
> >> >> Cool, I think we can discuss adding any additional roles (for
> >> >> highlighting?) on a case by case basis at a later point.
> >> >>
> >> >>
> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya
> >> >> <ichattopadhy...@gmail.com> wrote:
> >> >>>
> >> >>> > Boiling it down the idea I'm proposing is that roles required for
> >> >>> > back compatibility get explicitly added on startup, if not by the
> >> >>> > user then by the code. This is more flexible than assuming that no
> >> >>> > role means every role, because then every new feature that has a
> >> >>> > role will end up on legacy clusters which are also not back
> >> >>> > compatible.
> >> >>>
> >> >>> +1, I totally agree.
> >> >>> I even said so, when I said: "This is why I was
> >> >>> advocating that 1) we assume the "data" as a default, 2) not assume
> >> >>> overseer to be implicitly defined (because of the way overseer role is
> >> >>> written today), 3) not assume any future roles to be true by default."
> >> >>>
> >> >>> So, basically, I'm proposing that the "roles required for back
> >> >>> compatibility" (that should be explicitly added on startup) be just
> >> >>> the ["data"] role, and not the "overseer" role (due to the way
> >> >>> overseer role is currently defined, i.e. it is "preferred overseer").
> >> >>>
> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <gus.h...@gmail.com> wrote:
> >> >>>>
> >> >>>> Very sorry don't mean to sound offended, Frustrated yes offended no
> >> >>>> :)... the most difficult thing about communication is the illusion it
> >> >>>> has occurred :)
> >> >>>>
> >> >>>> If you read back just a few emails you'll see where I talk about
> >> >>>> roles being applied on startup. Boiling it down the idea I'm
> >> >>>> proposing is that roles required for back compatibility get
> >> >>>> explicitly added on startup, if not by the user then by the code.
> >> >>>> This is more flexible than assuming that no role means every role,
> >> >>>> because then every new feature that has a role will end up on legacy
> >> >>>> clusters which are also not back compatible.
> >> >>>>
> >> >>>> There are points where I said all roles rather than back
> >> >>>> compatibility roles because I was thinking about back compatibility
> >> >>>> specifically, but you can't know that if I don't say that can you :).
> >> >>>>
> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya
> >> >>>> <ichattopadhy...@gmail.com> wrote:
> >> >>>>>
> >> >>>>> > If you read more closely, my way can provide full back
> >> >>>>> > compatibility. To say or imply it doesn't isn't helping. Perhaps
> >> >>>>> > you need to re-read?
> >> >>>>>
> >> >>>>> I understand e-mails are frustrating, and I'm trying my best. Please
> >> >>>>> don't be offended, and kindly point me to the exact part you want me
> >> >>>>> to re-read.
> >> >>>>>
> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <gus.h...@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya
> >> >>>>>> <ichattopadhy...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>> > Positive - They denote the existence of a capability
> >> >>>>>>>
> >> >>>>>>> Agree, the SIP already reflects this.
> >> >>>>>>>
> >> >>>>>>> > Absolute - Absence/Presence binary identification of a
> >> >>>>>>> > capability; no implications, no assumptions
> >> >>>>>>>
> >> >>>>>>> Disagree, we need backcompat handling on nodes running without any
> >> >>>>>>> roles. There has to be an implicit assumption as to what roles are
> >> >>>>>>> those nodes assumed to have. My proposal is that only the "data"
> >> >>>>>>> role be assumed, but not the "overseer" role. For any future roles
> >> >>>>>>> ("coordinator", "zookeeper" etc.), this decision as to what
> >> >>>>>>> absence of any role implies should be left to the implementation
> >> >>>>>>> of that future role. Documentation should reflect clearly about
> >> >>>>>>> these implicit assumptions.
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> If you read more closely, my way can provide full back
> >> >>>>>> compatibility. To say or imply it doesn't isn't helping. Perhaps
> >> >>>>>> you need to re-read?
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>> > Focused - Do one thing per role
> >> >>>>>>>
> >> >>>>>>> Agree. However, I disagree with ideas where "query analysis" has a
> >> >>>>>>> role of its own. Where would that lead us to? Separate roles for
> >> >>>>>>> nodes that do "faceting" or "spell correction" etc.? But anyway,
> >> >>>>>>> that is for discussion when we add future roles. This is beyond
> >> >>>>>>> this SIP.
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> I am not asking you to implement every possible role of course :).
> >> >>>>>> As a note I know a company that is running an entire separate
> >> >>>>>> cluster to offload and better serve highlighting on a subset of
> >> >>>>>> large docs, so YES I think there are people who may want such fine
> >> >>>>>> grained control.
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>> > Accessible - It should be dead simple to determine the
> >> >>>>>>> > members of a role, avoid parsing blobs of json, avoid
> >> >>>>>>> > calculating implications, avoid consulting other resources after
> >> >>>>>>> > listing nodes with the role
> >> >>>>>>>
> >> >>>>>>> Agree. I'm open to any implementation details that make it easy.
> >> >>>>>>> There should be a reasonable API to return these node roles, with
> >> >>>>>>> ability to filter by role or filter by node.
> >> >>>>>>>
> >> >>>>>>> > Independent - One role should not require other roles to be
> >> >>>>>>> > present
> >> >>>>>>>
> >> >>>>>>> Do we need to have this hard and fast requirement upfront? There
> >> >>>>>>> might be situations where this is desirable. I feel we can discuss
> >> >>>>>>> on a case by case basis whenever a future role is added.
> >> >>>>>>>
> >> >>>>>>> > Persistent - roles should not be lost across reboot
> >> >>>>>>>
> >> >>>>>>> Agree.
> >> >>>>>>>
> >> >>>>>>> > Immutable - roles should not change while the node is running
> >> >>>>>>>
> >> >>>>>>> Agree
> >> >>>>>>>
> >> >>>>>>> > Lively - A node with a capability may not be presently
> >> >>>>>>> > providing that capability.
> >> >>>>>>>
> >> >>>>>>> I don't understand, can you please elaborate?
> >> >>>>>>
> >> >>>>>> Specifically imagine the case where there are 100 nodes:
> >> >>>>>> 1-100 ==> DATA
> >> >>>>>> 101-103 ==> OVERSEER
> >> >>>>>> 104-106 ==> ZOOKEEPER
> >> >>>>>>
> >> >>>>>> But you won't have 3 overseers...
> >> >>>>>> you'll want only one of those to
> >> >>>>>> be providing overseer functionality and the other two to be
> >> >>>>>> capable, but not providing (so that if the current overseer goes
> >> >>>>>> down a new one can be assigned).
> >> >>>>>>
> >> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108
> >> >>>>>> with that role, but you probably want to ensure that zookeepers
> >> >>>>>> require some sort of command for them to actually join the
> >> >>>>>> zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18)
> >> >>>>>> ... to do that the nodes need to be up. But oh look I typoed 108...
> >> >>>>>> we want that to fail... how? because 18 does not have the
> >> >>>>>> capability to become a zookeeper.
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya
> >> >>>>>>> <ichattopadhy...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> > Ilan: A node not having node.roles defined should be assumed to
> >> >>>>>>>> > have all roles. Not only data. I don't see a reason to special
> >> >>>>>>>> > case this one or any role.
> >> >>>>>>>> > Gus: There should be no "assumptions" Nothing to figure out. A
> >> >>>>>>>> > node has a role or not. For back compatibility reasons, all
> >> >>>>>>>> > roles would be assumed on startup if none specified.
> >> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly
> >> >>>>>>>> > those roles.
> >> >>>>>>>>
> >> >>>>>>>> Problem with this approach is mainly to do with backcompat.
> >> >>>>>>>>
> >> >>>>>>>> 1. Overseer backcompat:
> >> >>>>>>>> If we don't make any modifications to how overseer works and
> >> >>>>>>>> adopt this approach (as quoted), then imagine this situation:
> >> >>>>>>>>
> >> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
> >> >>>>>>>>
> >> >>>>>>>> User wants this node Solr101 to be a dedicated overseer, but for
> >> >>>>>>>> that to happen, he/she would need to restart all the data nodes
> >> >>>>>>>> with -Dnode.roles=data. This will cause unnecessary disruption to
> >> >>>>>>>> running clusters where a dedicated overseer is needed. Keep in
> >> >>>>>>>> mind, if a user needs a dedicated overseer, he's likely in an
> >> >>>>>>>> emergency situation and restarting the whole cluster might not be
> >> >>>>>>>> viable for him/her.
> >> >>>>>>>>
> >> >>>>>>>> 2. Future roles might not be compatible with this "assumed to
> >> >>>>>>>> have all roles" idea:
> >> >>>>>>>> Take the proposed "zookeeper" role for example. Today, regular
> >> >>>>>>>> nodes are not supposed to have embedded ZK running on them. By
> >> >>>>>>>> introducing this artificial limitation ("assumed to have all
> >> >>>>>>>> roles"), we constrain adoption of all future roles to necessarily
> >> >>>>>>>> require a full cluster restart.
> >> >>>>>>>>
> >> >>>>>>>> Keep in mind newer Solr versions can introduce new capabilities
> >> >>>>>>>> and roles. Imagine we have a role that is defined in a new Solr
> >> >>>>>>>> version (and there's functionality to go with that role), and
> >> >>>>>>>> user upgrades to that version. However, his/her nodes all were
> >> >>>>>>>> started with no node.roles param. Hence, if those nodes are
> >> >>>>>>>> "assumed to have all roles", then just by virtue of upgrading to
> >> >>>>>>>> this new version, new capabilities will be turned on for the
> >> >>>>>>>> entire cluster, whether or not the user opted for such a
> >> >>>>>>>> capability. This is totally undesirable.
> >> >>>>>>>>
> >> >>>>>>>> > Gus: I actually don't want a coordinator to do more work, I
> >> >>>>>>>> > would prefer small focused roles with names that accurately
> >> >>>>>>>> > describe their function.
> >> >>>>>>>> > In that light, COORDINATOR might be
> >> >>>>>>>> > too nebulous. How about AGGREGATOR role? (what I was thinking of
> >> >>>>>>>> > would better be called a QUERY_ANALYSIS role)
> >> >>>>>>>>
> >> >>>>>>>> If you want to do specific things like query analysis or query
> >> >>>>>>>> aggregation or bulk indexing etc, all of those can be done on
> >> >>>>>>>> COORDINATOR nodes (as is the case in ElasticSearch). Having tens
> >> >>>>>>>> of "small focused roles" defined as first class concepts
> >> >>>>>>>> would be confusing to the user. As a remedy to your situation
> >> >>>>>>>> where you want the coordinator role to also do query-analysis for
> >> >>>>>>>> shards, one possible solution is to send such a query to a
> >> >>>>>>>> coordinator node with a parameter like
> >> >>>>>>>> "coordinator.query_analysis=true", and then the coordinator,
> >> >>>>>>>> instead of blindly hitting remote shards, also does some extra
> >> >>>>>>>> work on behalf of the shards.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya
> >> >>>>>>>> <ichattopadhy...@gmail.com> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> > If we make collections role-aware for example (replicas of
> >> >>>>>>>>> > that collection can only be
> >> >>>>>>>>> > placed on nodes with a specific role, in addition to the other
> >> >>>>>>>>> > role based constraints),
> >> >>>>>>>>> > the set of roles should be user extensible and not fixed.
> >> >>>>>>>>> > If collections are not role aware, the constraints introduced
> >> >>>>>>>>> > by roles apply to all collections
> >> >>>>>>>>> > equally which might be insufficient if a user needs for
> >> >>>>>>>>> > example a heavily used collection to
> >> >>>>>>>>> > only be placed on more powerful nodes.
> >> >>>>>>>>>
> >> >>>>>>>>> I feel node roles and role-aware collections are orthogonal
> >> >>>>>>>>> topics.
> >> >>>>>>>>> What you describe above can be achieved by the
> >> >>>>>>>>> autoscaling+replica placement framework where the placement
> >> >>>>>>>>> plugins take the node roles as one of the inputs.
> >> >>>>>>>>>
> >> >>>>>>>>> > It does impact the design from early on: the set of roles need
> >> >>>>>>>>> > to be expandable by a user
> >> >>>>>>>>> > by creating a collection with new roles for example (consumed
> >> >>>>>>>>> > by placement plugins) and be
> >> >>>>>>>>> > able to start nodes with new (arbitrary) roles. Should such
> >> >>>>>>>>> > roles follow some naming syntax to
> >> >>>>>>>>> > differentiate them from built in roles? To be able to fail on
> >> >>>>>>>>> > typos on roles - that otherwise can be
> >> >>>>>>>>> > crippling and hard to debug. This implies in any case that the
> >> >>>>>>>>> > current design can't assume all
> >> >>>>>>>>> > roles are known at compile time or define them in a Java enum.
> >> >>>>>>>>>
> >> >>>>>>>>> I think this should be achieved by something different from
> >> >>>>>>>>> roles. Something like node labels (user defined) which can then
> >> >>>>>>>>> be used in a replica placement plugin to assign replicas. I see
> >> >>>>>>>>> roles as more closely associated with kinds of functionality a
> >> >>>>>>>>> node is designated for. Therefore, I feel that replica
> >> >>>>>>>>> placements and user defined node labels is out of scope for this
> >> >>>>>>>>> SIP. It can be added later in a separate SIP, without being at
> >> >>>>>>>>> odds with this proposal.
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl
> >> >>>>>>>>> <jan....@cominvent.com> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg
> >> >>>>>>>>>> > <ilans...@gmail.com> wrote:
> >> >>>>>>>>>> > A node not having node.roles defined should be assumed to
> >> >>>>>>>>>> > have all roles. Not only data.
> >> >>>>>>>>>> > I don't see a reason to
> >> >>>>>>>>>> > special case this one or any role.
> >> >>>>>>>>>>
> >> >>>>>>>>>> +1, make it simple and transparent. No role == all roles.
> >> >>>>>>>>>> Explicit list of roles = exactly those roles.
> >> >>>>>>>>>>
> >> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is something
> >> >>>>>>>>>> > handled as a feature of the role rather than via role
> >> >>>>>>>>>> > designation?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yea, we always need an overseer, so that feature can decide to
> >> >>>>>>>>>> use its list of nodes as a preference if it so chooses.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Aside: I think it makes it easier if we always prefix Solr
> >> >>>>>>>>>> env.vars and sys.props with "SOLR_" or "solr.", i.e.
> >> >>>>>>>>>> -Dsolr.node.roles=foo. That way we can get away from having to
> >> >>>>>>>>>> have explicit code in bin/solr, bin/solr.cmd and SolrCLI to
> >> >>>>>>>>>> manage every single property. Instead we can parse all ENVs and
> >> >>>>>>>>>> Props with the solr prefix in our bootstrap code. And we can by
> >> >>>>>>>>>> convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9
> >> >>>>>>>>>> and it would be the same thing...
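
As an illustration of the prefix convention in Jan's aside, here is a minimal sketch (in Python for brevity, not Solr's bootstrap code). The names env_to_prop and collect_solr_settings are hypothetical, and both the underscore-to-dot mapping and the rule that an explicit -D property overrides the environment variable are assumptions, not something the thread specifies.

```python
from typing import Dict, Mapping

ENV_PREFIX = "SOLR_"   # environment variables, e.g. SOLR_NODE_ROLES
PROP_PREFIX = "solr."  # system properties, e.g. -Dsolr.node.roles


def env_to_prop(name: str) -> str:
    """Map SOLR_NODE_ROLES to solr.node.roles (assumed naming rule)."""
    return name.lower().replace("_", ".")


def collect_solr_settings(env: Mapping[str, str],
                          props: Mapping[str, str]) -> Dict[str, str]:
    """Gather every solr-prefixed setting under its property-style name."""
    settings: Dict[str, str] = {}
    for key, value in env.items():
        if key.startswith(ENV_PREFIX):
            settings[env_to_prop(key)] = value
    for key, value in props.items():
        if key.startswith(PROP_PREFIX):
            # Assumption: an explicit -D property overrides the env var.
            settings[key] = value
    return settings
```

With this shape, no per-property handling is needed in the start scripts: any new solr-prefixed setting is picked up by the same loop.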
> >> >>>>>>>>>>
> >> >>>>>>>>>> Jan
> >> >>>>>>>>>> ---------------------------------------------------------------------
> >> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> >> >>>>>>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> http://www.needhamsoftware.com (work)
> >> >>>>>> http://www.the111shift.com (play)

--
-----------------------------------------------------
Noble Paul
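
The backcompat disagreement in this thread comes down to which roles a node started without a node.roles parameter is taken to have. The sketch below (Python for brevity) contrasts the two proposals using the Solr1-100 plus Solr101 scenario from the thread. The names resolve_roles and overseer_candidates and the policy labels "all" and "data-only" are hypothetical and not part of Solr; the role set is limited to "data" and "overseer" purely for illustration.

```python
from typing import Dict, FrozenSet, List, Optional

# Illustrative role set only; future roles are deliberately left out.
KNOWN_ROLES: FrozenSet[str] = frozenset({"data", "overseer"})


def resolve_roles(param: Optional[str], policy: str) -> FrozenSet[str]:
    """Return the effective roles of a node given its node.roles value."""
    if param:
        # An explicit list means exactly those roles under both proposals.
        return frozenset(r.strip() for r in param.split(",") if r.strip())
    if policy == "all":
        # "No role == all roles" (Ilan, Gus, Jan, Jason).
        return KNOWN_ROLES
    if policy == "data-only":
        # Only "data" is implied for backcompat (Ishan).
        return frozenset({"data"})
    raise ValueError(f"unknown policy: {policy}")


def overseer_candidates(cluster: Dict[str, Optional[str]],
                        policy: str) -> List[str]:
    """List the nodes whose effective roles include 'overseer'."""
    return sorted(node for node, param in cluster.items()
                  if "overseer" in resolve_roles(param, policy))


# Solr1-100 started with no roles param; Solr101 with -Dnode.roles=overseer.
cluster: Dict[str, Optional[str]] = {f"Solr{i}": None for i in range(1, 101)}
cluster["Solr101"] = "overseer"
```

Under the "all" policy every node remains an overseer candidate, so Solr101 is not dedicated until the other hundred nodes are restarted with an explicit role list; under "data-only", Solr101 is the sole candidate without touching the existing nodes.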