The SIP can be boiled down to the following:

* Tag a node with a label (role) using a system property
* Use the placement plugin to whitelist/block list certain nodes
* Publish the roles through an API
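As a rough illustration of those three steps, here is a minimal Python sketch. It is not Solr code: parse_roles, nodes_with_role and placement_candidates are hypothetical names, the comma-separated value format is an assumption, and treating a node started without roles as a "data" node follows a backcompat suggestion made elsewhere in this thread, not anything settled.

```python
# Illustrative sketch only; none of these functions exist in Solr.

def parse_roles(sysprop_value, default=("data",)):
    """Step 1: turn a -Dnode.roles value (assumed comma-separated) into a set.

    A node started without the property gets the default roles (assumption).
    """
    if not sysprop_value:
        return set(default)
    return {r.strip() for r in sysprop_value.split(",") if r.strip()}


def nodes_with_role(cluster, role):
    """Step 3: what a roles API could publish, i.e. the nodes carrying a role."""
    return sorted(name for name, roles in cluster.items() if role in roles)


def placement_candidates(cluster):
    """Step 2: the placement side only considers nodes that have the data role."""
    return nodes_with_role(cluster, "data")


# Hypothetical three-node cluster: node name -> roles given at startup.
cluster = {
    "node1": parse_roles("coordinator"),
    "node2": parse_roles("data,overseer"),
    "node3": parse_roles(None),
}

print(placement_candidates(cluster))
print(nodes_with_role(cluster, "coordinator"))
```

In this sketch node1 never becomes a placement candidate, because its role is fixed before it joins the cluster.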
That's it. If you wish to add a new role, use the same concept. Period.

On Fri, Nov 5, 2021, 7:00 AM Noble Paul <noble.p...@gmail.com> wrote:

> Yes Ilan
> The coordinator is the first compelling use case. The roles are the UX, and
> it's a very simple piece. The real work is coming as a separate PR.
>
> Roles can be achieved in a clumsy way today. It's unintuitive and we don't
> want to make the user jump through hoops.
>
> I'll open a PR and you be the judge on the simplicity of this SIP. It's
> not going to have any major impact on any component of Solr.
>
> On Fri, Nov 5, 2021, 2:01 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
>> I was noting that the real value of the proposal (real value = being able
>> to do things that are currently impossible with Solr) was due to an
>> independent concept of a coordinator "core", and that if we had this
>> (currently does not exist in Solr but apparently you do have it on a fork),
>> we can achieve most/all of what the SIP proposes with existing means, i.e.
>> without roles. Maybe in a less flexible/user friendly way, maybe not (given
>> the details of rolling out roles are still fuzzy).
>> And if we don't have the concept of coordinator core, then the roles by
>> themselves do not allow much more than what is already achievable by other
>> means.
>>
>> Ilan
>>
>> On Thu, Nov 4, 2021 at 12:02 PM Noble Paul <noble.p...@gmail.com> wrote:
>>
>>> The placement part of the roles feature may use the placement plugin API.
>>>
>>> The implementation is not what we're discussing here. We need a
>>> consistent story for the user when it comes to roles. This discussion is
>>> about the UX rather than the impl.
>>>
>>> Most of our discussions are about how we should implement it.
>>>
>>> On Thu, Nov 4, 2021, 9:27 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>
>>>> A lot of the value of this SIP relies on the pseudo-core thing (because
>>>> placing on specific nodes is achievable today, Overseer role already
>>>> exists). Roles as described without the coordinator concept are just
>>>> another way to do things already possible today (with a very minor update
>>>> on the Affinity placement plugin - it might even support it right away
>>>> actually, didn't check).
>>>> Maybe "pseudo core" should go in first and condition the rest of the
>>>> work? It feels like a bigger chunk with more challenging integration issues
>>>> (routing, new concept in the collection/shard/replica hierarchy).
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Nov 4, 2021 at 11:20 AM Noble Paul <noble.p...@gmail.com> wrote:
>>>>
>>>>> None of the design is dictated by the version in which we implement
>>>>> this. The SIP is mostly about the "what", "why" and the UX.
>>>>>
>>>>> I don't have any affinity to any particular version. This is
>>>>> definitely going to happen in 9.x. Even if it is built in 9.x we will have
>>>>> to build and support all versions of Solr we use internally. When we
>>>>> eventually upgrade from our current version to a 9.x version, it has to be
>>>>> backward compatible. The choice of whether this is available for public
>>>>> consumption as a branch/release is up for debate.
>>>>>
>>>>> On Thu, Nov 4, 2021, 8:28 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>>>>
>>>>>> Let's do ourselves a service and target 9.0 for roles. It's too late to
>>>>>> plan new features into 8.x.
>>>>>>
>>>>>> I don't understand the urgency either. I can get that certain Solr
>>>>>> users would wish for such a feature "yesterday" but that cannot drive our
>>>>>> decisions on what version to target for features. When targeting 9.0, all
>>>>>> upgrade or back-compat worries will need to be baked into the feature
>>>>>> itself, so that there is either code support or good documentation for how
>>>>>> to start using roles after upgrading a cluster to 9.0.
>>>>>> Perhaps there must be a temporary cluster property in 9.0,
>>>>>> "enableRoles=false", that can be set even if all 9.0 nodes are given
>>>>>> roles on startup. Then, initially after the upgrade, the cluster behaves
>>>>>> as it did in 8.x. Once you are ready to enforce roles, you can flip the
>>>>>> cluster property, and placement and routing start using roles. In 10.0
>>>>>> that property can then go away.
>>>>>>
>>>>>> When it comes to placement plugins, we can document that they MUST
>>>>>> respect certain node roles (at least the data role), and treat it as a
>>>>>> bug if they don't.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>> On 4 Nov 2021, at 03:36, Noble Paul <noble.p...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks everyone for participating in the discussion. I have gone
>>>>>> through all your valuable inputs and these are my suggestions.
>>>>>>
>>>>>> Requirements?
>>>>>>
>>>>>>    1. Users should be able to designate a node with some role by
>>>>>>    starting it with, say, -Dnode.roles=coordinator
>>>>>>    2. This node should be able to perform a certain behavior
>>>>>>    3. Replica placement should be aware of this and may choose to
>>>>>>    place or not place a replica on this node
>>>>>>    4. Any client should be able to query any node in the cluster to
>>>>>>    get a list of nodes with a specified role or get the roles of a
>>>>>>    given node
>>>>>>
>>>>>> Implementation?
>>>>>> Here is how we could implement each of the requirements:
>>>>>>
>>>>>>    1. We could theoretically use a well known system property and
>>>>>>    2. The actual behavior will have to be implemented in both 8.x and
>>>>>>    9.x
>>>>>>    3. Placement of replicas
>>>>>>       1. It's not possible to do this in 8.x
>>>>>>       2. In 9.x, replica placement plugin can be internally used to
>>>>>>       ensure proper placement of replicas in the roles feature.
>>>>>>
>>>>>>          1. It can't be done with the current design as users cannot
>>>>>>          chain multiple placement plugins, or the user has to build a
>>>>>>          custom placement plugin of their own
>>>>>>          2. There is no standard UX to achieve this. It will be a
>>>>>>          recipe (start nodes with this property and create these rules
>>>>>>          etc, etc). This is awkward & error prone, as compared to
>>>>>>          saying "start a node with coordinator role" and Solr will
>>>>>>          take care of it.
>>>>>>    4. There will be a new API endpoint to publish this information in
>>>>>>    8.x and 9.x. This endpoint is important to make this feature usable
>>>>>>
>>>>>> Conclusion
>>>>>>
>>>>>>    1. With a roles feature, we can achieve the objectives in a user
>>>>>>    friendly and intuitive way
>>>>>>    2. The user interface can be consistent across 8.x and 9.x even
>>>>>>    though 9.x can use the placement plugin internally
>>>>>>    3. The actual roles definition will be the same across 8.x and 9.x
>>>>>>
>>>>>> On Thu, Nov 4, 2021 at 6:32 AM Noble Paul <noble.p...@gmail.com> wrote:
>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> We explored all options before arriving at this solution. Ishan
>>>>>>> has already explained why Tim's suggestions have their shortcomings
>>>>>>> when it comes to user experience.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Nov 4, 2021, 3:51 AM Michael Gibney
>>>>>>> <mich...@michaelgibney.net> wrote:
>>>>>>>
>>>>>>>> >I actually didn't realize that an empty Solr node would forward
>>>>>>>> >the top-level request onward instead of just being the query
>>>>>>>> >controller itself? That actually seems like a bug vs. a feature,
>>>>>>>> >IMO any node that receives the top-level query should just be the
>>>>>>>> >coordinator, what stops it?
>>>>>>>>
>>>>>>>> +1 to Tim's statement quoted above; unless I'm missing something,
>>>>>>>> this feels like an issue that should be addressed regardless of this
>>>>>>>> SIP.
>>>>>>>> (perhaps it would be addressed incidentally by this SIP? -- in any >>>>>>>> event >>>>>>>> the current situation seems to not make sense. As Tim points out, the >>>>>>>> relevant configs should in principle be accessible from ZK whether or >>>>>>>> not >>>>>>>> there's a core for a given collection on a given node). >>>>>>>> >>>>>>>> Considering the above, and especially given Ishan that you say "The >>>>>>>> coordinator role is the biggest motivation for introducing the concept >>>>>>>> of >>>>>>>> roles", while reading the SIP I found myself wishing for a fuller >>>>>>>> enumeration of use cases, and a more sympathetic characterization of >>>>>>>> alternatives (existing alternatives, and perhaps, as with the above >>>>>>>> "proxy >>>>>>>> request" issue, simpler-but-not-yet-implemented alternatives). >>>>>>>> >>>>>>>> Combining questions about use cases with questions about >>>>>>>> alternatives: assuming that 9.x autoscaling can indeed be reliably >>>>>>>> used to >>>>>>>> stop replicas from being placed on nodes, how close would addressing >>>>>>>> the >>>>>>>> orthogonal "proxy request" issue come to addressing potential use >>>>>>>> cases? >>>>>>>> >>>>>>>> Michael >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <ilans...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I think if we have the new "pseudo core" abstraction (I like it! >>>>>>>>> Will it really be a core with an index on disk or some new >>>>>>>>> abstraction only >>>>>>>>> tracked in ZK and in memory?) to play the role of coordinator, then >>>>>>>>> we have >>>>>>>>> all we need with the affinity placement plugin framework for a data >>>>>>>>> free >>>>>>>>> coordinator node implementation. >>>>>>>>> It is easy to use system properties to exclude nodes from >>>>>>>>> receiving replicas using the placement plugins, a minor change in the >>>>>>>>> Affinity Placement Plugin. 
Such nodes will not receive any replicas >>>>>>>>> by the >>>>>>>>> placement plugin not even at startup (the system property will be >>>>>>>>> assigned >>>>>>>>> at startup so no manual intervention needed). >>>>>>>>> >>>>>>>>> It will not work if switching to another placement plugin, unless >>>>>>>>> that other plugin reimplements that (simple) aspect. Is that an issue? >>>>>>>>> >>>>>>>>> Ilan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya < >>>>>>>>> ichattopadhy...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Answers inline below. >>>>>>>>>> >>>>>>>>>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter < >>>>>>>>>> thelabd...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> One last thought on this for me ... I think it would be >>>>>>>>>>> beneficial for >>>>>>>>>>> the SIP to address how this new feature will work with the >>>>>>>>>>> existing >>>>>>>>>>> shards.preference solution and affinity based placement plugin. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I was more inclined to keep this SIP focused on broad concept of >>>>>>>>>> roles, and any upcoming roles (coordinator role, along with that >>>>>>>>>> pseudo-core functionality) to be described in their own issue (e.g. >>>>>>>>>> SOLR-15715). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Moreover, your pseudo-replica solution sounds like a new replica >>>>>>>>>>> type >>>>>>>>>>> vs. a node level thing. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I misspoke when I called it "pseudo replica", it is actually a >>>>>>>>>> "pseudo core". Replicas are shard level concepts, but such a pseudo >>>>>>>>>> core >>>>>>>>>> that we plan to introduce will pertain to one or more collections. >>>>>>>>>> Imagine >>>>>>>>>> collection1 has shard1 and shard2, there will be a single pseudo >>>>>>>>>> core for >>>>>>>>>> collection1 (we haven't decided on the prefix of this pseudo core >>>>>>>>>> yet, but >>>>>>>>>> a candidate can be ".collection1_coordinator"). 
Replica type won't >>>>>>>>>> fit this >>>>>>>>>> mental model here. We can discuss this more in the SOLR-15715 issue. >>>>>>>>>> >>>>>>>>>> The placement strategy can place replicas >>>>>>>>>>> based on replica type and node type (just a system property), so >>>>>>>>>>> please address why you can't achieve a query coordinator >>>>>>>>>>> behavior with >>>>>>>>>>> a new replica type + improvements to the Affinity placement >>>>>>>>>>> plugin? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> To put down my thoughts on why Affinity placement plugin won't >>>>>>>>>> work for the purpose of ensuring that we have nodes that host no >>>>>>>>>> data on it: >>>>>>>>>> 1. We want the ability to have nodes with no data on it as a >>>>>>>>>> first class concept for users. Hence, if the Affinity placement >>>>>>>>>> plugin is >>>>>>>>>> used for that purpose, users won't be able to switch out that plugin >>>>>>>>>> and >>>>>>>>>> use anything of their own. Currently, IIUC, there's not way for >>>>>>>>>> users to >>>>>>>>>> use multiple placement plugins. >>>>>>>>>> 2. Nodes that shouldn't host any replica on it are generally >>>>>>>>>> ephemeral in nature; many of them may join the cluster, they may go >>>>>>>>>> away. >>>>>>>>>> If such a node joins the cluster, they immediately become eligible >>>>>>>>>> for >>>>>>>>>> replica placement, before even the sysadmin is able to assign an >>>>>>>>>> affinity >>>>>>>>>> placement configuration for that node. This is a problem. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Tim >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks for your thoughts and feedback, I think it will help us >>>>>>>>>> put together the document with more insights into our design choices. 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Ishan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya >>>>>>>>>>> <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Also, in a cluster where new collections/shards/replicas are >>>>>>>>>>> continuously added all the time, it would be pretty awkward to >>>>>>>>>>> start a node >>>>>>>>>>> (in regular mode), briefly have it become eligible for replica >>>>>>>>>>> assignment, >>>>>>>>>>> then invoking a replica placement rule/autoscaling policy for that >>>>>>>>>>> node to >>>>>>>>>>> not place replicas on it. Instead, starting a node with a defined >>>>>>>>>>> role (as >>>>>>>>>>> a startup param) precludes that brief period of eligibility for >>>>>>>>>>> replica >>>>>>>>>>> placement on such a node. >>>>>>>>>>> > >>>>>>>>>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya < >>>>>>>>>>> ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> If we were to tell users how to do "scatter gather on an >>>>>>>>>>> empty node", *how exactly* would you recommend users have an empty >>>>>>>>>>> node to >>>>>>>>>>> begin with? Wouldn't you say something like "for 8x you can do this >>>>>>>>>>> (rule >>>>>>>>>>> based replica placement) or do that (autoscaling), but for 9x you >>>>>>>>>>> do this >>>>>>>>>>> new thing". Having a node that doesn't have a data role seems like a >>>>>>>>>>> consistent and an elegant way for users to invoke such a >>>>>>>>>>> functionality and >>>>>>>>>>> also easily relate to a broad concept, without having to deal with >>>>>>>>>>> autoscaling frameworks of the ancient past, medieval past or the >>>>>>>>>>> future. >>>>>>>>>>> >> >>>>>>>>>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter < >>>>>>>>>>> thelabd...@gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> As opposed to what? 
Looking up the configset for the >>>>>>>>>>> addressed >>>>>>>>>>> >>> collection and pulling whatever information it needs from >>>>>>>>>>> cached data. >>>>>>>>>>> >>> I'm sure there are some nuances but I hardly think you need >>>>>>>>>>> a node >>>>>>>>>>> >>> role framework to deal with determine the unique key field >>>>>>>>>>> to do >>>>>>>>>>> >>> scatter gather on an empty node when you have easy access to >>>>>>>>>>> >>> collection metadata. >>>>>>>>>>> >>> >>>>>>>>>>> >>> Doesn't seem like a hard thing to overcome to me. >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul < >>>>>>>>>>> noble.p...@gmail.com> wrote: >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter < >>>>>>>>>>> thelabd...@gmail.com> wrote: >>>>>>>>>>> >>> >> >>>>>>>>>>> >>> >> I'm not missing the point of the query coordinator, but I >>>>>>>>>>> actually >>>>>>>>>>> >>> >> didn't realize that an empty Solr node would forward the >>>>>>>>>>> top-level >>>>>>>>>>> >>> >> request onward instead of just being the query controller >>>>>>>>>>> itself? That >>>>>>>>>>> >>> >> actually seems like a bug vs. a feature, IMO any node >>>>>>>>>>> that receives >>>>>>>>>>> >>> >> the top-level query should just be the coordinator, what >>>>>>>>>>> stops it? >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > >>>>>>>>>>> >>> > To process a request there should be a core that uses the >>>>>>>>>>> same configset as the requested collection. >>>>>>>>>>> >>> >> >>>>>>>>>>> >>> >> >>>>>>>>>>> >>> >> Anyway, it sounds to me like you guys have your minds >>>>>>>>>>> made up >>>>>>>>>>> >>> >> regardless of feedback. >>>>>>>>>>> >>> >> >>>>>>>>>>> >>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in >>>>>>>>>>> your SIP as a >>>>>>>>>>> >>> >> specific role, not sure why you took that as me wanting >>>>>>>>>>> to discuss the >>>>>>>>>>> >>> >> embedded ZK in your SIP? 
>>>>>>>>>>> >>> >> >>>>>>>>>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya >>>>>>>>>>> >>> >> <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > Hi Tim, >>>>>>>>>>> >>> >> > Here are my responses inline. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter < >>>>>>>>>>> thelabd...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> I'm just not convinced this feature is even needed and >>>>>>>>>>> the SIP is not >>>>>>>>>>> >>> >> >> convincing that "There is no proper alternative today." >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > There are no proper alternatives today, just hacks. On >>>>>>>>>>> 8x, we have two different deprecated frameworks to stop nodes from >>>>>>>>>>> being >>>>>>>>>>> placed on a node (1. rule based replica placement, 2. autoscaling >>>>>>>>>>> framework). On 9x, we have a new autoscaling framework, which I >>>>>>>>>>> don't even >>>>>>>>>>> think is fully implemented. And, there's definitely no way to have >>>>>>>>>>> a node >>>>>>>>>>> act as a query coordinator without having data on it. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node >>>>>>>>>>> roles, doesn't >>>>>>>>>>> >>> >> >> mean Solr needs this. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > Solr needs this. Elastic has such concepts is a >>>>>>>>>>> coincidence, and also means we have an opportunity to catch up with >>>>>>>>>>> them; >>>>>>>>>>> they have these concepts for a reason. 
>>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> Also, some of Elastic's roles overlap with >>>>>>>>>>> >>> >> >> concepts Solr already has in a different form, i.e >>>>>>>>>>> data_hot sounds >>>>>>>>>>> >>> >> >> like NRT and data_warm sounds a lot like our Pull >>>>>>>>>>> Replica Type >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > I think that is beyond the scope of this SIP. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> 2) You can achieve the "coordinator" role with >>>>>>>>>>> auto-scaling rules >>>>>>>>>>> >>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it >>>>>>>>>>> even has a node >>>>>>>>>>> >>> >> >> type built in: >>>>>>>>>>> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP). >>>>>>>>>>> >>> >> >> Simply build your replica placement rules such that no >>>>>>>>>>> replicas land >>>>>>>>>>> >>> >> >> on "coordinator" nodes. And you can route queries >>>>>>>>>>> using node.sysprop >>>>>>>>>>> >>> >> >> already using shards.preference. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > I think you missed the whole point of the query >>>>>>>>>>> coordinator. Please refer to this >>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715. >>>>>>>>>>> >>> >> > Let me summarize the main difference between what (I >>>>>>>>>>> think) you refer to and what is proposed in SOLR-15715. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > With your suggestion, we'll have a node that doesn't >>>>>>>>>>> host any replicas. And you suggest queries landing on such nodes be >>>>>>>>>>> routed >>>>>>>>>>> using shards.preference? Well, in such a case, these queries will be >>>>>>>>>>> forwarded/proxied to a random node hosting a replica of the >>>>>>>>>>> collection and >>>>>>>>>>> that node then acts as the coordinator. This situation is no better >>>>>>>>>>> than >>>>>>>>>>> sending the query directly to that particular node. 
>>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation >>>>>>>>>>> functionality. There will be pseudo replicas (aware of the >>>>>>>>>>> configset) on >>>>>>>>>>> this coordinator node that handle the request themselves, sends >>>>>>>>>>> shard >>>>>>>>>>> requests to data hosting replicas, collects responses and merges >>>>>>>>>>> them, and >>>>>>>>>>> sends back to the user. This merge step is usually extremely memory >>>>>>>>>>> intensive, and it would be good to serve these off stateless nodes >>>>>>>>>>> (that >>>>>>>>>>> host no data). >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing >>>>>>>>>>> the overseer?!? >>>>>>>>>>> >>> >> >> Also, we already have the ability to run the overseer >>>>>>>>>>> on specific >>>>>>>>>>> >>> >> >> nodes w/o a new framework, so this doesn't really >>>>>>>>>>> convince me we need >>>>>>>>>>> >>> >> >> a new framework. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > There's absolutely no change proposed to the "overseer" >>>>>>>>>>> role. What users need on production clusters are nodes dedicated for >>>>>>>>>>> overseer operations, and for that the current "overseer" role >>>>>>>>>>> suffices, >>>>>>>>>>> together with some functionality to not place replicas on such >>>>>>>>>>> nodes. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> 4) We will indeed need to decide which nodes host >>>>>>>>>>> embedded Zookeeper's >>>>>>>>>>> >>> >> >> but I'd argue that solution hasn't been designed >>>>>>>>>>> entirely and we >>>>>>>>>>> >>> >> >> probably don't need a formal node role framework to >>>>>>>>>>> determine which >>>>>>>>>>> >>> >> >> nodes host embedded ZKs. 
Moreover, embedded ZK seems >>>>>>>>>>> more like a small >>>>>>>>>>> >>> >> >> cluster thing and anyone running a large cluster will >>>>>>>>>>> probably have a >>>>>>>>>>> >>> >> >> dedicated ZK ensemble as they do today. The node role >>>>>>>>>>> thing seems like >>>>>>>>>>> >>> >> >> it's intended for large clusters and my gut says few >>>>>>>>>>> will use embedded >>>>>>>>>>> >>> >> >> ZK for large clusters. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > This SIP is not the right place for this discussion. >>>>>>>>>>> There's a separate SIP for this. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> 5) You can also achieve a lot of "node role" >>>>>>>>>>> functionality in query >>>>>>>>>>> >>> >> >> routing using the shards.preference parameter. >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > That doesn't solve the purpose behind >>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> At the very least, the SIP needs to list specific use >>>>>>>>>>> cases that >>>>>>>>>>> >>> >> >> require this feature that are not achievable with the >>>>>>>>>>> current features >>>>>>>>>>> >>> >> >> before getting bogged down in the impl. details. >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> > The coordinator role is the biggest motivation for >>>>>>>>>>> introducing the concept of roles. However, in addition to what is >>>>>>>>>>> proposed >>>>>>>>>>> in SOLR-15715, a coordinator node can later on also be used as a >>>>>>>>>>> node for >>>>>>>>>>> users to run streaming expressions on, do bulk indexing on (impl >>>>>>>>>>> details >>>>>>>>>>> for this to come later, don't want distraction here). 
>>>>>>>>>>> >>> >> > >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> Tim >>>>>>>>>>> >>> >> >> >>>>>>>>>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck < >>>>>>>>>>> gus.h...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> > >>>>>>>>>>> >>> >> >> > I think there are things not yet accounted for. Time >>>>>>>>>>> I spent yesterday is biting me today. Pls give a couple days. >>>>>>>>>>> >>> >> >> > >>>>>>>>>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski < >>>>>>>>>>> gerlowsk...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Hey Ishan, >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> I appreciate you writing up the SIP! Here's some >>>>>>>>>>> notes/questions I >>>>>>>>>>> >>> >> >> >> had as I was reading through your writeup and this >>>>>>>>>>> mail thread. >>>>>>>>>>> >>> >> >> >> ("----" separators between thoughts, hopefully that >>>>>>>>>>> helps.) >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> ---- >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and >>>>>>>>>>> Houston already >>>>>>>>>>> >>> >> >> >> suggested: roles should default to "all-on". I see >>>>>>>>>>> the downsides >>>>>>>>>>> >>> >> >> >> you're worried about with that approach (esp. >>>>>>>>>>> around 'overseer'), but >>>>>>>>>>> >>> >> >> >> they may be mitigatable, at least in part. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be >>>>>>>>>>> a dedicated overseer, but for that to happen, he/she would need to >>>>>>>>>>> restart >>>>>>>>>>> all the data nodes with -Dnode.roles=data >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Sure, if roles can only be specified at startup. >>>>>>>>>>> But that may be a >>>>>>>>>>> >>> >> >> >> self-imposed constraint. 
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> An API to change a node's roles would remove the >>>>>>>>>>> need for a restart >>>>>>>>>>> >>> >> >> >> and make it easy for users to affect the semantics >>>>>>>>>>> they want. You >>>>>>>>>>> >>> >> >> >> decided you want a dedicated overseer N nodes into >>>>>>>>>>> your cluster >>>>>>>>>>> >>> >> >> >> deployment? Deploy node 'N' with the 'overseer', >>>>>>>>>>> and toggle the >>>>>>>>>>> >>> >> >> >> overseer role off on the remainder. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Now, I understand that you don't want roles to >>>>>>>>>>> change at runtime, but >>>>>>>>>>> >>> >> >> >> I haven't seen you get much into "why", beyond >>>>>>>>>>> saying "it is very >>>>>>>>>>> >>> >> >> >> risky to have nodes change roles while they are up >>>>>>>>>>> and running." Can >>>>>>>>>>> >>> >> >> >> you expand a bit on the risks you're worried >>>>>>>>>>> about? If you're >>>>>>>>>>> >>> >> >> >> explicit about them here maybe someone can think of >>>>>>>>>>> a clever way to >>>>>>>>>>> >>> >> >> >> address them? >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all >>>>>>>>>>> roles", then just by virtue of upgrading to this new version, new >>>>>>>>>>> capabilities will be turned on for the entire cluster, whether or >>>>>>>>>>> not the >>>>>>>>>>> user opted for such a capability. This is totally undesirable. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of >>>>>>>>>>> functionality than >>>>>>>>>>> >>> >> >> >> usual, so in a sense defaulting roles on is >>>>>>>>>>> scarier. But in a sense >>>>>>>>>>> >>> >> >> >> you're describing something that's an inherent part >>>>>>>>>>> of software >>>>>>>>>>> >>> >> >> >> releases. Releases expose new features that are >>>>>>>>>>> typically on by >>>>>>>>>>> >>> >> >> >> default. 
A new default-on role in 9.1 might hurt a >>>>>>>>>>> user, but there's >>>>>>>>>>> >>> >> >> >> no fundamental difference between that and a change >>>>>>>>>>> to backups or >>>>>>>>>>> >>> >> >> >> replication or whatever in the same release. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> I don't mean to belittle the difference in scope - >>>>>>>>>>> I get your concern. >>>>>>>>>>> >>> >> >> >> But IMO this is something to address with good >>>>>>>>>>> release notes and >>>>>>>>>>> >>> >> >> >> documentation. Designing for admins who don't do >>>>>>>>>>> even cursory >>>>>>>>>>> >>> >> >> >> research before an upgrade ties both our hands >>>>>>>>>>> behind our back as a >>>>>>>>>>> >>> >> >> >> project. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> ---- >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> > [SIP] Internal representation in ZK ... >>>>>>>>>>> Implementation details like these can be fleshed out in the PR >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> IMO this is important enough to flush out as part >>>>>>>>>>> of the SIP, at least >>>>>>>>>>> >>> >> >> >> in broad strokes. It affects backcompat, SolrJ >>>>>>>>>>> client design, etc. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> ---- >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1 >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Woohoo - way to include a v2 API definition! >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I >>>>>>>>>>> wonder whether "GET >>>>>>>>>>> >>> >> >> >> /nodes/someNode/roles" wouldn't be a more intuitive >>>>>>>>>>> endpoint for the >>>>>>>>>>> >>> >> >> >> "get the roles this node has" functionality. >>>>>>>>>>> Though I leave that for >>>>>>>>>>> >>> >> >> >> your consideration. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> ---- >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Looking forward to your responses and seeing the >>>>>>>>>>> SIP progress! 
It's a >>>>>>>>>>> >>> >> >> >> really cool, promising idea IMO. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Best, >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> Jason >>>>>>>>>>> >>> >> >> >> >>>>>>>>>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya >>>>>>>>>>> >>> >> >> >> <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> > >>>>>>>>>>> >>> >> >> >> > Are there any unaddressed outstanding concerns >>>>>>>>>>> that we should hold up the SIP for? >>>>>>>>>>> >>> >> >> >> > >>>>>>>>>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan >>>>>>>>>>> Chattopadhyaya, <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where >>>>>>>>>>> "query analysis" has a role of its own. Where would that lead us to? >>>>>>>>>>> Separate roles for >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> >> nodes that do "faceting" or "spell >>>>>>>>>>> correction" etc.? But anyway, that is for discussion when we add >>>>>>>>>>> future >>>>>>>>>>> roles. This is beyond this SIP. >>>>>>>>>>> >>> >> >> >> >> >>>>>>>>>>> >>> >> >> >> >> >>>>>>>>>>> >>> >> >> >> >> > I am not asking you to implement every >>>>>>>>>>> possible role of course :). As a note I know a company that is >>>>>>>>>>> running an >>>>>>>>>>> entire separate >>>>>>>>>>> >>> >> >> >> >> > cluster to offload and better serve >>>>>>>>>>> highlighting on a subset of large docs, so YES I think there are >>>>>>>>>>> people who >>>>>>>>>>> may want such fine grained control. >>>>>>>>>>> >>> >> >> >> >> >>>>>>>>>>> >>> >> >> >> >> Cool, I think we can discuss adding any >>>>>>>>>>> additional roles (for highlighting?) on a case by case basis at a >>>>>>>>>>> later >>>>>>>>>>> point. 
>>>>>>>>>>> >>> >> >> >> >> >>>>>>>>>>> >>> >> >> >> >> >>>>>>>>>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan >>>>>>>>>>> Chattopadhyaya <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> > Boiling it down the idea I'm proposing is >>>>>>>>>>> that roles required for back compatibility get explicitly added on >>>>>>>>>>> startup, >>>>>>>>>>> if not by the user then by the code. This is more flexible than >>>>>>>>>>> assuming >>>>>>>>>>> that no role means every role, because then every new feature that >>>>>>>>>>> has a >>>>>>>>>>> role will end up on legacy clusters which are also not back >>>>>>>>>>> compatible. >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> +1, I totally agree. I even said so, when I >>>>>>>>>>> said: "This is why I was advocating that 1) we assume the "data" as >>>>>>>>>>> a >>>>>>>>>>> default, 2) not assume overseer to be implicitly defined (because >>>>>>>>>>> of the >>>>>>>>>>> way overseer role is written today), 3) not assume any future roles >>>>>>>>>>> to be >>>>>>>>>>> true by default." >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles >>>>>>>>>>> required for back compatibility" (that should be explicitly added on >>>>>>>>>>> startup) be just the ["data"] role, and not the "overseer" role >>>>>>>>>>> (due to the >>>>>>>>>>> way overseer role is currently defined, i.e. it is "preferred >>>>>>>>>>> overseer"). >>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck < >>>>>>>>>>> gus.h...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>>> >>> >> >> >> >>>> Very sorry don't mean to sound offended, >>>>>>>>>>> Frustrated yes offended no :)... 
the most difficult thing about >>>>>>>>>>> communication is the illusion it has occurred :) >>>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>>> >>> >> >> >> >>>> If you read back just a few emails you'll see >>>>>>>>>>> where I talk about roles being applied on startup. Boiling it down >>>>>>>>>>> the idea >>>>>>>>>>> I'm proposing is that roles required for back compatibility get >>>>>>>>>>> explicitly >>>>>>>>>>> added on startup, if not by the user then by the code. This is more >>>>>>>>>>> flexible than assuming that no role means every role, because then >>>>>>>>>>> every >>>>>>>>>>> new feature that has a role will end up on legacy clusters which >>>>>>>>>>> are also >>>>>>>>>>> not back compatible. >>>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>>> >>> >> >> >> >>>> There are points where I said all roles rather >>>>>>>>>>> than back compatibility roles because I was thinking about back >>>>>>>>>>> compatibility specifically, but you can't know that if I don't say >>>>>>>>>>> that can >>>>>>>>>>> you :). >>>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan >>>>>>>>>>> Chattopadhyaya <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>>> >>> >> >> >> >>>>> > If you read more closely, my way can >>>>>>>>>>> provide full back compatibility. To say or imply it doesn't isn't >>>>>>>>>>> helping. >>>>>>>>>>> Perhaps you need to re-read? >>>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm >>>>>>>>>>> trying my best. Please don't be offended, and kindly point me to >>>>>>>>>>> the exact >>>>>>>>>>> part you want me to re-read. 
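The startup rule debated in this exchange (roles required for back compatibility are added explicitly at startup, if not by the user then by the code) can be sketched roughly as follows. This is a Python illustration only, not Solr code: the function name `resolve_roles`, the constant `BACK_COMPAT_ROLES`, and the comma-separated format of the property value are assumptions made for the example. It shows the variant in which only the "data" role is added when nothing is specified.

```python
# Hypothetical sketch: names and the comma-separated format are assumptions,
# not Solr's actual implementation.
BACK_COMPAT_ROLES = ("data",)  # roles the code adds when the user gives none


def resolve_roles(node_roles_prop):
    """Return the explicit set of roles a node starts with.

    node_roles_prop is the raw value of a node.roles-style property,
    or None when the property was not set at all.
    """
    if node_roles_prop is None or not node_roles_prop.strip():
        # Legacy node: only the back-compat roles are added, so roles
        # introduced by later versions are never switched on implicitly.
        return set(BACK_COMPAT_ROLES)
    # Explicit list: exactly those roles, nothing implied.
    return {r.strip().lower() for r in node_roles_prop.split(",") if r.strip()}
```

Under these assumptions, a node started without the property ends up with only the "data" role, while a node started with `-Dnode.roles=overseer` ends up with only "overseer", so a dedicated overseer can be added without restarting the existing nodes.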
>>>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck < >>>>>>>>>>> gus.h...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan >>>>>>>>>>> Chattopadhyaya <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> > Positive - They denote the existence >>>>>>>>>>> of a capability >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this. >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> > Absolute - Absence/Presence binary >>>>>>>>>>> identification of a capability; no implications, no assumptions >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling on >>>>>>>>>>> nodes running without any roles. There has to be an implicit >>>>>>>>>>> assumption as >>>>>>>>>>> to what roles are those nodes assumed to have. My proposal is that >>>>>>>>>>> only the >>>>>>>>>>> "data" role be assumed, but not the "overseer" role. For any future >>>>>>>>>>> roles >>>>>>>>>>> ("coordinator", "zookeeper" etc.), this decision as to what absence >>>>>>>>>>> of any >>>>>>>>>>> role implies should be left to the implementation of that future >>>>>>>>>>> role. >>>>>>>>>>> Documentation should reflect clearly about these implicit >>>>>>>>>>> assumptions. >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide >>>>>>>>>>> full back compatibility. To say or imply it doesn't isn't helping. >>>>>>>>>>> Perhaps >>>>>>>>>>> you need to re-read? >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> > Focused - Do one thing per role >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. 
However, I disagree with ideas where >>>>>>>>>>> "query analysis" has a role of its own. Where would that lead us to? >>>>>>>>>>> Separate roles for nodes that do "faceting" or "spell correction" >>>>>>>>>>> etc.? But >>>>>>>>>>> anyway, that is for discussion when we add future roles. This is >>>>>>>>>>> beyond >>>>>>>>>>> this SIP. >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>> I am not asking you to implement every >>>>>>>>>>> possible role of course :). As a note I know a company that is >>>>>>>>>>> running an >>>>>>>>>>> entire separate cluster to offload and better serve highlighting on >>>>>>>>>>> a >>>>>>>>>>> subset of large docs, so YES I think there are people who may want >>>>>>>>>>> such >>>>>>>>>>> fine grained control. >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> > Accessible - It should be dead simple >>>>>>>>>>> to determine the members of a role, avoid parsing blobs of json, >>>>>>>>>>> avoid >>>>>>>>>>> calculating implications, avoid consulting other resources after >>>>>>>>>>> listing >>>>>>>>>>> nodes with the role >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation >>>>>>>>>>> details that make it easy. There should be a reasonable API to >>>>>>>>>>> return these >>>>>>>>>>> node roles, with ability to filter by role or filter by node. >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> > Independent - One role should not >>>>>>>>>>> require other roles to be present >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> Do we need to have this hard and fast >>>>>>>>>>> requirement upfront? There might be situations where this is >>>>>>>>>>> desirable. I >>>>>>>>>>> feel we can discuss on a case by case basis whenever a future role >>>>>>>>>>> is added. 
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> > Persistent - roles should not be lost across reboot
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> > Immutable - roles should not change while the node is running
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> > Lively - A node with a capability may not be presently providing that capability.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> Specifically imagine the case where there are 100 nodes:
>>>>>>>>>>> >>> >> >> >> >>>>>> 1-100 ==> DATA
>>>>>>>>>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>>>>>>>>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want only one of those to be providing overseer functionality and the other two to be capable, but not providing (so that if the current overseer goes down a new one can be assigned).
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108 with that role, but you probably want to ensure that zookeepers require some sort of command for them to actually join the zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to be up. But oh look I typoed 108... we want that to fail... how?
because 18 does not have the capability to become a >>>>>>>>>>> zookeeper. >>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan >>>>>>>>>>> Chattopadhyaya <ichattopadhy...@gmail.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles >>>>>>>>>>> defined should be assumed to have all roles. Not only data. I don't >>>>>>>>>>> see a >>>>>>>>>>> reason to special case this one or any role. >>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions" >>>>>>>>>>> Nothing to figure out. A node has a role or not. For back >>>>>>>>>>> compatibility >>>>>>>>>>> reasons, all roles would be assumed on startup if none specified. >>>>>>>>>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list >>>>>>>>>>> of roles = exactly those roles. >>>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>> Problem with this approach is mainly to do >>>>>>>>>>> with backcompat. >>>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat: >>>>>>>>>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how >>>>>>>>>>> overseer works and adopt this approach (as quoted), then imagine >>>>>>>>>>> this >>>>>>>>>>> situation: >>>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be >>>>>>>>>>> "data,overseer"). >>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: >>>>>>>>>>> dedicated overseer) >>>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>> User wants this node Solr101 to be a >>>>>>>>>>> dedicated overseer, but for that to happen, he/she would need to >>>>>>>>>>> restart >>>>>>>>>>> all the data nodes with -Dnode.roles=data. This will cause >>>>>>>>>>> unnecessary >>>>>>>>>>> disruption to running clusters where a dedicated overseer is >>>>>>>>>>> needed. 
Keep in mind, if a user needs a dedicated overseer, he's likely in an emergency situation and restarting the whole cluster might not be viable for him/her.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with this "assumed to have all roles" idea:
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for example. Today, regular nodes are not supposed to have embedded ZK running on them. By introducing this artificial limitation ("assumed to have all roles"), we constrain adoption of all future roles to necessarily require a full cluster restart.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can introduce new capabilities and roles. Imagine we have a role that is defined in a new Solr version (and there's functionality to go with that role), and the user upgrades to that version. However, his/her nodes all were started with no node.roles param. Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability. This is totally undesirable.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator to do more work, I would prefer small focused roles with names that accurately describe their function. In that light, COORDINATOR might be too nebulous. How about AGGREGATOR role?
(what I was thinking of would better be called a QUERY_ANALYSIS role)
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> If you want to do specific things like query analysis or query aggregation or bulk indexing etc., all of those can be done on COORDINATOR nodes (as is the case in Elasticsearch). Having tens of "small focused roles" defined as first class concepts would be confusing to the user. As a remedy to your situation where you want the coordinator role to also do query-analysis for shards, one possible solution is to send such a query to a coordinator node with a parameter like "coordinator.query_analysis=true", and then the coordinator, instead of blindly hitting remote shards, also does some extra work on behalf of the shards.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for example (replicas of that collection can only be
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in addition to the other role based constraints),
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > the set of roles should be user extensible and not fixed.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the constraints introduced by roles apply to all collections
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > equally which might be insufficient if a user needs for example a heavily used collection to
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware >>>>>>>>>>> collections are orthogonal topics. What you describe above can be >>>>>>>>>>> achieved >>>>>>>>>>> by the autoscaling+replica placement framework where the placement >>>>>>>>>>> plugins >>>>>>>>>>> take the node roles as one of the inputs. >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early >>>>>>>>>>> on: the set of roles need to be expandable by a user >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > by creating a collection with new roles >>>>>>>>>>> for example (consumed by placement plugins) and be >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > able to start nodes with new >>>>>>>>>>> (arbitrary) roles. Should such roles follow some naming syntax to >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > differentiate them from built in roles? >>>>>>>>>>> To be able to fail on typos on roles - that otherwise can be >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > crippling and hard to debug. This >>>>>>>>>>> implies in any case that the current design can't assume all >>>>>>>>>>> >>> >> >> >> >>>>>>>>> > roles are known at compile time or >>>>>>>>>>> define them in a Java enum. >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by >>>>>>>>>>> something different from roles. Something like node labels (user >>>>>>>>>>> defined) >>>>>>>>>>> which can then be used in a replica placement plugin to assign >>>>>>>>>>> replicas. I >>>>>>>>>>> see roles as more closely associated with kinds of functionality a >>>>>>>>>>> node is >>>>>>>>>>> designated for. Therefore, I feel that replica placements and user >>>>>>>>>>> defined >>>>>>>>>>> node labels is out of scope for this SIP. It can be added later in a >>>>>>>>>>> separate SIP, without being at odds with this proposal. 
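As a rough illustration of the separation described above (node roles as one input to a placement plugin, with user-defined labels left to a separate mechanism), the sketch below selects candidate nodes by a required role and rejects role names outside a fixed set, so that a typo fails loudly. It is a Python sketch under assumptions: `BUILT_IN_ROLES`, `nodes_with_role`, and the node-to-roles mapping are hypothetical and do not correspond to the actual placement plugin API.

```python
# Hypothetical sketch: not the Solr placement plugin API.
BUILT_IN_ROLES = {"data", "overseer"}  # assumed fixed set of roles for the example


def nodes_with_role(node_roles, role):
    """Return the sorted names of nodes that carry the given role.

    node_roles maps a node name to the set of roles that node was started with.
    """
    if role not in BUILT_IN_ROLES:
        # Roles are a closed set here, so a misspelled role is an error
        # rather than a silently empty result.
        raise ValueError(f"unknown role: {role!r}")
    return sorted(name for name, roles in node_roles.items() if role in roles)
```

For example, with `{"solr1": {"data"}, "solr101": {"overseer"}}`, asking for "data" yields only `solr1`, and asking for a misspelled role raises an error instead of returning an empty list.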
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan >>>>>>>>>>> Høydahl <jan....@cominvent.com> wrote: >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan >>>>>>>>>>> Ginzburg <ilans...@gmail.com>: >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined >>>>>>>>>>> should be assumed to have all roles. Not only data. I don't see a >>>>>>>>>>> reason to >>>>>>>>>>> special case this one or any role. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No >>>>>>>>>>> role == all roles. Explicit list of roles = exactly those roles. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe >>>>>>>>>>> preference is something handled as a feature of the role rather >>>>>>>>>>> than via >>>>>>>>>>> role designation? >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Yea, we always need an overseer, so that >>>>>>>>>>> feature can decide to use its list of nodes as a preference if it so >>>>>>>>>>> chooses. >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we >>>>>>>>>>> always prefix Solr env.vars and sys.props with "SOLR_" or "solr.", >>>>>>>>>>> i.e. >>>>>>>>>>> -Dsolr.node.roles=foo. That way we can get away from having to have >>>>>>>>>>> explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every >>>>>>>>>>> single >>>>>>>>>>> property. Instead we can parse all ENVs and Props with the solr >>>>>>>>>>> prefix in >>>>>>>>>>> our bootstrap code. 
And we can by convention allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Jan
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail: dev-h...@solr.apache.org
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> --
>>>>>>>>>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play)
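The naming convention suggested in Jan's aside (an environment variable such as SOLR_NODE_ROLES being equivalent to the system property solr.node.roles) could be handled generically along these lines. This is a minimal Python sketch, not Solr's bootstrap code: the function name `env_to_sysprops` and the exact mapping rule (lowercase the name, replace underscores with dots) are assumptions for illustration.

```python
# Hypothetical sketch of a prefix-based mapping; not Solr's bootstrap code.
def env_to_sysprops(env):
    """Map SOLR_-prefixed environment variables to solr.-prefixed property names.

    env is a mapping of environment variable names to values, such as os.environ.
    Variables without the SOLR_ prefix are ignored.
    """
    return {
        name.lower().replace("_", "."): value
        for name, value in env.items()
        if name.startswith("SOLR_")
    }
```

With this mapping, `{"SOLR_NODE_ROLES": "foo"}` becomes `{"solr.node.roles": "foo"}`, so no per-property handling is needed in the startup scripts.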
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul