Yes, Ilan. The coordinator is the first compelling use case. Roles are the UX, and that's a very simple piece. The real work is coming as a separate PR.
Roles can be achieved in a clumsy way today. It's unintuitive, and we don't want to make users jump through hoops. I'll open a PR and you can be the judge of the simplicity of this SIP. It's not going to have any major impact on any component of Solr.

On Fri, Nov 5, 2021, 2:01 AM Ilan Ginzburg <[email protected]> wrote:

> I was noting that the real value of the proposal (real value = being able to do things that are currently impossible with Solr) was due to an independent concept of a coordinator "core", and that if we had this (currently does not exist in Solr but apparently you do have it on a fork), we can achieve most/all of what the SIP proposes with existing means, i.e. without roles. Maybe in a less flexible/user friendly way, maybe not (given the details of rolling out roles are still fuzzy).
> And if we don't have the concept of coordinator core, then the roles by themselves do not allow much more than what is already achievable by other means.
>
> Ilan
>
> On Thu, Nov 4, 2021 at 12:02 PM Noble Paul <[email protected]> wrote:
>
>> The placement part of the roles feature may use the placement plugin API.
>>
>> The implementation is not what we're discussing here. We need a consistent story for the user when it comes to roles. This discussion is about the UX rather than the impl.
>>
>> Most of our discussions are about how we should implement it
>>
>> On Thu, Nov 4, 2021, 9:27 PM Ilan Ginzburg <[email protected]> wrote:
>>
>>> A lot of the value of this SIP relies on the pseudo-core thing (because placing on specific nodes is achievable today, Overseer role already exists). Roles as described without the coordinator concept are just another way to do things already possible today (with a very minor update on the Affinity placement plugin - it might even support it right away actually, didn't check).
>>> Maybe "pseudo core" should go in first and condition the rest of the work?
>>> It feels like a bigger chunk with more challenging integration issues (routing, new concept in the collection/shard/replica hierarchy).
>>>
>>> Ilan
>>>
>>> On Thu, Nov 4, 2021 at 11:20 AM Noble Paul <[email protected]> wrote:
>>>
>>>> None of the design is dictated by the version in which we implement this. The SIP is mostly about the "what", "why" and the UX
>>>>
>>>> I don't have any affinity to any particular version. This is definitely going to happen in 9.x. Even if it is built in 9.x we will have to build and support all versions of Solr we use internally. When we eventually upgrade from our current version to a 9.x version, it has to be backward compatible. The choice of whether this is available for public consumption as a branch/release is up for debate
>>>>
>>>> On Thu, Nov 4, 2021, 8:28 PM Jan Høydahl <[email protected]> wrote:
>>>>
>>>>> Let's do ourselves a service and target 9.0 for roles. It's too late to plan new features into 8.x.
>>>>>
>>>>> I don't understand the urgency either. I can get that certain Solr users would wish for such a feature "yesterday" but that cannot drive our decisions on what version to target for features. When targeting 9.0, all upgrade or back-compat worries will need to be baked into the feature itself, so that there is either code support or good documentation for how to start using roles after upgrading a cluster to 9.0. Perhaps there must be a temporary cluster-property in 9.0 "enableRoles=false" that can be set, even if all 9.0 nodes are given roles on startup. Then, initially after the upgrade, the cluster behaves as it did in 8.x. Then once you are ready to enforce roles, you can flip the cluster property, and placement and routing start using roles. In 10.0 that property can then go away.
>>>>>
>>>>> When it comes to placement plugins, we can document that they MUST respect certain node roles (at least the data role), and treat it as a bug if they don't.
>>>>>
>>>>> Jan
>>>>>
>>>>> On 4 Nov 2021 at 03:36, Noble Paul <[email protected]> wrote:
>>>>>
>>>>> Thanks everyone for participating in the discussion. I have gone through all your valuable inputs and these are my suggestions
>>>>>
>>>>> Requirements?
>>>>>
>>>>> 1. Users should be able to designate a node with some role when starting it (say -Dnode.roles=coordinator)
>>>>> 2. This node should be able to perform a certain behavior
>>>>> 3. Replica placement should be aware of this and may choose to place or not place a replica in this node
>>>>> 4. Any client should be able to query any node in the cluster to get a list of nodes with a specified role or get the roles of a given node
>>>>>
>>>>> Implementation?
>>>>> Here is how we could implement each of the requirements:
>>>>>
>>>>> 1. We could theoretically use a well known system property and
>>>>> 2. The actual behavior will have to be implemented in both 8.x and 9.x
>>>>> 3. Placement of replicas
>>>>>    1. It’s not possible to do this in 8.x
>>>>>    2. In 9.x, replica placement plugin can be internally used to ensure proper placement of replicas in the roles feature.
>>>>>
>>>>>       1. It can’t be done with the current design as users cannot chain multiple placement plugins or user has to build a custom placement plugin of his own
>>>>>       2. There is no standard UX to achieve this. It will be a recipe (start nodes with this property and create these rules etc, etc). This is awkward & error prone, as compared to saying “start a node with coordinator role” and Solr will take care of it.
>>>>> 4. There will be a new API endpoint to publish this information in 8.x and 9.x.
>>>>>    This endpoint is important to make this feature usable
>>>>>
>>>>> Conclusion
>>>>>
>>>>> 1. With a roles feature, we can achieve the objectives in a user friendly and intuitive way
>>>>> 2. The user interface can be consistent across 8.x and 9.x even though 9.x can use the placement plugin internally
>>>>> 3. The actual roles definition will be the same across 8.x and 9.x
>>>>>
>>>>> On Thu, Nov 4, 2021 at 6:32 AM Noble Paul <[email protected]> wrote:
>>>>>
>>>>>> Michael
>>>>>>
>>>>>> We explored all options before arriving at this solution. Ishan has already explained why Tim's suggestions have their shortcomings when it comes to user experience.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Nov 4, 2021, 3:51 AM Michael Gibney <[email protected]> wrote:
>>>>>>
>>>>>>> > I actually didn't realize that an empty Solr node would forward the top-level request onward instead of just being the query controller itself? That actually seems like a bug vs. a feature, IMO any node that receives the top-level query should just be the coordinator, what stops it?
>>>>>>>
>>>>>>> +1 to Tim's statement quoted above; unless I'm missing something, this feels like an issue that should be addressed regardless of this SIP. (perhaps it would be addressed incidentally by this SIP? -- in any event the current situation seems to not make sense. As Tim points out, the relevant configs should in principle be accessible from ZK whether or not there's a core for a given collection on a given node).
>>>>>>>
>>>>>>> Considering the above, and especially given, Ishan, that you say "The coordinator role is the biggest motivation for introducing the concept of roles", while reading the SIP I found myself wishing for a fuller enumeration of use cases, and a more sympathetic characterization of alternatives (existing alternatives, and perhaps, as with the above "proxy request" issue, simpler-but-not-yet-implemented alternatives).
>>>>>>>
>>>>>>> Combining questions about use cases with questions about alternatives: assuming that 9.x autoscaling can indeed be reliably used to stop replicas from being placed on nodes, how close would addressing the orthogonal "proxy request" issue come to addressing potential use cases?
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think if we have the new "pseudo core" abstraction (I like it! Will it really be a core with an index on disk or some new abstraction only tracked in ZK and in memory?) to play the role of coordinator, then we have all we need with the affinity placement plugin framework for a data free coordinator node implementation.
>>>>>>>> It is easy to use system properties to exclude nodes from receiving replicas using the placement plugins, a minor change in the Affinity Placement Plugin. Such nodes will not receive any replicas from the placement plugin, not even at startup (the system property will be assigned at startup so no manual intervention needed).
>>>>>>>>
>>>>>>>> It will not work if switching to another placement plugin, unless that other plugin reimplements that (simple) aspect. Is that an issue?
>>>>>>>>
>>>>>>>> Ilan
>>>>>>>>
>>>>>>>> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Answers inline below.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> One last thought on this for me ... I think it would be beneficial for the SIP to address how this new feature will work with the existing shards.preference solution and affinity based placement plugin.
>>>>>>>>>
>>>>>>>>> I was more inclined to keep this SIP focused on the broad concept of roles, and any upcoming roles (coordinator role, along with that pseudo-core functionality) to be described in their own issue (e.g. SOLR-15715).
>>>>>>>>>
>>>>>>>>>> Moreover, your pseudo-replica solution sounds like a new replica type vs. a node level thing.
>>>>>>>>>
>>>>>>>>> I misspoke when I called it "pseudo replica", it is actually a "pseudo core". Replicas are shard level concepts, but such a pseudo core that we plan to introduce will pertain to one or more collections. Imagine collection1 has shard1 and shard2, there will be a single pseudo core for collection1 (we haven't decided on the prefix of this pseudo core yet, but a candidate can be ".collection1_coordinator"). Replica type won't fit this mental model here. We can discuss this more in the SOLR-15715 issue.
>>>>>>>>>
>>>>>>>>>> The placement strategy can place replicas based on replica type and node type (just a system property), so please address why you can't achieve a query coordinator behavior with a new replica type + improvements to the Affinity placement plugin?
>>>>>>>>>
>>>>>>>>> To put down my thoughts on why the Affinity placement plugin won't work for the purpose of ensuring that we have nodes that host no data:
>>>>>>>>> 1. We want the ability to have nodes with no data on them as a first class concept for users. Hence, if the Affinity placement plugin is used for that purpose, users won't be able to switch out that plugin and use anything of their own. Currently, IIUC, there's no way for users to use multiple placement plugins.
>>>>>>>>> 2. Nodes that shouldn't host any replicas are generally ephemeral in nature; many of them may join the cluster, they may go away. If such a node joins the cluster, it immediately becomes eligible for replica placement, before the sysadmin is even able to assign an affinity placement configuration for that node. This is a problem.
>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>> Thanks for your thoughts and feedback, I think it will help us put together the document with more insights into our design choices.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ishan
>>>>>>>>>
>>>>>>>>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Also, in a cluster where new collections/shards/replicas are continuously added all the time, it would be pretty awkward to start a node (in regular mode), briefly have it become eligible for replica assignment, then invoke a replica placement rule/autoscaling policy for that node to not place replicas on it.
>>>>>>>>>> > Instead, starting a node with a defined role (as a startup param) precludes that brief period of eligibility for replica placement on such a node.
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> If we were to tell users how to do "scatter gather on an empty node", *how exactly* would you recommend users have an empty node to begin with? Wouldn't you say something like "for 8x you can do this (rule based replica placement) or do that (autoscaling), but for 9x you do this new thing"? Having a node that doesn't have a data role seems like a consistent and elegant way for users to invoke such a functionality and also easily relate to a broad concept, without having to deal with autoscaling frameworks of the ancient past, medieval past or the future.
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> As opposed to what? Looking up the configset for the addressed collection and pulling whatever information it needs from cached data. I'm sure there are some nuances but I hardly think you need a node role framework to deal with determining the unique key field to do scatter gather on an empty node when you have easy access to collection metadata.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Doesn't seem like a hard thing to overcome to me.
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <[email protected]> wrote:
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> I'm not missing the point of the query coordinator, but I actually didn't realize that an empty Solr node would forward the top-level request onward instead of just being the query controller itself? That actually seems like a bug vs. a feature, IMO any node that receives the top-level query should just be the coordinator, what stops it?
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > To process a request there should be a core that uses the same configset as the requested collection.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> Anyway, it sounds to me like you guys have your minds made up regardless of feedback.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in your SIP as a specific role, not sure why you took that as me wanting to discuss the embedded ZK in your SIP?
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > Hi Tim,
>>>>>>>>>> >>> >> > Here are my responses inline.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> I'm just not convinced this feature is even needed and the SIP is not convincing that "There is no proper alternative today."
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > There are no proper alternatives today, just hacks. On 8x, we have two different deprecated frameworks to stop replicas from being placed on a node (1. rule based replica placement, 2. autoscaling framework). On 9x, we have a new autoscaling framework, which I don't even think is fully implemented. And, there's definitely no way to have a node act as a query coordinator without having data on it.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles, doesn't mean Solr needs this.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > Solr needs this. That Elastic has such concepts is a coincidence, and also means we have an opportunity to catch up with them; they have these concepts for a reason.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> Also, some of Elastic's roles overlap with concepts Solr already has in a different form, i.e. data_hot sounds like NRT and data_warm sounds a lot like our Pull Replica Type
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > I think that is beyond the scope of this SIP.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> 2) You can achieve the "coordinator" role with auto-scaling rules pre-9.x and with the AffinityPlacementPlugin (heck, it even has a node type built in: .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP)). Simply build your replica placement rules such that no replicas land on "coordinator" nodes.
>>>>>>>>>> >>> >> >> And you can already route queries by node.sysprop using shards.preference.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > I think you missed the whole point of the query coordinator. Please refer to this https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>> >>> >> > Let me summarize the main difference between what (I think) you refer to and what is proposed in SOLR-15715.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > With your suggestion, we'll have a node that doesn't host any replicas. And you suggest queries landing on such nodes be routed using shards.preference? Well, in such a case, these queries will be forwarded/proxied to a random node hosting a replica of the collection and that node then acts as the coordinator. This situation is no better than sending the query directly to that particular node.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation functionality. There will be pseudo replicas (aware of the configset) on this coordinator node that handle the request themselves, send shard requests to data hosting replicas, collect responses and merge them, and send the result back to the user. This merge step is usually extremely memory intensive, and it would be good to serve these off stateless nodes (that host no data).
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing the overseer?!? Also, we already have the ability to run the overseer on specific nodes w/o a new framework, so this doesn't really convince me we need a new framework.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > There's absolutely no change proposed to the "overseer" role. What users need on production clusters are nodes dedicated for overseer operations, and for that the current "overseer" role suffices, together with some functionality to not place replicas on such nodes.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> 4) We will indeed need to decide which nodes host embedded Zookeeper's but I'd argue that solution hasn't been designed entirely and we probably don't need a formal node role framework to determine which nodes host embedded ZKs. Moreover, embedded ZK seems more like a small cluster thing and anyone running a large cluster will probably have a dedicated ZK ensemble as they do today. The node role thing seems like it's intended for large clusters and my gut says few will use embedded ZK for large clusters.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > This SIP is not the right place for this discussion. There's a separate SIP for this.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> 5) You can also achieve a lot of "node role" functionality in query routing using the shards.preference parameter.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > That doesn't solve the purpose behind https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> At the very least, the SIP needs to list specific use cases that require this feature that are not achievable with the current features before getting bogged down in the impl. details.
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> > The coordinator role is the biggest motivation for introducing the concept of roles. However, in addition to what is proposed in SOLR-15715, a coordinator node can later on also be used as a node for users to run streaming expressions on, do bulk indexing on (impl details for this to come later, don't want distraction here).
>>>>>>>>>> >>> >> >
>>>>>>>>>> >>> >> >> Tim
>>>>>>>>>> >>> >> >>
>>>>>>>>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >> > I think there are things not yet accounted for. Time I spent yesterday is biting me today. Pls give a couple days.
>>>>>>>>>> >>> >> >> >
>>>>>>>>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Hey Ishan,
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I appreciate you writing up the SIP! Here's some notes/questions I had as I was reading through your writeup and this mail thread. ("----" separators between thoughts, hopefully that helps.)
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already suggested: roles should default to "all-on".
>>>>>>>>>> >>> >> >> >> I see the downsides you're worried about with that approach (esp. around 'overseer'), but they may be mitigatable, at least in part.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be a dedicated overseer, but for that to happen, he/she would need to restart all the data nodes with -Dnode.roles=data
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Sure, if roles can only be specified at startup. But that may be a self-imposed constraint.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> An API to change a node's roles would remove the need for a restart and make it easy for users to get the semantics they want. You decided you want a dedicated overseer N nodes into your cluster deployment? Deploy node 'N' with the 'overseer' role, and toggle the overseer role off on the remainder.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Now, I understand that you don't want roles to change at runtime, but I haven't seen you get much into "why", beyond saying "it is very risky to have nodes change roles while they are up and running." Can you expand a bit on the risks you're worried about? If you're explicit about them here maybe someone can think of a clever way to address them?
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability.
>>>>>>>>>> >>> >> >> >> > This is totally undesirable.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of functionality than usual, so in a sense defaulting roles on is scarier. But in a sense you're describing something that's an inherent part of software releases. Releases expose new features that are typically on by default. A new default-on role in 9.1 might hurt a user, but there's no fundamental difference between that and a change to backups or replication or whatever in the same release.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> I don't mean to belittle the difference in scope - I get your concern. But IMO this is something to address with good release notes and documentation. Designing for admins who don't do even cursory research before an upgrade ties both our hands behind our back as a project.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [SIP] Internal representation in ZK ... Implementation details like these can be fleshed out in the PR
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> IMO this is important enough to flesh out as part of the SIP, at least in broad strokes. It affects backcompat, SolrJ client design, etc.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Woohoo - way to include a v2 API definition!
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the "get the roles this node has" functionality. Though I leave that for your consideration.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Looking forward to your responses and seeing the SIP progress! It's a really cool, promising idea IMO.
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Best,
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> Jason
>>>>>>>>>> >>> >> >> >>
>>>>>>>>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>> >>> >> >> >> > Are there any unaddressed outstanding concerns that we should hold up the SIP for?
>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us to? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> > I am not asking you to implement every possible role of course :).
>>>>>>>>>> >>> >> >> >> >> > As a note I know a company that is running an entirely separate cluster to offload and better serve highlighting on a subset of large docs, so YES I think there are people who may want such fine grained control.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> Cool, I think we can discuss adding any additional roles (for highlighting?) on a case by case basis at a later point.
>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> > Boiling it down the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code. This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up on legacy clusters which are also not back compatible.
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> +1, I totally agree. I even said so, when I said: "This is why I was advocating that 1) we assume the "data" as a default, 2) not assume overseer to be implicitly defined (because of the way overseer role is written today), 3) not assume any future roles to be true by default."
>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles required for back compatibility" (that should be explicitly added on startup) be just the ["data"] role, and not the "overseer" role (due to the way overseer role is currently defined, i.e. it is "preferred overseer").
>>>>>>>>>> >>> >> >> >> >>> >>>>>>>>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>> >>> >> >> >> >>>> Very sorry don't mean to sound offended, >>>>>>>>>> Frustrated yes offended no :)... the most difficult thing about >>>>>>>>>> communication is the illusion it has occurred :) >>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>> >>> >> >> >> >>>> If you read back just a few emails you'll see >>>>>>>>>> where I talk about roles being applied on startup. Boiling it down >>>>>>>>>> the idea >>>>>>>>>> I'm proposing is that roles required for back compatibility get >>>>>>>>>> explicitly >>>>>>>>>> added on startup, if not by the user then by the code. This is more >>>>>>>>>> flexible than assuming that no role means every role, because then >>>>>>>>>> every >>>>>>>>>> new feature that has a role will end up on legacy clusters which are >>>>>>>>>> also >>>>>>>>>> not back compatible. >>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>> >>> >> >> >> >>>> There are points where I said all roles rather >>>>>>>>>> than back compatibility roles because I was thinking about back >>>>>>>>>> compatibility specifically, but you can't know that if I don't say >>>>>>>>>> that can >>>>>>>>>> you :). >>>>>>>>>> >>> >> >> >> >>>> >>>>>>>>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan >>>>>>>>>> Chattopadhyaya <[email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>> >>> >> >> >> >>>>> > If you read more closely, my way can provide >>>>>>>>>> full back compatibility. To say or imply it doesn't isn't helping. >>>>>>>>>> Perhaps >>>>>>>>>> you need to re-read? >>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm >>>>>>>>>> trying my best. Please don't be offended, and kindly point me to the >>>>>>>>>> exact >>>>>>>>>> part you want me to re-read. 
>>>>>>>>>> >>> >> >> >> >>>>> >>>>>>>>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan >>>>>>>>>> Chattopadhyaya <[email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Positive - They denote the existence of >>>>>>>>>> a capability >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Absolute - Absence/Presence binary >>>>>>>>>> identification of a capability; no implications, no assumptions >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling on >>>>>>>>>> nodes running without any roles. There has to be an implicit >>>>>>>>>> assumption as >>>>>>>>>> to what roles are those nodes assumed to have. My proposal is that >>>>>>>>>> only the >>>>>>>>>> "data" role be assumed, but not the "overseer" role. For any future >>>>>>>>>> roles >>>>>>>>>> ("coordinator", "zookeeper" etc.), this decision as to what absence >>>>>>>>>> of any >>>>>>>>>> role implies should be left to the implementation of that future >>>>>>>>>> role. >>>>>>>>>> Documentation should reflect clearly about these implicit >>>>>>>>>> assumptions. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide >>>>>>>>>> full back compatibility. To say or imply it doesn't isn't helping. >>>>>>>>>> Perhaps >>>>>>>>>> you need to re-read? >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Focused - Do one thing per role >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where >>>>>>>>>> "query analysis" has a role of its own. 
Where would that lead us to? >>>>>>>>>> Separate roles for nodes that do "faceting" or "spell correction" >>>>>>>>>> etc.? But >>>>>>>>>> anyway, that is for discussion when we add future roles. This is >>>>>>>>>> beyond >>>>>>>>>> this SIP. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> I am not asking you to implement every >>>>>>>>>> possible role of course :). As a note I know a company that is >>>>>>>>>> running an >>>>>>>>>> entire separate cluster to offload and better serve highlighting on a >>>>>>>>>> subset of large docs, so YES I think there are people who may want >>>>>>>>>> such >>>>>>>>>> fine grained control. >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Accessible - It should be dead simple >>>>>>>>>> to determine the members of a role, avoid parsing blobs of json, >>>>>>>>>> avoid >>>>>>>>>> calculating implications, avoid consulting other resources after >>>>>>>>>> listing >>>>>>>>>> nodes with the role >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation >>>>>>>>>> details that make it easy. There should be a reasonable API to >>>>>>>>>> return these >>>>>>>>>> node roles, with ability to filter by role or filter by node. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Independent - One role should not >>>>>>>>>> require other roles to be present >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Do we need to have this hard and fast >>>>>>>>>> requirement upfront? There might be situations where this is >>>>>>>>>> desirable. I >>>>>>>>>> feel we can discuss on a case by case basis whenever a future role >>>>>>>>>> is added. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Persistent - roles should not be lost >>>>>>>>>> across reboot >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Agree. 
>>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Immutable - roles should not change >>>>>>>>>> while the node is running >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> Agree >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> > Lively - A node with a capability may >>>>>>>>>> not be presently providing that capability. >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate? >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> Specifically imagine the case where there are >>>>>>>>>> 100 nodes: >>>>>>>>>> >>> >> >> >> >>>>>> 1-100 ==> DATA >>>>>>>>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER >>>>>>>>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want >>>>>>>>>> only one of those to be providing overseer functionality and the >>>>>>>>>> other two >>>>>>>>>> to be capable, but not providing (so that if the current overseer >>>>>>>>>> goes down >>>>>>>>>> a new one can be assigned). >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You >>>>>>>>>> start nodes 107-108 with that role, but you probably want to ensure >>>>>>>>>> that >>>>>>>>>> zookeepers require some sort of command for them to actually join the >>>>>>>>>> zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18) >>>>>>>>>> ... to do >>>>>>>>>> that the nodes need to be up. But oh look I typoed 108... we want >>>>>>>>>> that to >>>>>>>>>> fail... how? because 18 does not have the capability to become a >>>>>>>>>> zookeeper.
>>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan >>>>>>>>>> Chattopadhyaya <[email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles >>>>>>>>>> defined should be assumed to have all roles. Not only data. I don't >>>>>>>>>> see a >>>>>>>>>> reason to special case this one or any role. >>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions" >>>>>>>>>> Nothing to figure out. A node has a role or not. For back >>>>>>>>>> compatibility >>>>>>>>>> reasons, all roles would be assumed on startup if none specified. >>>>>>>>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list >>>>>>>>>> of roles = exactly those roles. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> Problem with this approach is mainly to do >>>>>>>>>> with backcompat. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat: >>>>>>>>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how >>>>>>>>>> overseer works and adopt this approach (as quoted), then imagine this >>>>>>>>>> situation: >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be >>>>>>>>>> "data,overseer"). >>>>>>>>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: >>>>>>>>>> dedicated overseer) >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> User wants this node Solr101 to be a >>>>>>>>>> dedicated overseer, but for that to happen, he/she would need to >>>>>>>>>> restart >>>>>>>>>> all the data nodes with -Dnode.roles=data. This will cause >>>>>>>>>> unnecessary >>>>>>>>>> disruption to running clusters where a dedicated overseer is needed. 
>>>>>>>>>> Keep >>>>>>>>>> in mind, if a user needs a dedicated overseer, he's likely in an >>>>>>>>>> emergency >>>>>>>>>> situation and restarting the whole cluster might not be viable for >>>>>>>>>> him/her. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible >>>>>>>>>> with this "assumed to have all roles" idea: >>>>>>>>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for >>>>>>>>>> example. Today, regular nodes are not supposed to have embedded ZK >>>>>>>>>> running >>>>>>>>>> on them. By introducing this artificial limitation ("assumed to have >>>>>>>>>> all >>>>>>>>>> roles"), we constrain adoption of all future roles to necessarily >>>>>>>>>> require a >>>>>>>>>> full cluster restart. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can >>>>>>>>>> introduce new capabilities and roles. Imagine we have a role that is >>>>>>>>>> defined in a new Solr version (and there's functionality to go with >>>>>>>>>> that >>>>>>>>>> role), and a user upgrades to that version. However, his/her nodes all >>>>>>>>>> were >>>>>>>>>> started with no node.roles param. Hence, if those nodes are "assumed >>>>>>>>>> to >>>>>>>>>> have all roles", then just by virtue of upgrading to this new >>>>>>>>>> version, new >>>>>>>>>> capabilities will be turned on for the entire cluster, whether or >>>>>>>>>> not the >>>>>>>>>> user opted for such a capability. This is totally undesirable. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator >>>>>>>>>> to do more work, I would prefer small focused roles with names that >>>>>>>>>> accurately describe their function. In that light, COORDINATOR might >>>>>>>>>> be too >>>>>>>>>> nebulous. How about AGGREGATOR role?
(what I was thinking of would >>>>>>>>>> better be >>>>>>>>>> called a QUERY_ANALYSIS role) >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> If you want to do specific things like >>>>>>>>>> query analysis or query aggregation or bulk indexing etc, all of >>>>>>>>>> those can >>>>>>>>>> be done on COORDINATOR nodes (as is the case in Elasticsearch). >>>>>>>>>> Having tens >>>>>>>>>> of "small focused roles" defined as first class concepts would be >>>>>>>>>> confusing to the user. As a remedy to your situation where you want >>>>>>>>>> the >>>>>>>>>> coordinator role to also do query-analysis for shards, one possible >>>>>>>>>> solution is to send such a query to a coordinator node with a >>>>>>>>>> parameter >>>>>>>>>> like "coordinator.query_analysis=true", and then the coordinator, >>>>>>>>>> instead >>>>>>>>>> of blindly hitting remote shards, also does some extra work on >>>>>>>>>> behalf of >>>>>>>>>> the shards. >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan >>>>>>>>>> Chattopadhyaya <[email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for >>>>>>>>>> example (replicas of that collection can only be >>>>>>>>>> >>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in >>>>>>>>>> addition to the other role based constraints), >>>>>>>>>> >>> >> >> >> >>>>>>>>> > the set of roles should be user >>>>>>>>>> extensible and not fixed. >>>>>>>>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the >>>>>>>>>> constraints introduced by roles apply to all collections >>>>>>>>>> >>> >> >> >> >>>>>>>>> > equally which might be insufficient if a >>>>>>>>>> user needs for example a heavily used collection to >>>>>>>>>> >>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware >>>>>>>>>> collections are orthogonal topics. What you describe above can be >>>>>>>>>> achieved >>>>>>>>>> by the autoscaling+replica placement framework where the placement >>>>>>>>>> plugins >>>>>>>>>> take the node roles as one of the inputs. >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early on: >>>>>>>>>> the set of roles need to be expandable by a user >>>>>>>>>> >>> >> >> >> >>>>>>>>> > by creating a collection with new roles >>>>>>>>>> for example (consumed by placement plugins) and be >>>>>>>>>> >>> >> >> >> >>>>>>>>> > able to start nodes with new (arbitrary) >>>>>>>>>> roles. Should such roles follow some naming syntax to >>>>>>>>>> >>> >> >> >> >>>>>>>>> > differentiate them from built in roles? >>>>>>>>>> To be able to fail on typos on roles - that otherwise can be >>>>>>>>>> >>> >> >> >> >>>>>>>>> > crippling and hard to debug. This >>>>>>>>>> implies in any case that the current design can't assume all >>>>>>>>>> >>> >> >> >> >>>>>>>>> > roles are known at compile time or >>>>>>>>>> define them in a Java enum. >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by >>>>>>>>>> something different from roles. Something like node labels (user >>>>>>>>>> defined) >>>>>>>>>> which can then be used in a replica placement plugin to assign >>>>>>>>>> replicas. I >>>>>>>>>> see roles as more closely associated with kinds of functionality a >>>>>>>>>> node is >>>>>>>>>> designated for. Therefore, I feel that replica placements and user >>>>>>>>>> defined >>>>>>>>>> node labels is out of scope for this SIP. It can be added later in a >>>>>>>>>> separate SIP, without being at odds with this proposal. 
>>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl >>>>>>>>>> <[email protected]> wrote: >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan >>>>>>>>>> Ginzburg <[email protected]>: >>>>>>>>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined >>>>>>>>>> should be assumed to have all roles. Not only data. I don't see a >>>>>>>>>> reason to >>>>>>>>>> special case this one or any role. >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No >>>>>>>>>> role == all roles. Explicit list of roles = exactly those roles. >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe >>>>>>>>>> preference is something handled as a feature of the role rather than >>>>>>>>>> via >>>>>>>>>> role designation? >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> Yea, we always need an overseer, so that >>>>>>>>>> feature can decide to use its list of nodes as a preference if it so >>>>>>>>>> chooses. >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we >>>>>>>>>> always prefix Solr env.vars and sys.props with "SOLR_" or "solr.", >>>>>>>>>> i.e. >>>>>>>>>> -Dsolr.node.roles=foo. That way we can get away from having to have >>>>>>>>>> explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every >>>>>>>>>> single >>>>>>>>>> property. Instead we can parse all ENVs and Props with the solr >>>>>>>>>> prefix in >>>>>>>>>> our bootstrap code. And we can by convention allow e.g. 
docker run -e >>>>>>>>>> SOLR_NODE_ROLES=foo solr:9 and it would be the same thing... >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>>>>>> Jan >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail: >>>>>>>>>> [email protected] >>>>>>>>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail: >>>>>>>>>> [email protected] >>>>>>>>>> >>> >> >> >> >>>>>>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> >>>>>>>>>> >>> >> >> >> >>>>>> -- >>>>>>>>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work) >>>>>>>>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play) >>>>>>>>>> >>>>> >>>>> -- >>>>> ----------------------------------------------------- >>>>> Noble Paul >>>>> >>>>> >>>>>
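The backcompat rule argued for in the quoted thread (a node started without node.roles is assumed to have only the "data" role, never "overseer" or any future role, while an explicit list means exactly those roles) can be sketched roughly as below. This is only an illustration of that proposal, not Solr's actual implementation: the class NodeRoles, its methods, and the fixed set of known role names are hypothetical, and rejecting unknown names is an assumption taken from the "fail on typos" concern raised in the thread.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Solr. */
final class NodeRoles {
    // Assumption: a fixed list of role names, taken from those mentioned in the thread.
    static final Set<String> KNOWN_ROLES =
            Set.of("data", "overseer", "coordinator", "zookeeper");

    private NodeRoles() {}

    /**
     * Resolves the value of a node.roles style property into a set of roles.
     * A missing or blank value yields only "data" (the backcompat default);
     * an explicit list yields exactly the roles listed.
     */
    static Set<String> resolve(String rolesProp) {
        Set<String> roles = new LinkedHashSet<>();
        if (rolesProp != null) {
            for (String raw : rolesProp.split(",")) {
                String role = raw.trim().toLowerCase(Locale.ROOT);
                if (role.isEmpty()) {
                    continue; // ignore stray commas and whitespace
                }
                if (!KNOWN_ROLES.contains(role)) {
                    // Fail fast on typos instead of silently starting a misconfigured node.
                    throw new IllegalArgumentException("Unknown node role: " + raw.trim());
                }
                roles.add(role);
            }
        }
        if (roles.isEmpty()) {
            // No roles given: assume "data" only, never "overseer" or future roles.
            return Set.of("data");
        }
        return Collections.unmodifiableSet(roles);
    }

    /** True if a node started with the given property value has the given role. */
    static boolean hasRole(String rolesProp, String role) {
        return resolve(rolesProp).contains(role);
    }
}
```

Under this sketch, a node started with -Dnode.roles=overseer resolves to exactly the overseer role, while the existing nodes started without the property stay data-only, so they would not need a restart for the new node to be the only overseer-capable one. A command such as the ZKADD example in the thread could call hasRole(..., "zookeeper") to reject a mistyped node name.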
