Re: First class support for node roles

Ilan Ginzburg Thu, 04 Nov 2021 08:01:26 -0700

I was noting that the real value of the proposal (real value = being able
to do things that are currently impossible with Solr) was due to an
independent concept of a coordinator "core", and that if we had this
(currently does not exist in Solr but apparently you do have it on a fork),
we can achieve most/all of what the SIP proposes with existing means, i.e.
without roles. Maybe in a less flexible/user friendly way, maybe not (given
the details of rolling out roles are still fuzzy).
And if we don't have the concept of coordinator core, then the roles by
themselves do not allow much more than what is already achievable by other
means.


Ilan

On Thu, Nov 4, 2021 at 12:02 PM Noble Paul <[email protected]> wrote:

> The placement part of the roles feature may use the placement plugin API.
>
>
>  The implementation is not what we're discussing here. We need a
> consistent story for the user when it comes to roles. This discussion is
> about the UX rather than the impl.
>
> Most of our discussions are about how we should implement it
>
>
>
> On Thu, Nov 4, 2021, 9:27 PM Ilan Ginzburg <[email protected]> wrote:
>
>> A lot of the value of this SIP relies on the pseudo-core thing (because
>> placing on specific nodes is achievable today, Overseer role already
>> exists). Roles as described without the coordinator concept are just
>> another way to do things already possible today (with a very minor update
>> on the Affinity placement plugin - it might even support it right away
>> actually, didn't check).
>> Maybe "pseudo core" should go in first and condition the rest of the
>> work? It feels like a bigger chunk with more challenging integration issues
>> (routing, new concept in the collection/shard/replica hierarchy).
>>
>> Ilan
>>
>> On Thu, Nov 4, 2021 at 11:20 AM Noble Paul <[email protected]> wrote:
>>
>>> None of the design is dictated by the version in which we implement
>>> this. The SIP is mostly about the "what", "why" and the UX
>>>
>>> I don't have any affinity to any particular version. This is definitely
>>> going to happen in 9.x. Even if it is built in 9.x we will have to build
>>> and support all versions of solr we use internally. When we eventually
>>> upgrade from our current version to a 9.x version, it has to be backward
>>> compatible. The choice of whether this is available for public consumption
>>> as a branch/release is up for debate.
>>>
>>> On Thu, Nov 4, 2021, 8:28 PM Jan Høydahl <[email protected]> wrote:
>>>
>>>> Let's do ourselves a service and target 9.0 for roles. It's too late to
>>>> plan new features into 8.x.
>>>>
>>>> I don't understand the urgency either. I can get that certain Solr
>>>> users would wish for such a feature "yesterday" but that cannot drive our
>>>> decisions on what version to target for features. When targeting 9.0, all
>>>> upgrade or back-compat worries will need to be baked into the feature
>>>> itself, so that there is either code support or good documentation for how
>>>> to start using roles after upgrading a cluster to 9.0. Perhaps there must
>>>> be a temporary cluster-property in 9.0 "enableRoles=false" that can be set,
>>>> even if all 9.0 nodes are given roles on startup. Then, initially after the
>>>> upgrade, the cluster behaves as it did in 8.x. Then once you are ready to
>>>> enforce roles, you can flip the cluster property, and placement and routing
>>>> starts using roles. In 10.0 that property can then go away.
>>>>
>>>> When it comes to placement plugins, we can document that they MUST
>>>> respect certain node roles (at least the data role), and treat it as a bug
>>>> if they don't.
>>>>
>>>> Jan
>>>>
>>>> 4. nov. 2021 kl. 03:36 skrev Noble Paul <[email protected]>:
>>>>
>>>> Thanks everyone for participating in the discussion. I have gone
>>>> through all your valuable inputs and these are my suggestions
>>>>
>>>> Requirements?
>>>>
>>>>    1. Users should be able to designate a node with some role by
>>>>    starting it with a system property (say -Dnode.roles=coordinator)
>>>>    2. This node should be able to perform a certain behavior
>>>>    3. Replica placement should be aware of this and may choose to
>>>>    place or not place a replica in this node
>>>>    4. Any client should be able to query any node in the cluster to
>>>>    get a list of nodes with a specified role or get the roles of a given
>>>>    node
>>>>
>>>>
>>>> Implementation?
>>>> Here is how we could implement each of the requirements:
>>>>
>>>>    1. We could theoretically use a well known system property and
>>>>    2. The actual behavior will have to be implemented in both 8.x and
>>>>    9.x
>>>>    3. Placement of replicas
>>>>       1. It’s not possible to do this in 8.x
>>>>       2. In 9.x, replica placement plugin can be internally used to
>>>>       ensure proper placement of replicas in the roles feature.
>>>>          1. It can’t be done with the current design as users cannot
>>>>          chain multiple placement plugins or user has to build a custom
>>>>          placement plugin of his own
>>>>          2. There is no standard UX to achieve this. It will be a
>>>>          recipe (start nodes with this property and create these rules
>>>>          etc, etc). This is awkward & error prone, as compared to saying
>>>>          “start a node with coordinator role” and Solr will take care of it.
>>>>    4. There will be a new API endpoint to publish this
>>>>    information in 8.x and 9.x. This end point is important to make this
>>>>    feature usable
>>>>
>>>>
>>>> Conclusion
>>>>
>>>>    1. With a roles feature, we can achieve the objectives in a user
>>>>    friendly and intuitive way
>>>>    2. The user interface can be consistent across 8.x and 9.x even
>>>>    though 9.x can use the placement plugin internally
>>>>    3. The actual roles definition will be same across 8.x and 9.x
>>>>
>>>>
>>>>
>>>> On Thu, Nov 4, 2021 at 6:32 AM Noble Paul <[email protected]> wrote:
>>>>
>>>>> Michael
>>>>>
>>>>> We explored all options before arriving at this solution. Ishan has
>>>>> already explained why Tim's suggestions have their shortcomings when it
>>>>> comes to user experience.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 4, 2021, 3:51 AM Michael Gibney <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> >I actually didn't realize that an empty Solr node would forward the
>>>>>> >top-level request onward instead of just being the query controller
>>>>>> >itself? That actually seems like a bug vs. a feature, IMO any node that
>>>>>> >receives the top-level query should just be the coordinator, what
>>>>>> >stops it?
>>>>>>
>>>>>> +1 to Tim's statement quoted above; unless I'm missing something,
>>>>>> this feels like an issue that should be addressed regardless of this SIP.
>>>>>> (perhaps it would be addressed incidentally by this SIP? -- in any event
>>>>>> the current situation seems to not make sense. As Tim points out, the
>>>>>> relevant configs should in principle be accessible from ZK whether or not
>>>>>> there's a core for a given collection on a given node).
>>>>>>
>>>>>> Considering the above, and especially given Ishan that you say "The
>>>>>> coordinator role is the biggest motivation for introducing the concept of
>>>>>> roles", while reading the SIP I found myself wishing for a fuller
>>>>>> enumeration of use cases, and a more sympathetic characterization of
>>>>>> alternatives (existing alternatives, and perhaps, as with the above
>>>>>> "proxy request" issue, simpler-but-not-yet-implemented alternatives).
>>>>>>
>>>>>> Combining questions about use cases with questions about
>>>>>> alternatives: assuming that 9.x autoscaling can indeed be reliably used
>>>>>> to stop replicas from being placed on nodes, how close would addressing the
>>>>>> orthogonal "proxy request" issue come to addressing potential use cases?
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think if we have the new "pseudo core" abstraction (I like it!
>>>>>>> Will it really be a core with an index on disk or some new abstraction
>>>>>>> only tracked in ZK and in memory?) to play the role of coordinator, then
>>>>>>> we have all we need with the affinity placement plugin framework for a
>>>>>>> data free coordinator node implementation.
>>>>>>> It is easy to use system properties to exclude nodes from
>>>>>>> receiving replicas using the placement plugins, a minor change in the
>>>>>>> Affinity Placement Plugin. Such nodes will not receive any replicas by
>>>>>>> the placement plugin, not even at startup (the system property will be
>>>>>>> assigned at startup so no manual intervention needed).
>>>>>>>
>>>>>>> It will not work if switching to another placement plugin, unless
>>>>>>> that other plugin reimplements that (simple) aspect. Is that an issue?
>>>>>>>
>>>>>>> Ilan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Answers inline below.
>>>>>>>>
>>>>>>>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One last thought on this for me ... I think it would be beneficial
>>>>>>>>> for the SIP to address how this new feature will work with the existing
>>>>>>>>> shards.preference solution and affinity based placement plugin.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I was more inclined to keep this SIP focused on the broad concept of
>>>>>>>> roles, and to have any upcoming roles (coordinator role, along with that
>>>>>>>> pseudo-core functionality) described in their own issue (e.g.
>>>>>>>> SOLR-15715).
>>>>>>>>
>>>>>>>>
>>>>>>>>> Moreover, your pseudo-replica solution sounds like a new replica
>>>>>>>>> type vs. a node level thing.
>>>>>>>>
>>>>>>>>
>>>>>>>> I misspoke when I called it "pseudo replica", it is actually a
>>>>>>>> "pseudo core". Replicas are shard level concepts, but such a pseudo
>>>>>>>> core that we plan to introduce will pertain to one or more collections.
>>>>>>>> Imagine collection1 has shard1 and shard2, there will be a single pseudo
>>>>>>>> core for collection1 (we haven't decided on the prefix of this pseudo
>>>>>>>> core yet, but a candidate can be ".collection1_coordinator"). Replica
>>>>>>>> type won't fit this mental model here. We can discuss this more in the
>>>>>>>> SOLR-15715 issue.
>>>>>>>>
>>>>>>>>> The placement strategy can place replicas
>>>>>>>>> based on replica type and node type (just a system property), so
>>>>>>>>> please address why you can't achieve a query coordinator behavior
>>>>>>>>> with a new replica type + improvements to the Affinity placement
>>>>>>>>> plugin?
>>>>>>>>>
>>>>>>>>
>>>>>>>> To put down my thoughts on why Affinity placement plugin won't work
>>>>>>>> for the purpose of ensuring that we have nodes that host no data:
>>>>>>>> 1. We want the ability to have nodes with no data as a first
>>>>>>>> class concept for users. Hence, if the Affinity placement plugin is
>>>>>>>> used for that purpose, users won't be able to switch out that plugin and
>>>>>>>> use anything of their own. Currently, IIUC, there's no way for users to
>>>>>>>> use multiple placement plugins.
>>>>>>>> 2. Nodes that shouldn't host any replicas are generally
>>>>>>>> ephemeral in nature; many of them may join the cluster, they may go
>>>>>>>> away. If such a node joins the cluster, it immediately becomes eligible
>>>>>>>> for replica placement, before the sysadmin is even able to assign an
>>>>>>>> affinity placement configuration for that node. This is a problem.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for your thoughts and feedback, I think it will help us put
>>>>>>>> together the document with more insights into our design choices.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ishan
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > Also, in a cluster where new collections/shards/replicas are
>>>>>>>>> > continuously added all the time, it would be pretty awkward to start
>>>>>>>>> > a node (in regular mode), briefly have it become eligible for replica
>>>>>>>>> > assignment, then invoke a replica placement rule/autoscaling policy
>>>>>>>>> > for that node to not place replicas on it. Instead, starting a node
>>>>>>>>> > with a defined role (as a startup param) precludes that brief period
>>>>>>>>> > of eligibility for replica placement on such a node.
>>>>>>>>> >
>>>>>>>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >> If we were to tell users how to do "scatter gather on an empty
>>>>>>>>> >> node", *how exactly* would you recommend users have an empty node to
>>>>>>>>> >> begin with? Wouldn't you say something like "for 8x you can do this
>>>>>>>>> >> (rule based replica placement) or do that (autoscaling), but for 9x
>>>>>>>>> >> you do this new thing". Having a node that doesn't have a data role
>>>>>>>>> >> seems like a consistent and elegant way for users to invoke such
>>>>>>>>> >> functionality and also easily relate to a broad concept, without
>>>>>>>>> >> having to deal with autoscaling frameworks of the ancient past,
>>>>>>>>> >> medieval past or the future.
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> As opposed to what? Looking up the configset for the addressed
>>>>>>>>> >>> collection and pulling whatever information it needs from cached
>>>>>>>>> >>> data. I'm sure there are some nuances but I hardly think you need a
>>>>>>>>> >>> node role framework to deal with determining the unique key field to
>>>>>>>>> >>> do scatter gather on an empty node when you have easy access to
>>>>>>>>> >>> collection metadata.
>>>>>>>>> >>>
>>>>>>>>> >>> Doesn't seem like a hard thing to overcome to me.
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >
>>>>>>>>> >>> >
>>>>>>>>> >>> >
>>>>>>>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >>
>>>>>>>>> >>> >> I'm not missing the point of the query coordinator, but I
>>>>>>>>> >>> >> actually didn't realize that an empty Solr node would forward the
>>>>>>>>> >>> >> top-level request onward instead of just being the query controller
>>>>>>>>> >>> >> itself? That actually seems like a bug vs. a feature, IMO any node
>>>>>>>>> >>> >> that receives the top-level query should just be the coordinator,
>>>>>>>>> >>> >> what stops it?
>>>>>>>>> >>> >
>>>>>>>>> >>> >
>>>>>>>>> >>> > To process a request there should be a core that uses the
>>>>>>>>> >>> > same configset as the requested collection.
>>>>>>>>> >>> >>
>>>>>>>>> >>> >>
>>>>>>>>> >>> >> Anyway, it sounds to me like you guys have your minds made up
>>>>>>>>> >>> >> regardless of feedback.
>>>>>>>>> >>> >>
>>>>>>>>> >>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in your SIP as a
>>>>>>>>> >>> >> specific role, not sure why you took that as me wanting to discuss
>>>>>>>>> >>> >> the embedded ZK in your SIP?
>>>>>>>>> >>> >>
>>>>>>>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya
>>>>>>>>> >>> >> <[email protected]> wrote:
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > Hi Tim,
>>>>>>>>> >>> >> > Here are my responses inline.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> I'm just not convinced this feature is even needed and the SIP
>>>>>>>>> >>> >> >> is not convincing that "There is no proper alternative today."
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > There are no proper alternatives today, just hacks. On
>>>>>>>>> >>> >> > 8x, we have two different deprecated frameworks to stop replicas
>>>>>>>>> >>> >> > from being placed on a node (1. rule based replica placement,
>>>>>>>>> >>> >> > 2. autoscaling framework). On 9x, we have a new autoscaling
>>>>>>>>> >>> >> > framework, which I don't even think is fully implemented. And,
>>>>>>>>> >>> >> > there's definitely no way to have a node act as a query coordinator
>>>>>>>>> >>> >> > without having data on it.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles,
>>>>>>>>> >>> >> >> doesn't mean Solr needs this.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > Solr needs this. That Elastic has such concepts is a
>>>>>>>>> >>> >> > coincidence, and also means we have an opportunity to catch up
>>>>>>>>> >>> >> > with them; they have these concepts for a reason.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> Also, some of Elastic's roles overlap with
>>>>>>>>> >>> >> >> concepts Solr already has in a different form, i.e. data_hot sounds
>>>>>>>>> >>> >> >> like NRT and data_warm sounds a lot like our Pull Replica Type
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > I think that is beyond the scope of this SIP.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> 2) You can achieve the "coordinator" role with auto-scaling rules
>>>>>>>>> >>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it even has a
>>>>>>>>> >>> >> >> node type built in:
>>>>>>>>> >>> >> >> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP)).
>>>>>>>>> >>> >> >> Simply build your replica placement rules such that no replicas land
>>>>>>>>> >>> >> >> on "coordinator" nodes. And you can route queries using node.sysprop
>>>>>>>>> >>> >> >> already using shards.preference.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > I think you missed the whole point of the query
>>>>>>>>> >>> >> > coordinator. Please refer to this
>>>>>>>>> >>> >> > https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>> >>> >> > Let me summarize the main difference between what (I
>>>>>>>>> >>> >> > think) you refer to and what is proposed in SOLR-15715.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > With your suggestion, we'll have a node that doesn't host
>>>>>>>>> >>> >> > any replicas. And you suggest queries landing on such nodes be
>>>>>>>>> >>> >> > routed using shards.preference? Well, in such a case, these queries
>>>>>>>>> >>> >> > will be forwarded/proxied to a random node hosting a replica of the
>>>>>>>>> >>> >> > collection and that node then acts as the coordinator. This
>>>>>>>>> >>> >> > situation is no better than sending the query directly to that
>>>>>>>>> >>> >> > particular node.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation
>>>>>>>>> >>> >> > functionality. There will be pseudo replicas (aware of the
>>>>>>>>> >>> >> > configset) on this coordinator node that handle the request
>>>>>>>>> >>> >> > themselves, send shard requests to data hosting replicas, collect
>>>>>>>>> >>> >> > responses and merge them, and send the result back to the user. This
>>>>>>>>> >>> >> > merge step is usually extremely memory intensive, and it would be
>>>>>>>>> >>> >> > good to serve these off stateless nodes (that host no data).
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing the
>>>>>>>>> >>> >> >> overseer?!? Also, we already have the ability to run the overseer on
>>>>>>>>> >>> >> >> specific nodes w/o a new framework, so this doesn't really convince
>>>>>>>>> >>> >> >> me we need a new framework.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > There's absolutely no change proposed to the "overseer"
>>>>>>>>> >>> >> > role. What users need on production clusters are nodes dedicated for
>>>>>>>>> >>> >> > overseer operations, and for that the current "overseer" role
>>>>>>>>> >>> >> > suffices, together with some functionality to not place replicas on
>>>>>>>>> >>> >> > such nodes.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> 4) We will indeed need to decide which nodes host embedded
>>>>>>>>> >>> >> >> Zookeepers but I'd argue that solution hasn't been designed entirely
>>>>>>>>> >>> >> >> and we probably don't need a formal node role framework to determine
>>>>>>>>> >>> >> >> which nodes host embedded ZKs. Moreover, embedded ZK seems more like
>>>>>>>>> >>> >> >> a small cluster thing and anyone running a large cluster will
>>>>>>>>> >>> >> >> probably have a dedicated ZK ensemble as they do today. The node
>>>>>>>>> >>> >> >> role thing seems like it's intended for large clusters and my gut
>>>>>>>>> >>> >> >> says few will use embedded ZK for large clusters.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > This SIP is not the right place for this discussion.
>>>>>>>>> >>> >> > There's a separate SIP for this.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> 5) You can also achieve a lot of "node role" functionality in
>>>>>>>>> >>> >> >> query routing using the shards.preference parameter.
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > That doesn't solve the purpose behind
>>>>>>>>> >>> >> > https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> At the very least, the SIP needs to list specific use cases that
>>>>>>>>> >>> >> >> require this feature that are not achievable with the current
>>>>>>>>> >>> >> >> features before getting bogged down in the impl. details.
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> > The coordinator role is the biggest motivation for
>>>>>>>>> >>> >> > introducing the concept of roles. However, in addition to what is
>>>>>>>>> >>> >> > proposed in SOLR-15715, a coordinator node can later on also be used
>>>>>>>>> >>> >> > as a node for users to run streaming expressions on or do bulk
>>>>>>>>> >>> >> > indexing on (impl details for this to come later, don't want
>>>>>>>>> >>> >> > distraction here).
>>>>>>>>> >>> >> >
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> Tim
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >> >
>>>>>>>>> >>> >> >> > I think there are things not yet accounted for. Time I
>>>>>>>>> >>> >> >> > spent yesterday is biting me today. Pls give a couple days.
>>>>>>>>> >>> >> >> >
>>>>>>>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Hey Ishan,
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> I appreciate you writing up the SIP!  Here's some notes/questions
>>>>>>>>> >>> >> >> >> I had as I was reading through your writeup and this mail thread.
>>>>>>>>> >>> >> >> >> ("----" separators between thoughts, hopefully that helps.)
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> ----
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
>>>>>>>>> >>> >> >> >> suggested: roles should default to "all-on".  I see the downsides
>>>>>>>>> >>> >> >> >> you're worried about with that approach (esp. around 'overseer'),
>>>>>>>>> >>> >> >> >> but they may be mitigatable, at least in part.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be a
>>>>>>>>> >>> >> >> >> > dedicated overseer, but for that to happen, he/she would need to
>>>>>>>>> >>> >> >> >> > restart all the data nodes with -Dnode.roles=data
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Sure, if roles can only be specified at startup.  But that may be
>>>>>>>>> >>> >> >> >> a self-imposed constraint.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> An API to change a node's roles would remove the need for a
>>>>>>>>> >>> >> >> >> restart and make it easy for users to affect the semantics they
>>>>>>>>> >>> >> >> >> want.  You decided you want a dedicated overseer N nodes into your
>>>>>>>>> >>> >> >> >> cluster deployment?  Deploy node 'N' with the 'overseer' role, and
>>>>>>>>> >>> >> >> >> toggle the overseer role off on the remainder.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Now, I understand that you don't want roles to change at runtime,
>>>>>>>>> >>> >> >> >> but I haven't seen you get much into "why", beyond saying "it is
>>>>>>>>> >>> >> >> >> very risky to have nodes change roles while they are up and
>>>>>>>>> >>> >> >> >> running."  Can you expand a bit on the risks you're worried about?
>>>>>>>>> >>> >> >> >> If you're explicit about them here maybe someone can think of a
>>>>>>>>> >>> >> >> >> clever way to address them?
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all
>>>>>>>>> >>> >> >> >> > roles", then just by virtue of upgrading to this new version, new
>>>>>>>>> >>> >> >> >> > capabilities will be turned on for the entire cluster, whether or
>>>>>>>>> >>> >> >> >> > not the user opted for such a capability. This is totally
>>>>>>>>> >>> >> >> >> > undesirable.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of functionality
>>>>>>>>> >>> >> >> >> than usual, so in a sense defaulting roles on is scarier.  But in a
>>>>>>>>> >>> >> >> >> sense you're describing something that's an inherent part of
>>>>>>>>> >>> >> >> >> software releases.  Releases expose new features that are typically
>>>>>>>>> >>> >> >> >> on by default.  A new default-on role in 9.1 might hurt a user, but
>>>>>>>>> >>> >> >> >> there's no fundamental difference between that and a change to
>>>>>>>>> >>> >> >> >> backups or replication or whatever in the same release.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> I don't mean to belittle the difference in scope - I get your
>>>>>>>>> >>> >> >> >> concern.  But IMO this is something to address with good release
>>>>>>>>> >>> >> >> >> notes and documentation.  Designing for admins who don't do even
>>>>>>>>> >>> >> >> >> cursory research before an upgrade ties both our hands behind our
>>>>>>>>> >>> >> >> >> back as a project.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> ----
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> > [SIP] Internal representation in ZK ...
>>>>>>>>> >>> >> >> >> > Implementation details like these can be fleshed out in the PR
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> IMO this is important enough to flesh out as part of the SIP, at
>>>>>>>>> >>> >> >> >> least in broad strokes.  It affects backcompat, SolrJ client
>>>>>>>>> >>> >> >> >> design, etc.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> ----
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Woohoo - way to include a v2 API definition!
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I wonder whether
>>>>>>>>> >>> >> >> >> "GET /nodes/someNode/roles" wouldn't be a more intuitive endpoint
>>>>>>>>> >>> >> >> >> for the "get the roles this node has" functionality.  Though I
>>>>>>>>> >>> >> >> >> leave that for your consideration.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> ----
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Looking forward to your responses and seeing the SIP progress!
>>>>>>>>> >>> >> >> >> It's a really cool, promising idea IMO.
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Best,
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> Jason
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
>>>>>>>>> >>> >> >> >> <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >
>>>>>>>>> >>> >> >> >> > Are there any unaddressed outstanding concerns that
>>>>>>>>> >>> >> >> >> > we should hold up the SIP for?
>>>>>>>>> >>> >> >> >> >
>>>>>>>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya,
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query
>>>>>>>>> >>> >> >> >> >>> >> analysis" has a role of its own. Where would that lead us to?
>>>>>>>>> >>> >> >> >> >>> >> Separate roles for nodes that do "faceting" or "spell
>>>>>>>>> >>> >> >> >> >>> >> correction" etc.? But anyway, that is for discussion when we
>>>>>>>>> >>> >> >> >> >>> >> add future roles. This is beyond this SIP.
>>>>>>>>> >>> >> >> >> >>
>>>>>>>>> >>> >> >> >> >>
>>>>>>>>> >>> >> >> >> >> > I am not asking you to implement every possible
>>>>>>>>> >>> >> >> >> >> > role of course :). As a note I know a company that is running
>>>>>>>>> >>> >> >> >> >> > an entire separate cluster to offload and better serve
>>>>>>>>> >>> >> >> >> >> > highlighting on a subset of large docs, so YES I think there
>>>>>>>>> >>> >> >> >> >> > are people who may want such fine grained control.
>>>>>>>>> >>> >> >> >> >>
>>>>>>>>> >>> >> >> >> >> Cool, I think we can discuss adding any additional
>>>>>>>>> >>> >> >> >> >> roles (for highlighting?) on a case by case basis at a later
>>>>>>>>> >>> >> >> >> >> point.
>>>>>>>>> >>> >> >> >> >>
>>>>>>>>> >>> >> >> >> >>
>>>>>>>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan
>>>>>>>>> Chattopadhyaya <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>> >>> >> >> >> >>> > Boiling it down the idea I'm proposing is that
>>>>>>>>> >>> >> >> >> >>> > roles required for back compatibility get explicitly added on
>>>>>>>>> >>> >> >> >> >>> > startup, if not by the user then by the code. This is more
>>>>>>>>> >>> >> >> >> >>> > flexible than assuming that no role means every role, because
>>>>>>>>> >>> >> >> >> >>> > then every new feature that has a role will end up on legacy
>>>>>>>>> >>> >> >> >> >>> > clusters which are also not back compatible.
>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>> >>> >> >> >> >>> +1, I totally agree. I even said so, when I said:
>>>>>>>>> >>> >> >> >> >>> "This is why I was advocating that 1) we assume the "data" as a
>>>>>>>>> >>> >> >> >> >>> default, 2) not assume overseer to be implicitly defined
>>>>>>>>> >>> >> >> >> >>> (because of the way overseer role is written today), 3) not
>>>>>>>>> >>> >> >> >> >>> assume any future roles to be true by default."
>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles
>>>>>>>>> >>> >> >> >> >>> required for back compatibility" (that should be explicitly
>>>>>>>>> >>> >> >> >> >>> added on startup) be just the ["data"] role, and not the
>>>>>>>>> >>> >> >> >> >>> "overseer" role (due to the way overseer role is currently
>>>>>>>>> >>> >> >> >> >>> defined, i.e. it is "preferred overseer").
>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>> Very sorry, I don't mean to sound offended.
>>>>>>>>> >>> >> >> >> >>>> Frustrated yes, offended no :)... the most difficult thing
>>>>>>>>> >>> >> >> >> >>>> about communication is the illusion it has occurred :)
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>> If you read back just a few emails you'll see
>>>>>>>>> >>> >> >> >> >>>> where I talk about roles being applied on startup. Boiling it
>>>>>>>>> >>> >> >> >> >>>> down the idea I'm proposing is that roles required for back
>>>>>>>>> >>> >> >> >> >>>> compatibility get explicitly added on startup, if not by the
>>>>>>>>> >>> >> >> >> >>>> user then by the code. This is more flexible than assuming that
>>>>>>>>> >>> >> >> >> >>>> no role means every role, because then every new feature that
>>>>>>>>> >>> >> >> >> >>>> has a role will end up on legacy clusters which are also not
>>>>>>>>> >>> >> >> >> >>>> back compatible.
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>> There are points where I said all roles rather
>>>>>>>>> >>> >> >> >> >>>> than back compatibility roles because I was thinking about back
>>>>>>>>> >>> >> >> >> >>>> compatibility specifically, but you can't know that if I don't
>>>>>>>>> >>> >> >> >> >>>> say that can you :).
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan
>>>>>>>>> Chattopadhyaya <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>> >>> >> >> >> >>>>> > If you read more closely, my way can provide
>>>>>>>>> full back compatibility. To say or imply it doesn't isn't helping. 
>>>>>>>>> Perhaps
>>>>>>>>> you need to re-read?
>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm
>>>>>>>>> trying my best. Please don't be offended, and kindly point me to the 
>>>>>>>>> exact
>>>>>>>>> part you want me to re-read.
>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan
>>>>>>>>> Chattopadhyaya <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Positive - They denote the existence of
>>>>>>>>> a capability
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >   Absolute - Absence/Presence binary
>>>>>>>>> identification of a capability; no implications, no assumptions
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling on
>>>>>>>>> nodes running without any roles. There has to be an implicit 
>>>>>>>>> assumption as
>>>>>>>>> to what roles are those nodes assumed to have. My proposal is that 
>>>>>>>>> only the
>>>>>>>>> "data" role be assumed, but not the "overseer" role. For any future 
>>>>>>>>> roles
>>>>>>>>> ("coordinator", "zookeeper" etc.), this decision as to what absence 
>>>>>>>>> of any
>>>>>>>>> role implies should be left to the implementation of that future role.
>>>>>>>>> Documentation should clearly reflect these implicit assumptions.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide
>>>>>>>>> full back compatibility. To say or imply it doesn't isn't helping. 
>>>>>>>>> Perhaps
>>>>>>>>> you need to re-read?
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Focused - Do one thing per role
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where
>>>>>>>>> "query analysis" has a role of its own. Where would that lead us to?
>>>>>>>>> Separate roles for nodes that do "faceting" or "spell correction" 
>>>>>>>>> etc.? But
>>>>>>>>> anyway, that is for discussion when we add future roles. This is 
>>>>>>>>> beyond
>>>>>>>>> this SIP.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> I am not asking you to implement every
>>>>>>>>> possible role of course :). As a note I know a company that is 
>>>>>>>>> running an
>>>>>>>>> entire separate cluster to offload and better serve highlighting on a
>>>>>>>>> subset of large docs, so YES I think there are people who may want 
>>>>>>>>> such
>>>>>>>>> fine grained control.
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Accessible - It should be dead simple to
>>>>>>>>> determine the members of a role, avoid parsing blobs of json, avoid
>>>>>>>>> calculating implications, avoid consulting other resources after 
>>>>>>>>> listing
>>>>>>>>> nodes with the role
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation details
>>>>>>>>> that make it easy. There should be a reasonable API to return these 
>>>>>>>>> node
>>>>>>>>> roles, with ability to filter by role or filter by node.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Independent - One role should not
>>>>>>>>> require other roles to be present
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Do we need to have this hard and fast
>>>>>>>>> requirement upfront? There might be situations where this is 
>>>>>>>>> desirable. I
>>>>>>>>> feel we can discuss on a case by case basis whenever a future role is 
>>>>>>>>> added.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Persistent - roles should not be lost
>>>>>>>>> across reboot
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Agree.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Immutable - roles should not change
>>>>>>>>> while the node is running
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> Agree
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> >    Lively - A node with a capability may
>>>>>>>>> not be presently providing that capability.
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> Specifically imagine the case where there are
>>>>>>>>> 100 nodes:
>>>>>>>>> >>> >> >> >> >>>>>> 1-100 ==> DATA
>>>>>>>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>>>>>>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want
>>>>>>>>> only one of those to be providing overseer functionality and the 
>>>>>>>>> other two
>>>>>>>>> to be capable, but not providing (so that if the current overseer 
>>>>>>>>> goes down
>>>>>>>>> a new one can be assigned).
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You
>>>>>>>>> start nodes 107-108 with that role, but you probably want to ensure 
>>>>>>>>> that
>>>>>>>>> zookeepers require some sort of command for them to actually join the
>>>>>>>>> zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18) ... 
>>>>>>>>> to do
>>>>>>>>> that the nodes need to be up. But oh look I typoed 108... we want 
>>>>>>>>> that to
>>>>>>>>> fail... how? because 18 does not have the capability to become a 
>>>>>>>>> zookeeper.
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan
>>>>>>>>> Chattopadhyaya <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles defined
>>>>>>>>> should be assumed to have all roles. Not only data. I don't see a 
>>>>>>>>> reason to
>>>>>>>>> special case this one or any role.
>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions"
>>>>>>>>> Nothing to figure out. A node has a role or not. For back 
>>>>>>>>> compatibility
>>>>>>>>> reasons, all roles would be assumed on startup if none specified.
>>>>>>>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list
>>>>>>>>> of roles = exactly those roles.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> Problem with this approach is mainly to do
>>>>>>>>> with backcompat.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat:
>>>>>>>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how
>>>>>>>>> overseer works and adopt this approach (as quoted), then imagine this
>>>>>>>>> situation:
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be
>>>>>>>>> "data,overseer").
>>>>>>>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention:
>>>>>>>>> dedicated overseer)
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> User wants this node Solr101 to be a
>>>>>>>>> dedicated overseer, but for that to happen, he/she would need to 
>>>>>>>>> restart
>>>>>>>>> all the data nodes with -Dnode.roles=data. This will cause unnecessary
>>>>>>>>> disruption to running clusters where a dedicated overseer is needed. 
>>>>>>>>> Keep
>>>>>>>>> in mind, if a user needs a dedicated overseer, he's likely in an 
>>>>>>>>> emergency
>>>>>>>>> situation and restarting the whole cluster might not be viable for 
>>>>>>>>> him/her.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with
>>>>>>>>> this "assumed to have all roles" idea:
>>>>>>>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for
>>>>>>>>> example. Today, regular nodes are not supposed to have embedded ZK 
>>>>>>>>> running
>>>>>>>>> on them. By introducing this artificial limitation ("assumed to have 
>>>>>>>>> all
>>>>>>>>> roles"), we constrain adoption of all future roles to necessarily 
>>>>>>>>> require a
>>>>>>>>> full cluster restart.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can
>>>>>>>>> introduce new capabilities and roles. Imagine we have a role that is
>>>>>>>>> defined in a new Solr version (and there's functionality to go with 
>>>>>>>>> that
>>>>>>>>> role), and user upgrades to that version. However, his/her nodes all 
>>>>>>>>> were
>>>>>>>>> started with no node.roles param. Hence, if those nodes are "assumed 
>>>>>>>>> to
>>>>>>>>> have all roles", then just by virtue of upgrading to this new 
>>>>>>>>> version, new
>>>>>>>>> capabilities will be turned on for the entire cluster, whether or not 
>>>>>>>>> the
>>>>>>>>> user opted for such a capability. This is totally undesirable.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator
>>>>>>>>> to do more work, I would prefer small focused roles with names that
>>>>>>>>> accurately describe their function. In that light, COORDINATOR might 
>>>>>>>>> be too
>>>>>>>>> nebulous. How about AGGREGATOR role? (what I was thinking of would 
>>>>>>>>> better be
>>>>>>>>> called a QUERY_ANALYSIS role)
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> If you want to do specific things like query
>>>>>>>>> analysis or query aggregation or bulk indexing etc, all of those can 
>>>>>>>>> be
>>>>>>>>> done on COORDINATOR nodes (as is the case in Elasticsearch). Having 
>>>>>>>>> tens of
>>>>>>>>> "small focused roles" defined as first class concepts would be
>>>>>>>>> confusing to the user. As a remedy to your situation where you want 
>>>>>>>>> the
>>>>>>>>> coordinator role to also do query-analysis for shards, one possible
>>>>>>>>> solution is to send such a query to a coordinator node with a 
>>>>>>>>> parameter
>>>>>>>>> like "coordinator.query_analysis=true", and then the coordinator, 
>>>>>>>>> instead
>>>>>>>>> of blindly hitting remote shards, also does some extra work on behalf 
>>>>>>>>> of
>>>>>>>>> the shards.
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan
>>>>>>>>> Chattopadhyaya <[email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for
>>>>>>>>> example (replicas of that collection can only be
>>>>>>>>> >>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in
>>>>>>>>> addition to the other role based constraints),
>>>>>>>>> >>> >> >> >> >>>>>>>>> > the set of roles should be user
>>>>>>>>> extensible and not fixed.
>>>>>>>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the
>>>>>>>>> constraints introduced by roles apply to all collections
>>>>>>>>> >>> >> >> >> >>>>>>>>> > equally which might be insufficient if a
>>>>>>>>> user needs for example a heavily used collection to
>>>>>>>>> >>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware
>>>>>>>>> collections are orthogonal topics. What you describe above can be 
>>>>>>>>> achieved
>>>>>>>>> by the autoscaling+replica placement framework where the placement 
>>>>>>>>> plugins
>>>>>>>>> take the node roles as one of the inputs.
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early on:
>>>>>>>>> the set of roles needs to be expandable by a user
>>>>>>>>> >>> >> >> >> >>>>>>>>> > by creating a collection with new roles
>>>>>>>>> for example (consumed by placement plugins) and be
>>>>>>>>> >>> >> >> >> >>>>>>>>> > able to start nodes with new (arbitrary)
>>>>>>>>> roles. Should such roles follow some naming syntax to
>>>>>>>>> >>> >> >> >> >>>>>>>>> > differentiate them from built in roles?
>>>>>>>>> To be able to fail on typos on roles - that otherwise can be
>>>>>>>>> >>> >> >> >> >>>>>>>>> > crippling and hard to debug. This implies
>>>>>>>>> in any case that the current design can't assume all
>>>>>>>>> >>> >> >> >> >>>>>>>>> > roles are known at compile time or define
>>>>>>>>> them in a Java enum.
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by
>>>>>>>>> something different from roles. Something like node labels (user 
>>>>>>>>> defined)
>>>>>>>>> which can then be used in a replica placement plugin to assign 
>>>>>>>>> replicas. I
>>>>>>>>> see roles as more closely associated with kinds of functionality a 
>>>>>>>>> node is
>>>>>>>>> designated for. Therefore, I feel that replica placements and user 
>>>>>>>>> defined
>>>>>>>>> node labels are out of scope for this SIP. They can be added later in a
>>>>>>>>> separate SIP, without being at odds with this proposal.
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan
>>>>>>>>> Ginzburg <[email protected]>:
>>>>>>>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined
>>>>>>>>> should be assumed to have all roles. Not only data. I don't see a 
>>>>>>>>> reason to
>>>>>>>>> special case this one or any role.
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No
>>>>>>>>> role == all roles. Explicit list of roles = exactly those roles.
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe
>>>>>>>>> preference is something handled as a feature of the role rather than 
>>>>>>>>> via
>>>>>>>>> role designation?
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> Yea, we always need an overseer, so that
>>>>>>>>> feature can decide to use its list of nodes as a preference if it so
>>>>>>>>> chooses.
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we
>>>>>>>>> always prefix Solr env.vars and sys.props with "SOLR_" or "solr.", 
>>>>>>>>> i.e.
>>>>>>>>> -Dsolr.node.roles=foo. That way we can get away from having to have
>>>>>>>>> explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every 
>>>>>>>>> single
>>>>>>>>> property. Instead we can parse all ENVs and Props with the solr 
>>>>>>>>> prefix in
>>>>>>>>> our bootstrap code. And we can by convention allow e.g. docker run -e
>>>>>>>>> SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>>>>> Jan
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail:
>>>>>>>>> [email protected]
>>>>>>>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail:
>>>>>>>>> [email protected]
>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>> >>> >> >> >> >>>>>> --
>>>>>>>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play)
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>> >>> >> >> >> >>>> --
>>>>>>>>> >>> >> >> >> >>>> http://www.needhamsoftware.com (work)
>>>>>>>>> >>> >> >> >> >>>> http://www.the111shift.com (play)
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> >>> >> >> >> To unsubscribe, e-mail:
>>>>>>>>> [email protected]
>>>>>>>>> >>> >> >> >> For additional commands, e-mail:
>>>>>>>>> [email protected]
>>>>>>>>> >>> >> >> >>
>>>>>>>>> >>> >> >> >
>>>>>>>>> >>> >> >> >
>>>>>>>>> >>> >> >> > --
>>>>>>>>> >>> >> >> > http://www.needhamsoftware.com (work)
>>>>>>>>> >>> >> >> > http://www.the111shift.com (play)
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >> >>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> >>> >> >> To unsubscribe, e-mail: [email protected]
>>>>>>>>> >>> >> >> For additional commands, e-mail:
>>>>>>>>> [email protected]
>>>>>>>>> >>> >> >>
>>>>>>>>> >>> >>
>>>>>>>>> >>> >>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> >>> >> To unsubscribe, e-mail: [email protected]
>>>>>>>>> >>> >> For additional commands, e-mail: [email protected]
>>>>>>>>> >>> >>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> >>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> >>> For additional commands, e-mail: [email protected]
>>>>>>>>> >>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul
>>>>
>>>>
>>>>
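
The back-compat disagreement quoted above (what a node started without node.roles is assumed to be) can be illustrated with a small sketch. It is not Solr code: the function names, the KNOWN_ROLES set and the lower-casing of role names are assumptions made only for this illustration. It contrasts the "only data is implied" default with the "no role == all roles" default, and rejects misspelled role names as the thread suggests.

```python
# Hypothetical sketch; none of these names are taken from Solr itself.
KNOWN_ROLES = {"data", "overseer", "coordinator", "zookeeper"}  # assumed role set
BACKCOMPAT_ROLES = {"data"}  # the "only data is implied" position from the thread


def resolve_roles(node_roles_param, implicit=BACKCOMPAT_ROLES):
    """Return the role set for a node, given its node.roles value (or None)."""
    if node_roles_param is None or not node_roles_param.strip():
        # No explicit roles: fall back to whatever the policy says is implied.
        return set(implicit)
    roles = {r.strip().lower() for r in node_roles_param.split(",") if r.strip()}
    unknown = roles - KNOWN_ROLES
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    return roles


def nodes_with_role(cluster, role):
    """List the node names whose role set contains the given role."""
    return sorted(name for name, roles in cluster.items() if role in roles)


# Legacy node with no node.roles, plus one node started as a dedicated overseer.
data_only = {"solr1": resolve_roles(None), "solr101": resolve_roles("overseer")}
all_roles = {
    "solr1": resolve_roles(None, implicit=KNOWN_ROLES),
    "solr101": resolve_roles("overseer"),
}
print(nodes_with_role(data_only, "overseer"))
print(nodes_with_role(all_roles, "overseer"))
```

Under the "data only" default the legacy node does not show up as an overseer candidate, so the dedicated overseer can be added without restarting anything. Under "all roles" every legacy node is listed too, and any role added to KNOWN_ROLES in a later version would automatically apply to it. The sketch only models role listing; it says nothing about how overseer election itself behaves.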

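
Jan's aside about prefixing every setting with "SOLR_" or "solr." can be sketched the same way. The mapping rule used here (lower-case the name, turn underscores into dots) is an assumption for illustration, as is the function name; the thread only says that SOLR_NODE_ROLES=foo and -Dsolr.node.roles=foo should mean the same thing.

```python
# Hypothetical sketch of prefix-based bootstrap parsing; not Solr's actual code.
def env_to_props(environ, prefix="SOLR_"):
    """Map prefixed environment variables to dotted property names."""
    props = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            # Assumed convention: SOLR_NODE_ROLES -> solr.node.roles
            props[name.lower().replace("_", ".")] = value
    return props


print(env_to_props({"SOLR_NODE_ROLES": "foo", "PATH": "/usr/bin"}))
# → {'solr.node.roles': 'foo'}
```

With a rule like this the startup scripts would not need a line of code per property; anything carrying the prefix is picked up, and anything else (PATH above) is ignored.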