Re: First class support for node roles

Noble Paul Thu, 04 Nov 2021 13:52:13 -0700

The SIP can be boiled down to the following:

* Tag a node with a label (role) using a system property
* Use the placement plugin to whitelist/blocklist certain nodes
* Publish the roles through an API
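A minimal sketch of those three pieces, written in Python for illustration only; it is not Solr's placement plugin API. It assumes the node.roles system property (named in the -Dnode.roles=coordinator example further down the thread) holds a comma-separated list. It also assumes that a node started without the property gets only the "data" role, following the back-compat proposal later in the thread. The function names and node names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Assumed back-compat default: a node started without node.roles is a data node.
DEFAULT_ROLES = frozenset({"data"})


def parse_roles(sysprop_value: Optional[str]) -> Set[str]:
    """Turn a value such as 'coordinator,overseer' into a set of role names."""
    if sysprop_value is None or not sysprop_value.strip():
        return set(DEFAULT_ROLES)
    return {r.strip() for r in sysprop_value.split(",") if r.strip()}


def eligible_for_replicas(node_roles: Dict[str, Set[str]]) -> List[str]:
    """Placement filter: only nodes carrying the 'data' role may host replicas."""
    return sorted(n for n, roles in node_roles.items() if "data" in roles)


def nodes_with_role(node_roles: Dict[str, Set[str]], role: str) -> List[str]:
    """What a roles API could answer: which nodes have the given role."""
    return sorted(n for n, roles in node_roles.items() if role in roles)


if __name__ == "__main__":
    cluster = {
        "node1:8983_solr": parse_roles(None),  # legacy node, no property set
        "node2:8983_solr": parse_roles("data,overseer"),
        "node3:8983_solr": parse_roles("coordinator"),
    }
    print(eligible_for_replicas(cluster))  # ['node1:8983_solr', 'node2:8983_solr']
    print(nodes_with_role(cluster, "coordinator"))  # ['node3:8983_solr']
```

Because the role set is fixed when the node starts, a coordinator-only node is never a placement candidate, not even briefly after joining the cluster.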


That's it

If you wish to add a new role, use the same concept.

Period

On Fri, Nov 5, 2021, 7:00 AM Noble Paul <[email protected]> wrote:

> Yes Ilan
> The coordinator is the first compelling use case. The roles feature is the
> UX, and it's a very simple piece. The real work is coming as a separate PR.
>
> Roles can be achieved in a clumsy way today. It's unintuitive, and we don't
> want to make users jump through hoops.
>
> I'll open a PR and you can judge the simplicity of this SIP. It's
> not going to have any major impact on any component of Solr.
>
>
>
> On Fri, Nov 5, 2021, 2:01 AM Ilan Ginzburg <[email protected]> wrote:
>
>> I was noting that the real value of the proposal (real value = being able
>> to do things that are currently impossible with Solr) was due to an
>> independent concept of a coordinator "core", and that if we had this
>> (currently does not exist in Solr but apparently you do have it on a fork),
>> we can achieve most/all of what the SIP proposes with existing means, i.e.
>> without roles. Maybe in a less flexible/user friendly way, maybe not (given
>> the details of rolling out roles are still fuzzy).
>> And if we don't have the concept of coordinator core, then the roles by
>> themselves do not allow much more than what is already achievable by other
>> means.
>>
>> Ilan
>>
>> On Thu, Nov 4, 2021 at 12:02 PM Noble Paul <[email protected]> wrote:
>>
>>> The placement part of the roles feature may use the placement plugin API.
>>>
>>>
>>> The implementation is not what we're discussing here. We need a
>>> consistent story for the user when it comes to roles. This discussion is
>>> about the UX rather than the impl.
>>>
>>> Most of our discussions are about how we should implement it
>>>
>>>
>>>
>>> On Thu, Nov 4, 2021, 9:27 PM Ilan Ginzburg <[email protected]> wrote:
>>>
>>>> A lot of the value of this SIP relies on the pseudo-core thing (because
>>>> placing on specific nodes is achievable today, Overseer role already
>>>> exists). Roles as described without the coordinator concept are just
>>>> another way to do things already possible today (with a very minor update
>>>> on the Affinity placement plugin - it might even support it right away
>>>> actually, didn't check).
>>>> Maybe "pseudo core" should go in first and condition the rest of the
>>>> work? It feels like a bigger chunk with more challenging integration issues
>>>> (routing, new concept in the collection/shard/replica hierarchy).
>>>>
>>>> Ilan
>>>>
>>>> On Thu, Nov 4, 2021 at 11:20 AM Noble Paul <[email protected]>
>>>> wrote:
>>>>
>>>>> None of the design is dictated by the version in which we implement
>>>>> this. The SIP is mostly about the "what", "why" and the UX
>>>>>
>>>>> I don't have any affinity to any particular version. This is
>>>>> definitely going to happen in 9.x. Even if it is built in 9.x, we will have
>>>>> to build and support all versions of Solr we use internally. When we
>>>>> eventually upgrade from our current version to a 9.x version, it has to be
>>>>> backward compatible. The choice of whether this is available for public
>>>>> consumption as a branch/release is up for debate.
>>>>>
>>>>> On Thu, Nov 4, 2021, 8:28 PM Jan Høydahl <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Let's do ourselves a service and target 9.0 for roles. It's too late to
>>>>>> plan new features into 8.x.
>>>>>>
>>>>>> I don't understand the urgency either. I can get that certain Solr
>>>>>> users would wish for such a feature "yesterday" but that cannot drive our
>>>>>> decisions on what version to target for features. When targeting 9.0, all
>>>>>> upgrade or back-compat worries will need to be baked into the feature
>>>>>> itself, so that there is either code support or good documentation for how
>>>>>> to start using roles after upgrading a cluster to 9.0. Perhaps there must
>>>>>> be a temporary cluster-property in 9.0 "enableRoles=false" that can be set,
>>>>>> even if all 9.0 nodes are given roles on startup. Then, initially after the
>>>>>> upgrade, the cluster behaves as it did in 8.x. Then once you are ready to
>>>>>> enforce roles, you can flip the cluster property, and placement and routing
>>>>>> start using roles. In 10.0 that property can then go away.
>>>>>>
>>>>>> When it comes to placement plugins, we can document that they MUST
>>>>>> respect certain node roles (at least the data role), and treat it as a bug
>>>>>> if they don't.
>>>>>>
>>>>>> Jan
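Jan's transitional flag can be illustrated with a short Python sketch. It is hypothetical: it assumes a string-valued cluster property named enableRoles, as in Jan's example, and nodes whose roles are already known. It is not how Solr actually stores or reads cluster properties.

```python
from typing import Dict, List, Set


def placement_candidates(live_nodes: Dict[str, Set[str]],
                         cluster_props: Dict[str, str]) -> List[str]:
    """Return the nodes that may receive new replicas.

    While the (hypothetical) 'enableRoles' cluster property is absent or
    'false', roles are ignored and every live node is a candidate, matching
    the pre-roles behavior right after an upgrade. Once it is flipped to
    'true', only nodes with the 'data' role qualify.
    """
    enforce = cluster_props.get("enableRoles", "false").lower() == "true"
    if not enforce:
        return sorted(live_nodes)
    return sorted(n for n, roles in live_nodes.items() if "data" in roles)
```

Flipping the property narrows the candidate set without restarting any node. Dropping the flag in 10.0 would amount to always taking the enforcing branch.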
>>>>>>
>>>>>> On 4 Nov 2021, at 03:36, Noble Paul <[email protected]> wrote:
>>>>>>
>>>>>> Thanks everyone for participating in the discussion. I have gone
>>>>>> through all your valuable inputs and these are my suggestions
>>>>>>
>>>>>> Requirements?
>>>>>>
>>>>>>    1. Users should be able to designate a node with some role by
>>>>>>    starting it with a system property (say -Dnode.roles=coordinator)
>>>>>>    2. This node should be able to exhibit a certain behavior
>>>>>>    3. Replica placement should be aware of this and may choose to
>>>>>>    place or not place a replica on this node
>>>>>>    4. Any client should be able to query any node in the cluster to
>>>>>>    get a list of nodes with a specified role or get the roles of a given
>>>>>>    node
>>>>>>
>>>>>>
>>>>>> Implementation?
>>>>>> Here is how we could implement each of the requirements:
>>>>>>
>>>>>>    1. We could theoretically use a well-known system property
>>>>>>    2. The actual behavior will have to be implemented in both 8.x and
>>>>>>    9.x
>>>>>>    3. Placement of replicas
>>>>>>       1. It’s not possible to do this in 8.x
>>>>>>       2. In 9.x, the replica placement plugin can be used internally to
>>>>>>       ensure proper placement of replicas in the roles feature.
>>>>>>
>>>>>>          1. It can’t be done with the current design, as users cannot
>>>>>>          chain multiple placement plugins, or users have to build a
>>>>>>          custom placement plugin of their own
>>>>>>          2. There is no standard UX to achieve this. It will be a
>>>>>>          recipe (start nodes with this property and create these rules
>>>>>>          etc, etc). This is awkward & error-prone, as compared to saying
>>>>>>          “start a node with coordinator role” and Solr will take care of it.
>>>>>>    4. There will be a new API endpoint to publish this
>>>>>>    information in 8.x and 9.x. This endpoint is important to make this
>>>>>>    feature usable
>>>>>>
>>>>>>
>>>>>> Conclusion
>>>>>>
>>>>>>    1. With a roles feature, we can achieve the objectives in a user
>>>>>>    friendly and intuitive way
>>>>>>    2. The user interface can be consistent across 8.x and 9.x even
>>>>>>    though 9.x can use the placement plugin internally
>>>>>>    3. The actual roles definition will be the same across 8.x and 9.x
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 4, 2021 at 6:32 AM Noble Paul <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> We explored all options before arriving at this solution. Ishan
>>>>>>> has already explained why Tim's suggestions have their shortcomings when it
>>>>>>> comes to user experience.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 4, 2021, 3:51 AM Michael Gibney <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> >I actually didn't realize that an empty Solr node would forward the top-level
>>>>>>>> >request onward instead of just being the query controller itself? That
>>>>>>>> >actually seems like a bug vs. a feature, IMO any node that receives
>>>>>>>> >the top-level query should just be the coordinator, what stops it?
>>>>>>>>
>>>>>>>> +1 to Tim's statement quoted above; unless I'm missing something,
>>>>>>>> this feels like an issue that should be addressed regardless of this SIP
>>>>>>>> (perhaps it would be addressed incidentally by this SIP?). In any event,
>>>>>>>> the current situation seems not to make sense: as Tim points out, the
>>>>>>>> relevant configs should in principle be accessible from ZK whether or not
>>>>>>>> there's a core for a given collection on a given node.
>>>>>>>>
>>>>>>>> Considering the above, and especially given, Ishan, that you say "The
>>>>>>>> coordinator role is the biggest motivation for introducing the concept of
>>>>>>>> roles", while reading the SIP I found myself wishing for a fuller
>>>>>>>> enumeration of use cases, and a more sympathetic characterization of
>>>>>>>> alternatives (existing alternatives, and perhaps, as with the above "proxy
>>>>>>>> request" issue, simpler-but-not-yet-implemented alternatives).
>>>>>>>>
>>>>>>>> Combining questions about use cases with questions about
>>>>>>>> alternatives: assuming that 9.x autoscaling can indeed be reliably used to
>>>>>>>> stop replicas from being placed on nodes, how close would addressing the
>>>>>>>> orthogonal "proxy request" issue come to addressing potential use cases?
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think if we have the new "pseudo core" abstraction (I like it!
>>>>>>>>> Will it really be a core with an index on disk or some new abstraction only
>>>>>>>>> tracked in ZK and in memory?) to play the role of coordinator, then we have
>>>>>>>>> all we need with the affinity placement plugin framework for a data-free
>>>>>>>>> coordinator node implementation.
>>>>>>>>> It is easy to use system properties to exclude nodes from
>>>>>>>>> receiving replicas using the placement plugins, a minor change in the
>>>>>>>>> Affinity Placement Plugin. Such nodes will not receive any replicas from the
>>>>>>>>> placement plugin, not even at startup (the system property will be assigned
>>>>>>>>> at startup, so no manual intervention is needed).
>>>>>>>>>
>>>>>>>>> It will not work if switching to another placement plugin, unless
>>>>>>>>> that other plugin reimplements that (simple) aspect. Is that an issue?
>>>>>>>>>
>>>>>>>>> Ilan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Answers inline below.
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> One last thought on this for me ... I think it would be beneficial for
>>>>>>>>>>> the SIP to address how this new feature will work with the existing
>>>>>>>>>>> shards.preference solution and affinity based placement plugin.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I was more inclined to keep this SIP focused on the broad concept of
>>>>>>>>>> roles, and any upcoming roles (coordinator role, along with that
>>>>>>>>>> pseudo-core functionality) to be described in their own issue (e.g.
>>>>>>>>>> SOLR-15715).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Moreover, your pseudo-replica solution sounds like a new replica type
>>>>>>>>>>> vs. a node level thing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I misspoke when I called it a "pseudo replica"; it is actually a
>>>>>>>>>> "pseudo core". Replicas are shard level concepts, but such a pseudo core
>>>>>>>>>> that we plan to introduce will pertain to one or more collections. Imagine
>>>>>>>>>> collection1 has shard1 and shard2; there will be a single pseudo core for
>>>>>>>>>> collection1 (we haven't decided on the prefix of this pseudo core yet, but
>>>>>>>>>> a candidate can be ".collection1_coordinator"). Replica type won't fit this
>>>>>>>>>> mental model here. We can discuss this more in the SOLR-15715 issue.
>>>>>>>>>>
>>>>>>>>>>> The placement strategy can place replicas
>>>>>>>>>>> based on replica type and node type (just a system property), so
>>>>>>>>>>> please address why you can't achieve a query coordinator behavior with
>>>>>>>>>>> a new replica type + improvements to the Affinity placement plugin?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To put down my thoughts on why the Affinity placement plugin won't
>>>>>>>>>> work for the purpose of ensuring that we have nodes that host no data:
>>>>>>>>>> 1. We want the ability to have nodes with no data on them as a
>>>>>>>>>> first class concept for users. Hence, if the Affinity placement plugin is
>>>>>>>>>> used for that purpose, users won't be able to switch out that plugin and
>>>>>>>>>> use anything of their own. Currently, IIUC, there's no way for users to
>>>>>>>>>> use multiple placement plugins.
>>>>>>>>>> 2. Nodes that shouldn't host any replica are generally
>>>>>>>>>> ephemeral in nature; many of them may join the cluster, they may go away.
>>>>>>>>>> If such a node joins the cluster, it immediately becomes eligible for
>>>>>>>>>> replica placement, before the sysadmin is even able to assign an affinity
>>>>>>>>>> placement configuration for that node. This is a problem.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Tim
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for your thoughts and feedback, I think it will help us
>>>>>>>>>> put together the document with more insights into our design choices.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ishan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Also, in a cluster where new collections/shards/replicas are continuously added all the time, it would be pretty awkward to start a node (in regular mode), briefly have it become eligible for replica assignment, then invoke a replica placement rule/autoscaling policy for that node to not place replicas on it. Instead, starting a node with a defined role (as a startup param) precludes that brief period of eligibility for replica placement on such a node.
>>>>>>>>>>> >
>>>>>>>>>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> If we were to tell users how to do "scatter gather on an empty node", *how exactly* would you recommend users have an empty node to begin with? Wouldn't you say something like "for 8x you can do this (rule based replica placement) or do that (autoscaling), but for 9x you do this new thing"? Having a node that doesn't have a data role seems like a consistent and elegant way for users to invoke such functionality and also easily relate to a broad concept, without having to deal with autoscaling frameworks of the ancient past, medieval past or the future.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> As opposed to what? Looking up the configset for the addressed
>>>>>>>>>>> >>> collection and pulling whatever information it needs from cached data.
>>>>>>>>>>> >>> I'm sure there are some nuances but I hardly think you need a node
>>>>>>>>>>> >>> role framework to determine the unique key field to do
>>>>>>>>>>> >>> scatter gather on an empty node when you have easy access to
>>>>>>>>>>> >>> collection metadata.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Doesn't seem like a hard thing to overcome to me.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <[email protected]> wrote:
>>>>>>>>>>> >>> >
>>>>>>>>>>> >>> >
>>>>>>>>>>> >>> >
>>>>>>>>>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >> I'm not missing the point of the query coordinator, but I actually
>>>>>>>>>>> >>> >> didn't realize that an empty Solr node would forward the top-level
>>>>>>>>>>> >>> >> request onward instead of just being the query controller itself? That
>>>>>>>>>>> >>> >> actually seems like a bug vs. a feature, IMO any node that receives
>>>>>>>>>>> >>> >> the top-level query should just be the coordinator, what stops it?
>>>>>>>>>>> >>> >
>>>>>>>>>>> >>> >
>>>>>>>>>>> >>> > To process a request, there should be a core that uses the same configset as the requested collection.
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >> Anyway, it sounds to me like you guys have your minds made up
>>>>>>>>>>> >>> >> regardless of feedback.
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in your SIP as a
>>>>>>>>>>> >>> >> specific role, not sure why you took that as me wanting to discuss the
>>>>>>>>>>> >>> >> embedded ZK in your SIP?
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya
>>>>>>>>>>> >>> >> <[email protected]> wrote:
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > Hi Tim,
>>>>>>>>>>> >>> >> > Here are my responses inline.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <[email protected]> wrote:
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> I'm just not convinced this feature is even needed and the SIP is not
>>>>>>>>>>> >>> >> >> convincing that "There is no proper alternative today."
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > There are no proper alternatives today, just hacks. On 8x, we have two different deprecated frameworks to stop replicas from being placed on a node (1. rule based replica placement, 2. autoscaling framework). On 9x, we have a new autoscaling framework, which I don't even think is fully implemented. And there's definitely no way to have a node act as a query coordinator without having data on it.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles doesn't
>>>>>>>>>>> >>> >> >> mean Solr needs this.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > Solr needs this. That Elastic has such concepts is a coincidence, and it also means we have an opportunity to catch up with them; they have these concepts for a reason.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> Also, some of Elastic's roles overlap with
>>>>>>>>>>> >>> >> >> concepts Solr already has in a different form, e.g. data_hot sounds
>>>>>>>>>>> >>> >> >> like NRT and data_warm sounds a lot like our Pull Replica Type.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > I think that is beyond the scope of this SIP.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> 2) You can achieve the "coordinator" role with auto-scaling rules
>>>>>>>>>>> >>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it even has a node
>>>>>>>>>>> >>> >> >> type built in:
>>>>>>>>>>> >>> >> >> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP)).
>>>>>>>>>>> >>> >> >> Simply build your replica placement rules such that no replicas land
>>>>>>>>>>> >>> >> >> on "coordinator" nodes. And you can route queries using node.sysprop
>>>>>>>>>>> >>> >> >> already using shards.preference.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > I think you missed the whole point of the query coordinator. Please refer to https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>>> >>> >> > Let me summarize the main difference between what (I think) you refer to and what is proposed in SOLR-15715.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > With your suggestion, we'll have a node that doesn't host any replicas. And you suggest queries landing on such nodes be routed using shards.preference? Well, in such a case, these queries will be forwarded/proxied to a random node hosting a replica of the collection and that node then acts as the coordinator. This situation is no better than sending the query directly to that particular node.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation functionality. There will be pseudo replicas (aware of the configset) on this coordinator node that handle the request themselves, send shard requests to data hosting replicas, collect and merge the responses, and send the result back to the user. This merge step is usually extremely memory intensive, and it would be good to serve these off stateless nodes (that host no data).
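To make the aggregation step concrete: a coordinator sends the query to one replica per shard, each shard returns its own top-k hits sorted by score, and the coordinator merges those lists into a global top-k. The Python sketch below shows only that merge, using heapq.merge. The (score, doc_id) tuples are a simplification of real shard responses, and this is an illustration of the general scatter-gather technique, not the SOLR-15715 implementation.

```python
import heapq
from itertools import islice
from typing import List, Sequence, Tuple

Hit = Tuple[float, str]  # (score, doc_id): simplified shard response entry


def merge_shard_results(shard_hits: Sequence[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard hit lists into a global top-k.

    Each input list must already be sorted by descending score, as each
    shard returns its own top-k; heapq.merge then walks them lazily.
    """
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))
```

The coordinator receives up to k hits from every shard, so the responses it holds grow with k times the number of shards (and with any stored fields returned). That is one reason to run this step on nodes that host no data.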
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing the overseer?!?
>>>>>>>>>>> >>> >> >> Also, we already have the ability to run the overseer on specific
>>>>>>>>>>> >>> >> >> nodes w/o a new framework, so this doesn't really convince me we need
>>>>>>>>>>> >>> >> >> a new framework.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > There's absolutely no change proposed to the "overseer" role. What users need on production clusters are nodes dedicated to overseer operations, and for that the current "overseer" role suffices, together with some functionality to not place replicas on such nodes.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> 4) We will indeed need to decide which nodes host embedded Zookeepers,
>>>>>>>>>>> >>> >> >> but I'd argue that solution hasn't been designed entirely and we
>>>>>>>>>>> >>> >> >> probably don't need a formal node role framework to determine which
>>>>>>>>>>> >>> >> >> nodes host embedded ZKs. Moreover, embedded ZK seems more like a small
>>>>>>>>>>> >>> >> >> cluster thing and anyone running a large cluster will probably have a
>>>>>>>>>>> >>> >> >> dedicated ZK ensemble as they do today. The node role thing seems like
>>>>>>>>>>> >>> >> >> it's intended for large clusters and my gut says few will use embedded
>>>>>>>>>>> >>> >> >> ZK for large clusters.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > This SIP is not the right place for this discussion. There's a separate SIP for this.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> 5) You can also achieve a lot of "node role" functionality in query
>>>>>>>>>>> >>> >> >> routing using the shards.preference parameter.
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > That doesn't serve the purpose behind https://issues.apache.org/jira/browse/SOLR-15715.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> At the very least, the SIP needs to list specific use cases that
>>>>>>>>>>> >>> >> >> require this feature that are not achievable with the current features
>>>>>>>>>>> >>> >> >> before getting bogged down in the impl. details.
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> > The coordinator role is the biggest motivation for introducing the concept of roles. However, in addition to what is proposed in SOLR-15715, a coordinator node can later on also be used as a node for users to run streaming expressions or bulk indexing on (impl details for this to come later; I don't want a distraction here).
>>>>>>>>>>> >>> >> >
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> Tim
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >
>>>>>>>>>>> >>> >> >> > I think there are things not yet accounted for. Time I spent yesterday is biting me today. Pls give a couple days.
>>>>>>>>>>> >>> >> >> >
>>>>>>>>>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Hey Ishan,
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> I appreciate you writing up the SIP!  Here are some notes/questions I
>>>>>>>>>>> >>> >> >> >> had as I was reading through your writeup and this mail thread.
>>>>>>>>>>> >>> >> >> >> ("----" separators between thoughts, hopefully that helps.)
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
>>>>>>>>>>> >>> >> >> >> suggested: roles should default to "all-on".  I see the downsides
>>>>>>>>>>> >>> >> >> >> you're worried about with that approach (esp. around 'overseer'), but
>>>>>>>>>>> >>> >> >> >> they may be mitigatable, at least in part.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be a dedicated overseer, but for that to happen, he/she would need to restart all the data nodes with -Dnode.roles=data
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Sure, if roles can only be specified at startup.  But that may be a
>>>>>>>>>>> >>> >> >> >> self-imposed constraint.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> An API to change a node's roles would remove the need for a restart
>>>>>>>>>>> >>> >> >> >> and make it easy for users to effect the semantics they want.  You
>>>>>>>>>>> >>> >> >> >> decided you want a dedicated overseer N nodes into your cluster
>>>>>>>>>>> >>> >> >> >> deployment?  Deploy node 'N' with the 'overseer' role, and toggle the
>>>>>>>>>>> >>> >> >> >> overseer role off on the remainder.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Now, I understand that you don't want roles to change at runtime, but
>>>>>>>>>>> >>> >> >> >> I haven't seen you get much into "why", beyond saying "it is very
>>>>>>>>>>> >>> >> >> >> risky to have nodes change roles while they are up and running."  Can
>>>>>>>>>>> >>> >> >> >> you expand a bit on the risks you're worried about?  If you're
>>>>>>>>>>> >>> >> >> >> explicit about them here maybe someone can think of a clever way to
>>>>>>>>>>> >>> >> >> >> address them?
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability. This is totally undesirable.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of functionality than
>>>>>>>>>>> >>> >> >> >> usual, so in a sense defaulting roles on is scarier.  But in a sense
>>>>>>>>>>> >>> >> >> >> you're describing something that's an inherent part of software
>>>>>>>>>>> >>> >> >> >> releases.  Releases expose new features that are typically on by
>>>>>>>>>>> >>> >> >> >> default.  A new default-on role in 9.1 might hurt a user, but there's
>>>>>>>>>>> >>> >> >> >> no fundamental difference between that and a change to backups or
>>>>>>>>>>> >>> >> >> >> replication or whatever in the same release.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> I don't mean to belittle the difference in scope - I get your concern.
>>>>>>>>>>> >>> >> >> >> But IMO this is something to address with good release notes and
>>>>>>>>>>> >>> >> >> >> documentation.  Designing for admins who don't do even cursory
>>>>>>>>>>> >>> >> >> >> research before an upgrade ties both our hands behind our back as a
>>>>>>>>>>> >>> >> >> >> project.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> > [SIP] Internal representation in ZK ... Implementation details like these can be fleshed out in the PR
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> IMO this is important enough to flesh out as part of the SIP, at least
>>>>>>>>>>> >>> >> >> >> in broad strokes.  It affects backcompat, SolrJ client design, etc.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Woohoo - way to include a v2 API definition!
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I wonder whether "GET
>>>>>>>>>>> >>> >> >> >> /nodes/someNode/roles" wouldn't be a more intuitive endpoint for the
>>>>>>>>>>> >>> >> >> >> "get the roles this node has" functionality.  Though I leave that for
>>>>>>>>>>> >>> >> >> >> your consideration.
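The two endpoint shapes under discussion can be compared with a small, hypothetical Python router that resolves both to the same lookup. The paths mirror the ones in this thread (/api/cluster/roles?node=node1 from the SIP, and Jason's /nodes/someNode/roles). The in-memory ROLES table and the ?role= filter are assumptions for illustration, not an existing Solr handler.

```python
from typing import Dict, List, Set
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory view of node -> roles, as published by each node.
ROLES: Dict[str, Set[str]] = {
    "node1": {"data", "overseer"},
    "node2": {"coordinator"},
}


def handle_get(url: str) -> Dict[str, List[str]]:
    """Answer 'roles of a node' or 'nodes with a role' for either path style."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    segments = [s for s in parts.path.split("/") if s]

    if segments == ["api", "cluster", "roles"]:
        if "node" in query:  # SIP style: ?node=node1
            node = query["node"][0]
            return {node: sorted(ROLES.get(node, set()))}
        if "role" in query:  # assumed filter: ?role=coordinator
            role = query["role"][0]
            return {role: sorted(n for n, r in ROLES.items() if role in r)}
    if len(segments) == 3 and segments[0] == "nodes" and segments[2] == "roles":
        node = segments[1]  # Jason's style: /nodes/{node}/roles
        return {node: sorted(ROLES.get(node, set()))}
    raise ValueError(f"unsupported path: {url}")
```

Both styles hit the same lookup; the difference is only whether the node name is a path segment or a query parameter, which is the kind of v2 API consistency question raised above.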
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> ----
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Looking forward to your responses and seeing the SIP progress!  It's a
>>>>>>>>>>> >>> >> >> >> really cool, promising idea IMO.
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Best,
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> Jason
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
>>>>>>>>>>> >>> >> >> >> <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>>> >>> >> >> >> > Are there any unaddressed outstanding concerns that we should hold up the SIP for?
>>>>>>>>>>> >>> >> >> >> >
>>>>>>>>>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
>>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>>> >>> >> >> >> >> > I am not asking you to implement every possible role of course :). As a note, I know a company that is running an entire separate cluster to offload and better serve highlighting on a subset of large docs, so YES I think there are people who may want such fine grained control.
>>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>>> >>> >> >> >> >> Cool, I think we can discuss adding any additional roles (for highlighting?) on a case by case basis at a later point.
>>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>>> >>> >> >> >> >>
>>>>>>>>>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>>> >>> >> >> >> >>> > Boiling it down, the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code. This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up on legacy clusters which are also not back compatible.
>>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>>> >>> >> >> >> >>> +1, I totally agree. I even said so when I said: "This is why I was advocating that 1) we assume the "data" as a default, 2) not assume overseer to be implicitly defined (because of the way overseer role is written today), 3) not assume any future roles to be true by default."
>>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles required for back compatibility" (that should be explicitly added on startup) be just the ["data"] role, and not the "overseer" role (due to the way the overseer role is currently defined, i.e. it is "preferred overseer").
>>>>>>>>>>> >>> >> >> >> >>>
>>>>>>>>>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>> Very sorry, I don't mean to sound offended. Frustrated yes, offended no :)... the most difficult thing about communication is the illusion that it has occurred :)
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>> If you read back just a few emails, you'll see where I talk about roles being applied on startup. Boiling it down, the idea I'm proposing is that roles required for back compatibility get explicitly added on startup, if not by the user then by the code. This is more flexible than assuming that no role means every role, because then every new feature that has a role will end up on legacy clusters which are also not back compatible.
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>> There are points where I said "all roles" rather than "back compatibility roles" because I was thinking about back compatibility specifically, but you can't know that if I don't say it, can you :).
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>>> >>> >> >> >> >>>>> > If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm trying my best. Please don't be offended, and kindly point me to the exact part you want me to re-read.
>>>>>>>>>>> >>> >> >> >> >>>>>
>>>>>>>>>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Positive - They denote the existence of a capability
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >   Absolute - Absence/Presence binary identification of a capability; no implications, no assumptions
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling on nodes running without any roles. There has to be an implicit assumption as to what roles those nodes are assumed to have. My proposal is that only the "data" role be assumed, but not the "overseer" role. For any future roles ("coordinator", "zookeeper", etc.), the decision as to what the absence of any role implies should be left to the implementation of that future role. Documentation should clearly reflect these implicit assumptions.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide full back compatibility. To say or imply it doesn't isn't helping. Perhaps you need to re-read?
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Focused - Do one thing per role
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where "query analysis" has a role of its own. Where would that lead us to? Separate roles for nodes that do "faceting" or "spell correction" etc.? But anyway, that is for discussion when we add future roles. This is beyond this SIP.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> I am not asking you to implement every possible role of course :). As a note, I know a company that is running an entire separate cluster to offload and better serve highlighting on a subset of large docs, so YES, I think there are people who may want such fine-grained control.
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Accessible - It should be dead simple to determine the members of a role, avoid parsing blobs of JSON, avoid calculating implications, avoid consulting other resources after listing nodes with the role
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation details that make it easy. There should be a reasonable API to return these node roles, with the ability to filter by role or filter by node.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Independent - One role should not require other roles to be present
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Do we need to have this hard and fast requirement upfront? There might be situations where one role depending on another is desirable. I feel we can discuss it on a case-by-case basis whenever a future role is added.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Persistent - roles should not be lost across reboot
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Immutable - roles should not change while the node is running
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> Agree
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> >    Lively - A node with a capability may not be presently providing that capability.
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> Specifically, imagine the case where there are 100 nodes:
>>>>>>>>>>> >>> >> >> >> >>>>>> 1-100 ==> DATA
>>>>>>>>>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>>>>>>>>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want only one of those to be providing overseer functionality and the other two to be capable, but not providing (so that if the current overseer goes down, a new one can be assigned).
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You start nodes 107-108 with that role, but you probably want to ensure that zookeepers require some sort of command for them to actually join the zookeeper cluster (e.g. /admin?action=ZKADD&nodes=node107,node18) ... to do that the nodes need to be up. But oh look, I typoed 108... we want that to fail... how? Because 18 does not have the capability to become a zookeeper.
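A minimal sketch of the capable-versus-providing distinction in the example above, together with the filter-by-role and filter-by-node lookups asked for earlier in the thread. It assumes each node's roles are known as a plain set of strings; `nodes_with_role`, `roles_of`, `validate_zk_add` and the ZKADD-style request are hypothetical illustrations, not an existing Solr API.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical view of the roles each node was started with. Having a role
# only means the node is *capable* of a function; whether it is currently
# providing it (e.g. is the active overseer) would be tracked separately.
NodeRoles = Dict[str, Set[str]]


def nodes_with_role(roles: NodeRoles, role: str) -> List[str]:
    """Filter by role: all nodes that declare `role`, sorted by name."""
    return sorted(node for node, node_roles in roles.items() if role in node_roles)


def roles_of(roles: NodeRoles, node: str) -> Set[str]:
    """Filter by node: the roles a node declares (empty if the node is unknown)."""
    return set(roles.get(node, set()))


def validate_zk_add(roles: NodeRoles, requested: Iterable[str]) -> List[str]:
    """Check a hypothetical ZKADD request before acting on it.

    Only nodes started with the "zookeeper" role may be asked to join the
    ensemble, so a typo such as node18 instead of node108 is rejected.
    """
    names = list(requested)
    incapable = [n for n in names if "zookeeper" not in roles_of(roles, n)]
    if incapable:
        raise ValueError("nodes lack the zookeeper role: " + ", ".join(incapable))
    return names


# The example cluster: 1-100 data, 101-103 overseer, 104-108 zookeeper.
cluster: NodeRoles = {f"node{i}": {"data"} for i in range(1, 101)}
cluster.update({f"node{i}": {"overseer"} for i in range(101, 104)})
cluster.update({f"node{i}": {"zookeeper"} for i in range(104, 109)})

if __name__ == "__main__":
    print(nodes_with_role(cluster, "zookeeper"))
    print(validate_zk_add(cluster, ["node107", "node108"]))
    try:
        validate_zk_add(cluster, ["node107", "node18"])
    except ValueError as err:
        print("rejected:", err)
```

Keeping the capability check separate from the act of starting the service is what lets the typo fail loudly; a fuller design would also need to record which capable nodes are currently providing the function.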
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
>>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions". Nothing to figure out. A node has a role or not. For back compatibility reasons, all roles would be assumed on startup if none specified.
>>>>>>>>>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list of roles = exactly those roles.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> The problem with this approach mainly has to do with backcompat.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat:
>>>>>>>>>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how overseer works and adopt this approach (as quoted), then imagine this situation:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be "data,overseer").
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention: dedicated overseer)
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> The user wants this node Solr101 to be a dedicated overseer, but for that to happen, he/she would need to restart all the data nodes with -Dnode.roles=data. This will cause unnecessary disruption to running clusters where a dedicated overseer is needed. Keep in mind, if a user needs a dedicated overseer, he/she is likely in an emergency situation and restarting the whole cluster might not be viable for him/her.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with this "assumed to have all roles" idea:
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for example. Today, regular nodes are not supposed to have embedded ZK running on them. By introducing this artificial limitation ("assumed to have all roles"), we constrain adoption of all future roles to necessarily require a full cluster restart.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can introduce new capabilities and roles. Imagine we have a role that is defined in a new Solr version (and there's functionality to go with that role), and the user upgrades to that version. However, his/her nodes were all started with no node.roles param. Hence, if those nodes are "assumed to have all roles", then just by virtue of upgrading to this new version, new capabilities will be turned on for the entire cluster, whether or not the user opted for such a capability. This is totally undesirable.
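To make the two back-compat readings concrete, here is a sketch that resolves a node's effective roles from an optional node.roles value under either default ("no role == all roles" versus "no role == data only"). The policy names, the small `solr1`/`solr2`/`solr101` cluster standing in for Solr1-100 and Solr101, and `coordinator` as a stand-in for a role added in a later release are all assumptions for illustration. It also treats "overseer" as plain eligibility, whereas today the overseer role means "preferred overseer".

```python
from typing import Dict, FrozenSet, List, Optional

# Roles known to a hypothetical current release, and to a later release
# that adds a new role ("coordinator" is only a stand-in here).
ROLES_V1: FrozenSet[str] = frozenset({"data", "overseer"})
ROLES_V2: FrozenSet[str] = ROLES_V1 | {"coordinator"}


def resolve_roles(node_roles: Optional[str], known: FrozenSet[str], policy: str) -> FrozenSet[str]:
    """Effective roles for a node, given its node.roles value (None if unset).

    policy "all":  an unset value means every role the running version knows.
    policy "data": an unset value means only the "data" role.
    An explicit value always means exactly the listed roles.
    """
    if node_roles is not None:
        return frozenset(r.strip() for r in node_roles.split(",") if r.strip())
    if policy == "all":
        return known
    if policy == "data":
        return frozenset({"data"})
    raise ValueError(f"unknown policy: {policy!r}")


def overseer_candidates(cluster: Dict[str, Optional[str]], known: FrozenSet[str], policy: str) -> List[str]:
    """Nodes whose effective roles include "overseer", sorted by name."""
    return sorted(n for n, value in cluster.items() if "overseer" in resolve_roles(value, known, policy))


# Two legacy nodes started without node.roles, plus one meant as a dedicated overseer.
cluster: Dict[str, Optional[str]] = {"solr1": None, "solr2": None, "solr101": "overseer"}

if __name__ == "__main__":
    for policy in ("all", "data"):
        print(policy, overseer_candidates(cluster, ROLES_V1, policy))
        # After upgrading, does an unconfigured node silently gain the new role?
        print(policy, "coordinator" in resolve_roles(None, ROLES_V2, policy))
```

Under the "all" policy the intended dedicated overseer competes with every unconfigured node, and the upgrade turns the new role on everywhere; under the "data" policy neither happens.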
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator to do more work; I would prefer small focused roles with names that accurately describe their function. In that light, COORDINATOR might be too nebulous. How about an AGGREGATOR role? (What I was thinking of would better be called a QUERY_ANALYSIS role.)
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> If you want to do specific things like query analysis or query aggregation or bulk indexing, etc., all of those can be done on COORDINATOR nodes (as is the case in Elasticsearch). Having tens of "small focused roles" defined as first class concepts would be confusing to the user. As a remedy to your situation where you want the coordinator role to also do query analysis for shards, one possible solution is to send such a query to a coordinator node with a parameter like "coordinator.query_analysis=true", and then the coordinator, instead of blindly hitting remote shards, also does some extra work on behalf of the shards.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan Chattopadhyaya <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for example (replicas of that collection can only be placed on nodes with a specific role, in addition to the other role based constraints), the set of roles should be user extensible and not fixed.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the constraints introduced by roles apply to all collections equally, which might be insufficient if a user needs, for example, a heavily used collection to only be placed on more powerful nodes.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware collections are orthogonal topics. What you describe above can be achieved by the autoscaling+replica placement framework where the placement plugins take the node roles as one of the inputs.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early on: the set of roles needs to be expandable by a user, by creating a collection with new roles for example (consumed by placement plugins), and it must be possible to start nodes with new (arbitrary) roles. Should such roles follow some naming syntax to differentiate them from built-in roles? That would make it possible to fail on typos in role names, which otherwise can be crippling and hard to debug. This implies in any case that the current design can't assume all roles are known at compile time or define them in a Java enum.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by something different from roles: something like node labels (user defined), which can then be used in a replica placement plugin to assign replicas. I see roles as more closely associated with kinds of functionality a node is designated for. Therefore, I feel that replica placements and user defined node labels are out of scope for this SIP. They can be added later in a separate SIP, without being at odds with this proposal.
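Ilan's typo concern and Ishan's suggestion to keep user-defined labels separate from roles can be combined in a small sketch: role names are validated against a fixed built-in set, while free-form labels are only consulted by a placement filter. The built-in role set, the `Node` type, the `big-box` label, `parse_roles` and `replica_candidates` are hypothetical and are not Solr's placement plugin API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Hypothetical fixed set of built-in roles; anything else is treated as a typo.
BUILTIN_ROLES: FrozenSet[str] = frozenset({"data", "overseer"})


def parse_roles(value: str) -> FrozenSet[str]:
    """Parse a comma-separated role list, failing fast on unknown names."""
    roles = frozenset(r.strip() for r in value.split(",") if r.strip())
    unknown = roles - BUILTIN_ROLES
    if unknown:
        raise ValueError(f"unknown node role(s): {', '.join(sorted(unknown))}")
    return roles


@dataclass(frozen=True)
class Node:
    name: str
    roles: FrozenSet[str]
    labels: FrozenSet[str] = frozenset()  # user-defined, never validated


def replica_candidates(nodes: Iterable[Node], required_label: Optional[str] = None) -> List[str]:
    """Nodes a placement filter could consider: the role is one input, a label another."""
    return [
        n.name
        for n in nodes
        if "data" in n.roles and (required_label is None or required_label in n.labels)
    ]


nodes = [
    Node("node1", parse_roles("data"), frozenset({"big-box"})),
    Node("node2", parse_roles("data")),
    Node("node3", parse_roles("overseer"), frozenset({"big-box"})),
]

if __name__ == "__main__":
    print(replica_candidates(nodes))
    print(replica_candidates(nodes, "big-box"))
```

Validating roles strictly while leaving labels open is one way to get both properties discussed above: a misspelled role fails at startup, yet users can still steer a heavily used collection onto more powerful nodes.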
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > On 1 Nov 2021, at 14:46, Ilan Ginzburg <[email protected]> wrote:
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined should be assumed to have all roles. Not only data. I don't see a reason to special case this one or any role.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No role == all roles. Explicit list of roles = exactly those roles.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe preference is something handled as a feature of the role rather than via role designation?
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Yeah, we always need an overseer, so that feature can decide to use its list of nodes as a preference if it so chooses.
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes things easier if we always prefix Solr env vars and sys props with "SOLR_" or "solr.", e.g. -Dsolr.node.roles=foo. That way we can get away from having to have explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every single property. Instead, we can parse all env vars and props with the solr prefix in our bootstrap code. And by convention we can allow e.g. docker run -e SOLR_NODE_ROLES=foo solr:9, and it would be the same thing...
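A sketch of the prefix convention Jan describes, assuming a simple mapping in which an environment variable `SOLR_X_Y` corresponds to the property `solr.x.y`, and that an explicitly set property wins over the environment. The function names and the precedence rule are assumptions for illustration, not how Solr's bootstrap code currently behaves.

```python
from typing import Dict, Mapping, Optional

ENV_PREFIX = "SOLR_"
PROP_PREFIX = "solr."


def env_to_prop(name: str) -> Optional[str]:
    """Map e.g. SOLR_NODE_ROLES to solr.node.roles; None for non-Solr variables."""
    if not name.startswith(ENV_PREFIX) or name == ENV_PREFIX:
        return None
    return PROP_PREFIX + name[len(ENV_PREFIX):].lower().replace("_", ".")


def collect_solr_settings(env: Mapping[str, str], props: Mapping[str, str]) -> Dict[str, str]:
    """Merge every solr-prefixed env var and system property into one dict."""
    settings: Dict[str, str] = {}
    for name, value in env.items():
        prop = env_to_prop(name)
        if prop is not None:
            settings[prop] = value
    # Assumed precedence: an explicit -Dsolr.* property overrides the env var.
    for name, value in props.items():
        if name.startswith(PROP_PREFIX):
            settings[name] = value
    return settings


if __name__ == "__main__":
    # Same result whether the role comes from `docker run -e SOLR_NODE_ROLES=foo`
    # or from -Dsolr.node.roles=foo.
    print(collect_solr_settings({"SOLR_NODE_ROLES": "foo", "PATH": "/usr/bin"}, {}))
    print(collect_solr_settings({}, {"solr.node.roles": "foo", "java.home": "/opt/jdk"}))
```

One trade-off of this mapping is that it only works cleanly in one direction: a property whose name itself contains an underscore or mixed case cannot be expressed as an environment variable, so such names would need to be avoided.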
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> Jan
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail:
>>>>>>>>>>> [email protected]
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail:
>>>>>>>>>>> [email protected]
>>>>>>>>>>> >>> >> >> >> >>>>>>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>>
>>>>>>>>>>> >>> >> >> >> >>>>>> --
>>>>>>>>>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play)
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >> >>>>
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >>
>>>>>>>>>>> >>> >> >> >
>>>>>>>>>>> >>> >> >> >
>>>>>>>>>>> >>> >> >> > --
>>>>>>>>>>> >>> >> >> > http://www.needhamsoftware.com (work)
>>>>>>>>>>> >>> >> >> > http://www.the111shift.com (play)
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >> >>
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>> >>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul
>>>>>>
>>>>>>
>>>>>>
