Re: First class support for node roles

Michael Gibney Wed, 03 Nov 2021 09:51:25 -0700

>I actually didn't realize that an empty Solr node would forward the top-level
>request onward instead of just being the query controller itself? That
>actually seems like a bug vs. a feature, IMO any node that receives
>the top-level query should just be the coordinator, what stops it?


+1 to Tim's statement quoted above; unless I'm missing something, this
feels like an issue that should be addressed regardless of this SIP.
(perhaps it would be addressed incidentally by this SIP? -- in any event
the current situation seems to not make sense. As Tim points out, the
relevant configs should in principle be accessible from ZK whether or not
there's a core for a given collection on a given node).

Considering the above, and especially given, Ishan, that you say "The
coordinator role is the biggest motivation for introducing the concept of
roles", while reading the SIP I found myself wishing for a fuller
enumeration of use cases, and a more sympathetic characterization of
alternatives (existing alternatives, and perhaps, as with the above "proxy
request" issue, simpler-but-not-yet-implemented alternatives).

Combining questions about use cases with questions about alternatives:
assuming that 9.x autoscaling can indeed be reliably used to stop replicas
from being placed on nodes, how close would addressing the orthogonal
"proxy request" issue come to addressing potential use cases?

Michael


On Wed, Nov 3, 2021 at 10:00 AM Ilan Ginzburg <[email protected]> wrote:

> I think if we have the new "pseudo core" abstraction (I like it! Will it
> really be a core with an index on disk or some new abstraction only tracked
> in ZK and in memory?) to play the role of coordinator, then we have all we
> need with the affinity placement plugin framework for a data free
> coordinator node implementation.
> It is easy to use system properties to exclude nodes from
> receiving replicas using the placement plugins, a minor change in the
> Affinity Placement Plugin. Such nodes will not receive any replicas from the
> placement plugin, not even at startup (the system property will be assigned
> at startup so no manual intervention needed).
>
> It will not work if switching to another placement plugin, unless that
> other plugin reimplements that (simple) aspect. Is that an issue?
>
> Ilan
>
>
>
>
> On Wed, Nov 3, 2021 at 2:57 AM Ishan Chattopadhyaya <
> [email protected]> wrote:
>
>> Answers inline below.
>>
>> On Wed, Nov 3, 2021 at 5:56 AM Timothy Potter <[email protected]>
>> wrote:
>>
>>> One last thought on this for me ... I think it would be beneficial for
>>> the SIP to address how this new feature will work with the existing
>>> shards.preference solution and affinity based placement plugin.
>>>
>>
>> I was more inclined to keep this SIP focused on the broad concept of roles,
>> and any upcoming roles (coordinator role, along with that pseudo-core
>> functionality) to be described in their own issue (e.g. SOLR-15715).
>>
>>
>>> Moreover, your pseudo-replica solution sounds like a new replica type
>>> vs. a node level thing.
>>
>>
>> I misspoke when I called it "pseudo replica", it is actually a "pseudo
>> core". Replicas are shard level concepts, but such a pseudo core that we
>> plan to introduce will pertain to one or more collections. Imagine
>> collection1 has shard1 and shard2, there will be a single pseudo core for
>> collection1 (we haven't decided on the prefix of this pseudo core yet, but
>> a candidate can be ".collection1_coordinator"). Replica type won't fit this
>> mental model here. We can discuss this more in the SOLR-15715 issue.
>>
>> The placement strategy can place replicas
>>> based on replica type and node type (just a system property), so
>>> please address why you can't achieve a query coordinator behavior with
>>> a new replica type + improvements to the Affinity placement plugin?
>>>
>>
>> To put down my thoughts on why Affinity placement plugin won't work for
>> the purpose of ensuring that we have nodes that host no data on them:
>> 1. We want the ability to have nodes with no data on them as a first class
>> concept for users. Hence, if the Affinity placement plugin is used for that
>> purpose, users won't be able to switch out that plugin and use anything of
>> their own. Currently, IIUC, there's no way for users to use multiple
>> placement plugins.
>> 2. Nodes that shouldn't host any replica on them are generally ephemeral in
>> nature; many of them may join the cluster, they may go away. If such a node
>> joins the cluster, they immediately become eligible for replica placement,
>> before even the sysadmin is able to assign an affinity placement
>> configuration for that node. This is a problem.
>>
>>
>>> Cheers,
>>> Tim
>>>
>>
>> Thanks for your thoughts and feedback, I think it will help us put
>> together the document with more insights into our design choices.
>>
>> Regards,
>> Ishan
>>
>>
>>>
>>> On Tue, Nov 2, 2021 at 6:14 PM Ishan Chattopadhyaya
>>> <[email protected]> wrote:
>>> >
>>> > Also, in a cluster where new collections/shards/replicas are
>>> continuously added all the time, it would be pretty awkward to start a node
>>> (in regular mode), briefly have it become eligible for replica assignment,
>>> then invoke a replica placement rule/autoscaling policy for that node to
>>> not place replicas on it. Instead, starting a node with a defined role (as
>>> a startup param) precludes that brief period of eligibility for replica
>>> placement on such a node.
>>> >
>>> > On Wed, Nov 3, 2021 at 5:39 AM Ishan Chattopadhyaya <
>>> [email protected]> wrote:
>>> >>
>>> >> If we were to tell users how to do "scatter gather on an empty node",
>>> *how exactly* would you recommend users have an empty node to begin with?
>>> Wouldn't you say something like "for 8x you can do this (rule based replica
>>> placement) or do that (autoscaling), but for 9x you do this new thing".
>>> Having a node that doesn't have a data role seems like a consistent and an
>>> elegant way for users to invoke such a functionality and also easily relate
>>> to a broad concept, without having to deal with autoscaling frameworks of
>>> the ancient past, medieval past or the future.
>>> >>
>>> >> On Wed, Nov 3, 2021 at 5:29 AM Timothy Potter <[email protected]>
>>> wrote:
>>> >>>
>>> >>> As opposed to what? Looking up the configset for the addressed
>>> >>> collection and pulling whatever information it needs from cached
>>> data.
>>> >>> I'm sure there are some nuances but I hardly think you need a node
>>> >>> role framework to deal with determining the unique key field to do
>>> >>> scatter gather on an empty node when you have easy access to
>>> >>> collection metadata.
>>> >>>
>>> >>> Doesn't seem like a hard thing to overcome to me.
>>> >>>
>>> >>> On Tue, Nov 2, 2021 at 5:49 PM Noble Paul <[email protected]>
>>> wrote:
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Wed, Nov 3, 2021, 10:46 AM Timothy Potter <[email protected]>
>>> wrote:
>>> >>> >>
>>> >>> >> I'm not missing the point of the query coordinator, but I actually
>>> >>> >> didn't realize that an empty Solr node would forward the top-level
>>> >>> >> request onward instead of just being the query controller itself?
>>> That
>>> >>> >> actually seems like a bug vs. a feature, IMO any node that
>>> receives
>>> >>> >> the top-level query should just be the coordinator, what stops it?
>>> >>> >
>>> >>> >
>>> >>> > To process a request there should be a core that uses the same
>>> configset as the requested collection.
>>> >>> >>
>>> >>> >>
>>> >>> >> Anyway, it sounds to me like you guys have your minds made up
>>> >>> >> regardless of feedback.
>>> >>> >>
>>> >>> >> Btw ~ I only mentioned the Zookeeper part b/c it's in your SIP as
>>> a
>>> >>> >> specific role, not sure why you took that as me wanting to
>>> discuss the
>>> >>> >> embedded ZK in your SIP?
>>> >>> >>
>>> >>> >> On Tue, Nov 2, 2021 at 5:13 PM Ishan Chattopadhyaya
>>> >>> >> <[email protected]> wrote:
>>> >>> >> >
>>> >>> >> > Hi Tim,
>>> >>> >> > Here are my responses inline.
>>> >>> >> >
>>> >>> >> > On Wed, Nov 3, 2021 at 3:22 AM Timothy Potter <
>>> [email protected]> wrote:
>>> >>> >> >>
>>> >>> >> >> I'm just not convinced this feature is even needed and the SIP
>>> is not
>>> >>> >> >> convincing that "There is no proper alternative today."
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > There are no proper alternatives today, just hacks. On 8x, we
>>> have two different deprecated frameworks to stop replicas from being placed on
>>> a node (1. rule based replica placement, 2. autoscaling framework). On 9x,
>>> we have a new autoscaling framework, which I don't even think is fully
>>> implemented. And, there's definitely no way to have a node act as a query
>>> coordinator without having data on it.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> 1) Just b/c Elastic and Vespa have a concept of node roles,
>>> doesn't
>>> >>> >> >> mean Solr needs this.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > Solr needs this. That Elastic has such concepts is a coincidence,
>>> and also means we have an opportunity to catch up with them; they have
>>> these concepts for a reason.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >> Also, some of Elastic's roles overlap with
>>> >>> >> >> concepts Solr already has in a different form, i.e. data_hot
>>> sounds
>>> >>> >> >> like NRT and data_warm sounds a lot like our Pull Replica Type
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > I think that is beyond the scope of this SIP.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> 2) You can achieve the "coordinator" role with auto-scaling
>>> rules
>>> >>> >> >> pre-9.x and with the AffinityPlacementPlugin (heck, it even
>>> has a node
>>> >>> >> >> type built in:
>>> .requestNodeSystemProperty(AffinityPlacementConfig.NODE_TYPE_SYSPROP).
>>> >>> >> >> Simply build your replica placement rules such that no
>>> replicas land
>>> >>> >> >> on "coordinator" nodes. And you can route queries using
>>> node.sysprop
>>> >>> >> >> already using shards.preference.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > I think you missed the whole point of the query coordinator.
>>> Please refer to this https://issues.apache.org/jira/browse/SOLR-15715.
>>> >>> >> > Let me summarize the main difference between what (I think) you
>>> refer to and what is proposed in SOLR-15715.
>>> >>> >> >
>>> >>> >> > With your suggestion, we'll have a node that doesn't host any
>>> replicas. And you suggest queries landing on such nodes be routed using
>>> shards.preference? Well, in such a case, these queries will be
>>> forwarded/proxied to a random node hosting a replica of the collection and
>>> that node then acts as the coordinator. This situation is no better than
>>> sending the query directly to that particular node.
>>> >>> >> >
>>> >>> >> > What is proposed in SOLR-15715 is a query aggregation
>>> functionality. There will be pseudo replicas (aware of the configset) on
>>> this coordinator node that handle the request themselves, send shard
>>> requests to data hosting replicas, collect responses and merge them, and
>>> send the result back to the user. This merge step is usually extremely memory
>>> intensive, and it would be good to serve these off stateless nodes (that
>>> host no data).
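[Illustration] A minimal sketch of the scatter-gather flow described above: the coordinator sends the query to one replica per shard, collects each shard's already-sorted hits, and merges them into a single top-N list. This is not SOLR-15715's implementation; the shard clients are stand-in functions and every name here is an assumption for the example.

```python
import heapq
from itertools import islice


def make_shard(hits):
    """Build a fake shard client returning (score, doc_id) by descending score."""
    ordered = sorted(hits, key=lambda hit: -hit[0])
    return lambda query, rows: ordered[:rows]


def coordinate(query, shard_clients, rows):
    """Scatter the query to every shard, then merge the per-shard top hits."""
    # Scatter: each shard returns at most `rows` hits, best first.
    shard_hits = [client(query, rows) for client in shard_clients]
    # Gather and merge: k-way merge of the sorted lists, keeping the best `rows`.
    merged = heapq.merge(*shard_hits, key=lambda hit: -hit[0])
    return list(islice(merged, rows))


shard1 = make_shard([(0.9, "a"), (0.2, "b")])
shard2 = make_shard([(0.7, "c"), (0.5, "d")])
print(coordinate("q", [shard1, shard2], rows=3))
```

In this sketch the coordinator holds up to shards x rows candidate hits at once, which gives a rough sense of why the merge step can be memory heavy for deep pages or many shards.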
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> 3) Dedicated overseer role? I thought we were removing the
>>> overseer?!?
>>> >>> >> >> Also, we already have the ability to run the overseer on
>>> specific
>>> >>> >> >> nodes w/o a new framework, so this doesn't really convince me
>>> we need
>>> >>> >> >> a new framework.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > There's absolutely no change proposed to the "overseer" role.
>>> What users need on production clusters are nodes dedicated for overseer
>>> operations, and for that the current "overseer" role suffices, together
>>> with some functionality to not place replicas on such nodes.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> 4) We will indeed need to decide which nodes host embedded
>>> Zookeeper's
>>> >>> >> >> but I'd argue that solution hasn't been designed entirely and
>>> we
>>> >>> >> >> probably don't need a formal node role framework to determine
>>> which
>>> >>> >> >> nodes host embedded ZKs. Moreover, embedded ZK seems more like
>>> a small
>>> >>> >> >> cluster thing and anyone running a large cluster will probably
>>> have a
>>> >>> >> >> dedicated ZK ensemble as they do today. The node role thing
>>> seems like
>>> >>> >> >> it's intended for large clusters and my gut says few will use
>>> embedded
>>> >>> >> >> ZK for large clusters.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > This SIP is not the right place for this discussion. There's a
>>> separate SIP for this.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> 5) You can also achieve a lot of "node role" functionality in
>>> query
>>> >>> >> >> routing using the shards.preference parameter.
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> > That doesn't solve the purpose behind
>>> https://issues.apache.org/jira/browse/SOLR-15715.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >> At the very least, the SIP needs to list specific use cases
>>> that
>>> >>> >> >> require this feature that are not achievable with the current
>>> features
>>> >>> >> >> before getting bogged down in the impl. details.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > The coordinator role is the biggest motivation for introducing
>>> the concept of roles. However, in addition to what is proposed in
>>> SOLR-15715, a coordinator node can later on also be used as a node for
>>> users to run streaming expressions on, do bulk indexing on (impl details
>>> for this to come later, don't want distraction here).
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> Tim
>>> >>> >> >>
>>> >>> >> >> On Tue, Nov 2, 2021 at 3:20 PM Gus Heck <[email protected]>
>>> wrote:
>>> >>> >> >> >
>>> >>> >> >> > I think there are things not yet accounted for. Time I spent
>>> yesterday is biting me today. Pls give a couple days.
>>> >>> >> >> >
>>> >>> >> >> > On Tue, Nov 2, 2021 at 11:28 AM Jason Gerlowski <
>>> [email protected]> wrote:
>>> >>> >> >> >>
>>> >>> >> >> >> Hey Ishan,
>>> >>> >> >> >>
>>> >>> >> >> >> I appreciate you writing up the SIP!  Here's some
>>> notes/questions I
>>> >>> >> >> >> had as I was reading through your writeup and this mail
>>> thread.
>>> >>> >> >> >> ("----" separators between thoughts, hopefully that helps.)
>>> >>> >> >> >>
>>> >>> >> >> >> ----
>>> >>> >> >> >>
>>> >>> >> >> >> I'll add my vote to what Jan, Gus, Ilan, and Houston already
>>> >>> >> >> >> suggested: roles should default to "all-on".  I see the
>>> downsides
>>> >>> >> >> >> you're worried about with that approach (esp. around
>>> 'overseer'), but
>>> >>> >> >> >> they may be mitigatable, at least in part.
>>> >>> >> >> >>
>>> >>> >> >> >> > [mail thread] User wants this node Solr101 to be a
>>> dedicated overseer, but for that to happen, he/she would need to restart
>>> all the data nodes with -Dnode.roles=data
>>> >>> >> >> >>
>>> >>> >> >> >> Sure, if roles can only be specified at startup.  But that
>>> may be a
>>> >>> >> >> >> self-imposed constraint.
>>> >>> >> >> >>
>>> >>> >> >> >> An API to change a node's roles would remove the need for a
>>> restart
>>> >>> >> >> >> and make it easy for users to affect the semantics they
>>> want.  You
>>> >>> >> >> >> decided you want a dedicated overseer N nodes into your
>>> cluster
>>> >>> >> >> >> deployment?  Deploy node 'N' with the 'overseer', and
>>> toggle the
>>> >>> >> >> >> overseer role off on the remainder.
>>> >>> >> >> >>
>>> >>> >> >> >> Now, I understand that you don't want roles to change at
>>> runtime, but
>>> >>> >> >> >> I haven't seen you get much into "why", beyond saying "it
>>> is very
>>> >>> >> >> >> risky to have nodes change roles while they are up and
>>> running."  Can
>>> >>> >> >> >> you expand a bit on the risks you're worried about?  If
>>> you're
>>> >>> >> >> >> explicit about them here maybe someone can think of a
>>> clever way to
>>> >>> >> >> >> address them?
>>> >>> >> >> >>
>>> >>> >> >> >> > Hence, if those nodes are "assumed to have all roles",
>>> then just by virtue of upgrading to this new version, new capabilities will
>>> be turned on for the entire cluster, whether or not the user opted for such
>>> a capability. This is totally undesirable.
>>> >>> >> >> >>
>>> >>> >> >> >> Obviously "roles" refer to much bigger chunks of
>>> functionality than
>>> >>> >> >> >> usual, so in a sense defaulting roles on is scarier.  But
>>> in a sense
>>> >>> >> >> >> you're describing something that's an inherent part of
>>> software
>>> >>> >> >> >> releases.  Releases expose new features that are typically
>>> on by
>>> >>> >> >> >> default.  A new default-on role in 9.1 might hurt a user,
>>> but there's
>>> >>> >> >> >> no fundamental difference between that and a change to
>>> backups or
>>> >>> >> >> >> replication or whatever in the same release.
>>> >>> >> >> >>
>>> >>> >> >> >> I don't mean to belittle the difference in scope - I get
>>> your concern.
>>> >>> >> >> >> But IMO this is something to address with good release
>>> notes and
>>> >>> >> >> >> documentation.  Designing for admins who don't do even
>>> cursory
>>> >>> >> >> >> research before an upgrade ties both our hands behind our
>>> back as a
>>> >>> >> >> >> project.
>>> >>> >> >> >>
>>> >>> >> >> >> ----
>>> >>> >> >> >>
>>> >>> >> >> >> > [SIP] Internal representation in ZK ... Implementation
>>> details like these can be fleshed out in the PR
>>> >>> >> >> >>
>>> >>> >> >> >> IMO this is important enough to flesh out as part of the
>>> SIP, at least
>>> >>> >> >> >> in broad strokes.  It affects backcompat, SolrJ client
>>> design, etc.
>>> >>> >> >> >>
>>> >>> >> >> >> ----
>>> >>> >> >> >>
>>> >>> >> >> >> > [SIP] GET /api/cluster/roles?node=node1
>>> >>> >> >> >>
>>> >>> >> >> >> Woohoo - way to include a v2 API definition!
>>> >>> >> >> >>
>>> >>> >> >> >> AFAIR, the v2 API has a /nodes path defined - I wonder
>>> whether "GET
>>> >>> >> >> >> /nodes/someNode/roles" wouldn't be a more intuitive
>>> endpoint for the
>>> >>> >> >> >> "get the roles this node has" functionality.  Though I
>>> leave that for
>>> >>> >> >> >> your consideration.
>>> >>> >> >> >>
>>> >>> >> >> >> ----
>>> >>> >> >> >>
>>> >>> >> >> >> Looking forward to your responses and seeing the SIP
>>> progress!  It's a
>>> >>> >> >> >> really cool, promising idea IMO.
>>> >>> >> >> >>
>>> >>> >> >> >> Best,
>>> >>> >> >> >>
>>> >>> >> >> >> Jason
>>> >>> >> >> >>
>>> >>> >> >> >> On Tue, Nov 2, 2021 at 11:21 AM Ishan Chattopadhyaya
>>> >>> >> >> >> <[email protected]> wrote:
>>> >>> >> >> >> >
>>> >>> >> >> >> > Are there any unaddressed outstanding concerns that we
>>> should hold up the SIP for?
>>> >>> >> >> >> >
>>> >>> >> >> >> > On Mon, 1 Nov, 2021, 10:31 pm Ishan Chattopadhyaya, <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> >> Agree. However, I disagree with ideas where "query
>>> analysis" has a role of its own. Where would that lead us to? Separate
>>> roles for
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> >> nodes that do "faceting" or "spell correction" etc.?
>>> But anyway, that is for discussion when we add future roles. This is beyond
>>> this SIP.
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> > I am not asking you to implement every possible role
>>> of course :). As a note I know a company that is running an entire separate
>>> >>> >> >> >> >> > cluster to offload and better serve highlighting on a
>>> subset of large docs, so YES I think there are people who may want such
>>> fine grained control.
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Cool, I think we can discuss adding any additional roles
>>> (for highlighting?) on a case by case basis at a later point.
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> On Mon, Nov 1, 2021 at 10:25 PM Ishan Chattopadhyaya <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> > Boiling it down the idea I'm proposing is that roles
>>> required for back compatibility get explicitly added on startup, if not by
>>> the user then by the code. This is more flexible than assuming that no role
>>> means every role, because then every new feature that has a role will end
>>> up on legacy clusters which are also not back compatible.
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> +1, I totally agree. I even said so, when I said: "This
>>> is why I was advocating that 1) we assume the "data" as a default, 2) not
>>> assume overseer to be implicitly defined (because of the way overseer role
>>> is written today), 3) not assume any future roles to be true by default."
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> So, basically, I'm proposing that the "roles required
>>> for back compatibility" (that should be explicitly added on startup) be
>>> just the ["data"] role, and not the "overseer" role (due to the way
>>> overseer role is currently defined, i.e. it is "preferred overseer").
>>> >>> >> >> >> >>>
>>> >>> >> >> >> >>> On Mon, Nov 1, 2021 at 10:19 PM Gus Heck <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>>
>>> >>> >> >> >> >>>> Very sorry don't mean to sound offended, Frustrated
>>> yes offended no :)... the most difficult thing about communication is the
>>> illusion it has occurred :)
>>> >>> >> >> >> >>>>
>>> >>> >> >> >> >>>> If you read back just a few emails you'll see where I
>>> talk about roles being applied on startup. Boiling it down the idea I'm
>>> proposing is that roles required for back compatibility get explicitly
>>> added on startup, if not by the user then by the code. This is more
>>> flexible than assuming that no role means every role, because then every
>>> new feature that has a role will end up on legacy clusters which are also
>>> not back compatible.
>>> >>> >> >> >> >>>>
>>> >>> >> >> >> >>>> There are points where I said all roles rather than
>>> back compatibility roles because I was thinking about back compatibility
>>> specifically, but you can't know that if I don't say that can you :).
>>> >>> >> >> >> >>>>
>>> >>> >> >> >> >>>> On Mon, Nov 1, 2021 at 12:39 PM Ishan Chattopadhyaya <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>>>
>>> >>> >> >> >> >>>>> > If you read more closely, my way can provide full
>>> back compatibility. To say or imply it doesn't isn't helping. Perhaps you
>>> need to re-read?
>>> >>> >> >> >> >>>>>
>>> >>> >> >> >> >>>>> I understand e-mails are frustrating, and I'm trying
>>> my best. Please don't be offended, and kindly point me to the exact part
>>> you want me to re-read.
>>> >>> >> >> >> >>>>>
>>> >>> >> >> >> >>>>> On Mon, Nov 1, 2021 at 10:05 PM Gus Heck <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> On Mon, Nov 1, 2021 at 12:22 PM Ishan Chattopadhyaya
>>> <[email protected]> wrote:
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Positive - They denote the existence of a
>>> capability
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Agree, the SIP already reflects this.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >   Absolute - Absence/Presence binary
>>> identification of a capability; no implications, no assumptions
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Disagree, we need backcompat handling on nodes
>>> running without any roles. There has to be an implicit assumption as to
>>> what roles are those nodes assumed to have. My proposal is that only the
>>> "data" role be assumed, but not the "overseer" role. For any future roles
>>> ("coordinator", "zookeeper" etc.), this decision as to what absence of any
>>> role implies should be left to the implementation of that future role.
>>> Documentation should reflect clearly about these implicit assumptions.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> If you read more closely, my way can provide full
>>> back compatibility. To say or imply it doesn't isn't helping. Perhaps you
>>> need to re-read?
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Focused - Do one thing per role
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Agree. However, I disagree with ideas where "query
>>> analysis" has a role of its own. Where would that lead us to? Separate
>>> roles for nodes that do "faceting" or "spell correction" etc.? But anyway,
>>> that is for discussion when we add future roles. This is beyond this SIP.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> I am not asking you to implement every possible role
>>> of course :). As a note I know a company that is running an entire separate
>>> cluster to offload and better serve highlighting on a subset of large docs,
>>> so YES I think there are people who may want such fine grained control.
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Accessible - It should be dead simple to
>>> determine the members of a role, avoid parsing blobs of json, avoid
>>> calculating implications, avoid consulting other resources after listing
>>> nodes with the role
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Agree. I'm open to any implementation details that
>>> make it easy. There should be a reasonable API to return these node roles,
>>> with ability to filter by role or filter by node.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Independent - One role should not require
>>> other roles to be present
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Do we need to have this hard and fast requirement
>>> upfront? There might be situations where this is desirable. I feel we can
>>> discuss on a case by case basis whenever a future role is added.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Persistent - roles should not be lost across
>>> reboot
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Agree.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Immutable - roles should not change while the
>>> node is running
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> Agree
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> >    Lively - A node with a capability may not be
>>> presently providing that capability.
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> I don't understand, can you please elaborate?
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> Specifically imagine the case where there are 100
>>> nodes:
>>> >>> >> >> >> >>>>>> 1-100 ==> DATA
>>> >>> >> >> >> >>>>>> 101-103 ==> OVERSEER
>>> >>> >> >> >> >>>>>> 104-106 ==> ZOOKEEPER
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> But you won't have 3 overseers... you'll want only
>>> one of those to be providing overseer functionality and the other two to be
>>> capable, but not providing (so that if the current overseer goes down a new
>>> one can be assigned).
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> Then you decide you'd like 5 Zookeepers. You start
>>> nodes 107-108 with that role, but you probably want to ensure that
>>> zookeepers require some sort of command for them to actually join the
>>> zookeeper cluster (i.e. /admin?action=ZKADD&nodes=node107,node18) ... to do
>>> that the nodes need to be up. But oh look I typoed 108... we want that to
>>> fail... how? because 18 does not have the capability to become a zookeeper.
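[Illustration] A minimal sketch of the capability check in the example above: a role only says a node is capable of something, and a separate command decides which capable nodes actually provide it. The ZKADD command is the hypothetical one from the message; the function names and data layout are assumptions for this example.

```python
def example_capabilities():
    """Roles per node, following the 1-100 / 101-103 / 104-108 layout above."""
    caps = {}
    for i in range(1, 101):
        caps[f"node{i}"] = {"data"}
    for i in range(101, 104):
        caps[f"node{i}"] = {"overseer"}
    for i in range(104, 109):  # 104-106 original, 107-108 newly started
        caps[f"node{i}"] = {"zookeeper"}
    return caps


def zk_add(requested, capabilities, active_zk):
    """Add nodes to the active ZooKeeper set, all or nothing."""
    not_capable = [
        node for node in requested
        if "zookeeper" not in capabilities.get(node, set())
    ]
    if not_capable:
        raise ValueError(f"nodes lack the zookeeper role: {not_capable}")
    return set(active_zk) | set(requested)


caps = example_capabilities()
active = {"node104", "node105", "node106"}
try:
    zk_add(["node107", "node18"], caps, active)  # the typo from the example
except ValueError as err:
    print("rejected:", err)
print(sorted(zk_add(["node107", "node108"], caps, active)))
```

The typo is rejected because node18 only carries the data role, while the corrected request grows the active set from three to five members.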
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>>
>>> >>> >> >> >> >>>>>>> On Mon, Nov 1, 2021 at 9:30 PM Ishan Chattopadhyaya
>>> <[email protected]> wrote:
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> > Ilan: A node not having node.roles defined
>>> should be assumed to have all roles. Not only data. I don't see a reason to
>>> special case this one or any role.
>>> >>> >> >> >> >>>>>>>> > Gus: There should be no "assumptions" Nothing to
>>> figure out. A node has a role or not. For back compatibility reasons, all
>>> roles would be assumed on startup if none specified.
>>> >>> >> >> >> >>>>>>>> > Jan: No role == all roles. Explicit list of
>>> roles = exactly those roles.
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> Problem with this approach is mainly to do with
>>> backcompat.
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> 1. Overseer backcompat:
>>> >>> >> >> >> >>>>>>>> If we don't make any modifications to how overseer
>>> works and adopt this approach (as quoted), then imagine this situation:
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> Solr1-100: No roles param (assumed to be
>>> "data,overseer").
>>> >>> >> >> >> >>>>>>>> Solr101: -Dnode.roles=overseer (intention:
>>> dedicated overseer)
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> User wants this node Solr101 to be a dedicated
>>> overseer, but for that to happen, he/she would need to restart all the data
>>> nodes with -Dnode.roles=data. This will cause unnecessary disruption to
>>> running clusters where a dedicated overseer is needed. Keep in mind, if a
>>> user needs a dedicated overseer, he's likely in an emergency situation and
>>> restarting the whole cluster might not be viable for him/her.
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> 2. Future roles might not be compatible with this
>>> "assumed to have all roles" idea:
>>> >>> >> >> >> >>>>>>>> Take the proposed "zookeeper" role for example.
>>> Today, regular nodes are not supposed to have embedded ZK running on them.
>>> By introducing this artificial limitation ("assumed to have all roles"), we
>>> constrain adoption of all future roles to necessarily require a full
>>> cluster restart.
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> Keep in mind newer Solr versions can introduce new
>>> capabilities and roles. Imagine we have a role that is defined in a new
>>> Solr version (and there's functionality to go with that role), and user
>>> upgrades to that version. However, his/her nodes all were started with no
>>> node.roles param. Hence, if those nodes are "assumed to have all roles",
>>> then just by virtue of upgrading to this new version, new capabilities will
>>> be turned on for the entire cluster, whether or not the user opted for such
>>> a capability. This is totally undesirable.
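[Illustration] A minimal sketch contrasting the two defaulting rules debated above for a node started without -Dnode.roles: "only data is assumed" versus "no role == all roles". This is not Solr code; the function, the policy names, and the idea of passing the known role set as an argument are assumptions for the example.

```python
def resolve_roles(node_roles_param, default_policy, known_roles):
    """Return the roles a node is treated as having.

    node_roles_param: value passed as -Dnode.roles, or None if absent.
    default_policy:   "data-only" or "all" (hypothetical policy names).
    known_roles:      roles understood by the running Solr version.
    """
    if node_roles_param:
        roles = {r.strip() for r in node_roles_param.split(",") if r.strip()}
        unknown = roles - set(known_roles)
        if unknown:
            # Fail fast on typos instead of silently ignoring them.
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        return roles
    if default_policy == "data-only":
        return {"data"}
    if default_policy == "all":
        return set(known_roles)
    raise ValueError(f"unknown default policy: {default_policy}")


old_version = {"data", "overseer"}
new_version = {"data", "overseer", "coordinator"}  # a later release adds a role

print(resolve_roles(None, "all", old_version))
print(resolve_roles(None, "all", new_version))
print(resolve_roles(None, "data-only", new_version))
print(resolve_roles("overseer", "all", new_version))
```

Under the "all" policy, a node with no roles parameter picks up the new role just by upgrading, which is the concern raised above; under "data-only" its role set stays the same across the upgrade.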
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> > Gus: I actually don't want a coordinator to do
>>> more work, I would prefer small focused roles with names that accurately
>>> describe their function. In that light, COORDINATOR might be too nebulous.
>>> How about an AGGREGATOR role? (what I was thinking of would better be called a
>>> QUERY_ANALYSIS role)
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> If you want to do specific things like query
>>> analysis or query aggregation or bulk indexing etc, all of those can be
>>> done on COORDINATOR nodes (as is the case in ElasticSearch). Having tens of
>>> "small focused roles" defined as first class concepts would be
>>> confusing to the user. As a remedy to your situation where you want the
>>> coordinator role to also do query-analysis for shards, one possible
>>> solution is to send such a query to a coordinator node with a parameter
>>> like "coordinator.query_analysis=true", and then the coordinator, instead
>>> of blindly hitting remote shards, also does some extra work on behalf of
>>> the shards.
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>>
>>> >>> >> >> >> >>>>>>>> On Mon, Nov 1, 2021 at 9:01 PM Ishan
>>> Chattopadhyaya <[email protected]> wrote:
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>> > If we make collections role-aware for example
>>> (replicas of that collection can only be
>>> >>> >> >> >> >>>>>>>>> > placed on nodes with a specific role, in
>>> addition to the other role based constraints),
>>> >>> >> >> >> >>>>>>>>> > the set of roles should be user extensible and
>>> not fixed.
>>> >>> >> >> >> >>>>>>>>> > If collections are not role aware, the
>>> constraints introduced by roles apply to all collections
>>> >>> >> >> >> >>>>>>>>> > equally which might be insufficient if a user
>>> needs for example a heavily used collection to
>>> >>> >> >> >> >>>>>>>>> > only be placed on more powerful nodes.
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>> I feel node roles and role-aware collections are
>>> orthogonal topics. What you describe above can be achieved by the
>>> autoscaling+replica placement framework where the placement plugins take
>>> the node roles as one of the inputs.
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>> > It does impact the design from early on: the
>>> set of roles needs to be expandable by a user
>>> >>> >> >> >> >>>>>>>>> > by creating a collection with new roles for
>>> example (consumed by placement plugins) and be
>>> >>> >> >> >> >>>>>>>>> > able to start nodes with new (arbitrary) roles.
>>> Should such roles follow some naming syntax to
>>> >>> >> >> >> >>>>>>>>> > differentiate them from built in roles? To be
>>> able to fail on typos on roles - that otherwise can be
>>> >>> >> >> >> >>>>>>>>> > crippling and hard to debug. This implies in
>>> any case that the current design can't assume all
>>> >>> >> >> >> >>>>>>>>> > roles are known at compile time or define them
>>> in a Java enum.
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>> I think this should be achieved by something
>>> different from roles. Something like node labels (user defined) which can
>>> then be used in a replica placement plugin to assign replicas. I see roles
>>> as more closely associated with kinds of functionality a node is designated
>>> for. Therefore, I feel that replica placements and user defined node labels
>>> are out of scope for this SIP. They can be added later in a separate SIP,
>>> without being at odds with this proposal.
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>
>>> >>> >> >> >> >>>>>>>>> On Mon, Nov 1, 2021 at 8:42 PM Jan Høydahl <
>>> [email protected]> wrote:
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> > 1. nov. 2021 kl. 14:46 skrev Ilan Ginzburg <
>>> [email protected]>:
>>> >>> >> >> >> >>>>>>>>>> > A node not having node.roles defined should be
>>> assumed to have all roles. Not only data. I don't see a reason to special
>>> case this one or any role.
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> +1, make it simple and transparent. No role ==
>>> all roles. Explicit list of roles = exactly those roles.
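The "no role == all roles, explicit list == exactly those roles" rule can be sketched as below. This is only an illustration under stated assumptions: the class `NodeRoles`, the method `parse`, and the role names in the enum are hypothetical and not taken from Solr's code base. It also assumes a fixed set of built-in roles (the view that user-defined labels are a separate concept) and treats a blank value the same as an unset one.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names here are illustrative and not Solr's API.
final class NodeRoles {
    // Illustrative role set; the thread has not settled on final names.
    enum Role { DATA, OVERSEER, COORDINATOR }

    private NodeRoles() {}

    /**
     * Parses a comma-separated role list such as "overseer,data".
     * Assumption: null or blank means the property was not set, which
     * under the rule quoted above means the node has every role.
     */
    static Set<Role> parse(String nodeRolesProp) {
        if (nodeRolesProp == null || nodeRolesProp.trim().isEmpty()) {
            return EnumSet.allOf(Role.class);
        }
        EnumSet<Role> roles = EnumSet.noneOf(Role.class);
        for (String name : nodeRolesProp.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Enum.valueOf throws IllegalArgumentException for unknown
            // names, so a typo in a role fails fast at startup.
            roles.add(Role.valueOf(trimmed.toUpperCase(Locale.ROOT)));
        }
        return roles;
    }
}
```

A node could call `NodeRoles.parse(System.getProperty("node.roles"))` at startup. The opposing position in this thread (nodes without the parameter should not silently gain new capabilities on upgrade) would change only the first branch of `parse`.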
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> > (Gus) See my comment above, but maybe
>>> preference is something handled as a feature of the role rather than via
>>> role designation?
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> Yea, we always need an overseer, so that feature
>>> can decide to use its list of nodes as a preference if it so chooses.
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> Aside: I think it makes it easier if we always
>>> prefix Solr env.vars and sys.props with "SOLR_" or "solr.", i.e.
>>> -Dsolr.node.roles=foo. That way we can get away from having to have
>>> explicit code in bin/solr, bin/solr.cmd and SolrCLI to manage every single
>>> property. Instead we can parse all ENVs and Props with the solr prefix in
>>> our bootstrap code. And we can by convention allow e.g. docker run -e
>>> SOLR_NODE_ROLES=foo solr:9 and it would be the same thing...
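The prefix convention above could look roughly like this in bootstrap code. It is a minimal sketch, not an existing Solr class: `SolrEnvProps`, `toPropertyName`, and `fromEnv` are hypothetical names, and the mapping rule (lowercase the name, replace `_` with `.`) is an assumption inferred from the `SOLR_NODE_ROLES` / `solr.node.roles` pair in the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not an existing Solr class.
final class SolrEnvProps {
    private static final String ENV_PREFIX = "SOLR_";

    private SolrEnvProps() {}

    /** Maps e.g. SOLR_NODE_ROLES to solr.node.roles (assumed convention). */
    static String toPropertyName(String envName) {
        return envName.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    /**
     * Collects every environment variable carrying the SOLR_ prefix and
     * returns it under its derived property name. Variables without the
     * prefix are ignored.
     */
    static Map<String, String> fromEnv(Map<String, String> env) {
        Map<String, String> props = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(ENV_PREFIX)) {
                props.put(toPropertyName(e.getKey()), e.getValue());
            }
        }
        return props;
    }
}
```

Calling `SolrEnvProps.fromEnv(System.getenv())` would pick up a value passed with `docker run -e SOLR_NODE_ROLES=foo` without any per-property handling in the start scripts. Precedence between an explicit `-Dsolr.node.roles` and the environment variable is left open here.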
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>>>>> Jan
>>> >>> >> >> >> >>>>>>>>>>
>>> ---------------------------------------------------------------------
>>> >>> >> >> >> >>>>>>>>>> To unsubscribe, e-mail:
>>> [email protected]
>>> >>> >> >> >> >>>>>>>>>> For additional commands, e-mail:
>>> [email protected]
>>> >>> >> >> >> >>>>>>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>>
>>> >>> >> >> >> >>>>>> --
>>> >>> >> >> >> >>>>>> http://www.needhamsoftware.com (work)
>>> >>> >> >> >> >>>>>> http://www.the111shift.com (play)
