Re: First class support for node roles

Ishan Chattopadhyaya Fri, 29 Oct 2021 02:22:46 -0700

After going through this thread, it seems to me that the role name "query" is
misleading for what we're planning to introduce in SOLR-15715. We're
essentially introducing functionality for what we initially called "query
aggregations". The actual queries run on data nodes anyway; it's just that
the first point of entry for such distributed queries will be a separate node
with this extra functionality. So I feel we should call a node with this
capability a "coordinator" node (instead of "query node"). That fits nicely
with the mental model of Elasticsearch as well:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node


I propose that a node with the role "coordinator" is assumed to have no data
replicas on it. If a node is started with both roles, "coordinator,data",
startup should fail with a message saying that a coordinator node should not
host data. A coordinator node can, however, have other roles such as
"zookeeper" or "overseer".
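
To make the startup rule concrete, here is a minimal sketch of the check,
written in Python for brevity (the real change would live in Solr's Java
code). The names parse_node_roles and RoleConfigError and the comma-separated
role string are assumptions for illustration only, not existing Solr APIs.

```python
# Hypothetical sketch, not Solr code: the function name, the exception type and
# the comma-separated role string are assumptions made for illustration.

class RoleConfigError(ValueError):
    """Raised when a node is started with an invalid combination of roles."""


def parse_node_roles(spec):
    """Parse a role string such as "coordinator,overseer" into a set of roles.

    Only the rule proposed above is enforced: "coordinator" and "data" are
    mutually exclusive. How a node with no roles is treated is left open here.
    """
    roles = frozenset(r.strip() for r in spec.split(",") if r.strip())
    if "coordinator" in roles and "data" in roles:
        raise RoleConfigError("a coordinator node should not host data on it")
    return roles


# A coordinator may still carry other roles such as "overseer".
print(sorted(parse_node_roles("coordinator,overseer")))

# Combining "coordinator" with "data" is rejected at startup.
try:
    parse_node_roles("coordinator,data")
except RoleConfigError as err:
    print("startup would fail:", err)
```

Unknown role names are deliberately not rejected in this sketch, so it stays
compatible with a user-extensible set of roles.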

I'll introduce a change to the SIP document, unless there are
objections/questions/concerns. WDYT?



On Fri, Oct 29, 2021 at 1:54 PM Ilan Ginzburg <[email protected]> wrote:

> If we make collections role-aware for example (replicas of that collection
> can only be placed on nodes with a specific role, in addition to the other
> role based constraints), the set of roles should be user extensible and not
> fixed.
>
> If collections are not role aware, the constraints introduced by roles
> apply to all collections equally which might be insufficient if a user
> needs for example a heavily used collection to only be placed on more
> powerful nodes.
>
> Ilan
>
> On Thu 28 Oct 2021 at 07:59, Gus Heck <[email protected]> wrote:
>
>>
>>
>> On Wed, Oct 27, 2021 at 3:34 PM Houston Putman <[email protected]>
>> wrote:
>>
>>>> I don't think it's unreasonable to want to have nodes that don't accept
>>>> queries. This is just Ishan's use case.
>>>
>>>
>>> Maybe I am misunderstanding, and it does deal with your last point about
>>> inter-node communication, but Peer-sync uses queries when doing replication
>>> between replicas. If a node doesn't have queries enabled, then it's
>>> possible to break peer sync. There are other options to make sure certain
>>> replicas aren't queried (shards.preference).
>>> For the separation of update/query traffic, I understand having compute
>>> nodes that deal with pre-replica commands, such as managing distributed
>>> queries or pre-processing documents in the URP chain. But for actual
>>> non-distrib queries and final update requests, the only way to actually
>>> separate this traffic is using PULL/TLOG replicas, because otherwise (with
>>> NRT) all update requests are still going to the query nodes, just the same
>>> as non-query nodes that are hosting those replicas. (and shard leadership
>>> could go to any "data" node, since I imagine we wouldn't filter out the
>>> "query" nodes...) The shards.preference option makes it easy to send
>>> queries to only PULL replicas in this scenario.
>>> That's why I saw this more as a "compute" role rather than "query".
>>>
>>
>> Yeah, for internal stuff we still need the ability to query, so we might
>> need to accommodate that, but you may not have noticed the bit where I
>> mentioned Query nodes doing the parsing/analysis of the query and then
>> sending a fully parsed (possibly serialized lucene objects) query to the
>> data node. So data nodes would only speak a single lucene level query
>> language and not parse queries or analyze text. In theory, with that, one
>> could even have something like elastic reduce a request to lucene objects
>> and then get an answer from a solr data node (for simple cases without need
>> to find shards via zookeeper etc). Certainly many details and corner cases
>> to think about there.
>>
>>
>>>
>>>> Definitely not what I would like. If I'm going to try to segregate data
>>>> nodes out to certain nodes, I don't want them appearing elsewhere just
>>>> cause something went down or filled up. Nor would I want updates to
>>>> suddenly start bogging down my query nodes....
>>>>
>>>
>>> I guess it depends on what you are optimizing for. Maybe there can be an
>>> option for this, like -DnonLenientRoles=data,update or something like that.
>>>
>>
>> A possibility
>>
>>
>>>
>>> On Wed, Oct 27, 2021 at 3:03 PM Gus Heck <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Oct 27, 2021 at 2:44 PM Houston Putman <[email protected]>
>>>> wrote:
>>>>
>>>>> As for the "query" role, let's name it something better like
>>>>> "compute", since data nodes are always going to be "querying".
>>>>>
>>>>
>>>> I don't think it's unreasonable to want to have nodes that don't accept
>>>> queries. This is just Ishan's use case.
>>>>
>>>>
>>>>> If no live nodes have roles=overseer (or roles=all), then we should
>>>>> just select any node to be overseer. This should be the same for compute,
>>>>> data, etc.
>>>>>
>>>>
>>>> Definitely not what I would like. If I'm going to try to segregate data
>>>> nodes out to certain nodes, I don't want them appearing elsewhere just
>>>> cause something went down or filled up. Nor would I want updates to
>>>> suddenly start bogging down my query nodes....
>>>>
>>>>
>>>>>
>>>>>> So, for the proposal, let's say "data" is a special role which is
>>>>>> assumed by default, and is enabled on all nodes unless there's a !data.
>>>>>>
>>>>>
>>>>> Instead of this, maybe we have role groups, such as
>>>>> admin~=overseer,zk or worker~=compute,data,updateProcessing
>>>>>
>>>>
>>>> Role groups sound like a next level feature to be built on top once we
>>>> figure out how roles work independently.
>>>>
>>>>
>>>>>
>>>>> As for the suggested Roles, I'm not sure ADMIN or UI really fit, since
>>>>> there is another option to disable the UI for a solr node, and various
>>>>> ADMIN commands have to be accepted across other node roles. (Data nodes
>>>>> require the Collections API, same with the overseer.)
>>>>>
>>>>
>>>> I admit I'm angling towards a world in which enabling and disabling
>>>> broad features is done in one way in one place... As for admin there might
>>>> be a distinction between commands issued between nodes and from the outside
>>>> world... I have this other idea about inter-node communication that's even
>>>> less baked that I won't distract with here ;)
>>>>
>>>>
>>>>> - Houston
>>>>>
>>>>> On Wed, Oct 27, 2021 at 1:34 PM Ishan Chattopadhyaya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> bq. In other words, roles are all "positive", but their consequences
>>>>>> are only negative (rejecting when the matching positive role is not
>>>>>> present).
>>>>>>
>>>>>> Essentially, yes. A node that doesn't specify any role should be able
>>>>>> to do everything.
>>>>>>
>>>>>> Let me just take a brief detour and mention our thoughts on the
>>>>>> "query" role. While all data nodes can also be used for querying, our
>>>>>> idea was to create a layer of nodes that have some special mechanism to
>>>>>> be able to proxy/forward queries to data nodes (let's call it "pseudo
>>>>>> cores" or "synthetic cores" or "proxy cores"). Our thought was that any
>>>>>> node that has the "query,!data" roles would enable this special mode on
>>>>>> startup (whereby requests are served by these special pseudo cores).
>>>>>> We'll discuss this in detail in that issue.
>>>>>>
>>>>>> Back to the main subject here.
>>>>>>
>>>>>> Let's take a practical scenario:
>>>>>> * Layer1: Organization has about 100 nodes, each node has many data
>>>>>> replicas
>>>>>> * Layer2: To manage such a large cluster reliably, they keep aside
>>>>>> 4-5 dedicated overseer nodes.
>>>>>> * Layer3: Since query aggregations/coordination can potentially be
>>>>>> expensive, they keep aside 5-10 query nodes.
>>>>>>
>>>>>> My preference would be as follows:
>>>>>> * I'd like to refer to Layer1 nodes as the "data nodes" and hence get
>>>>>> either no role defined for them or -Dnode.roles=data.
>>>>>> * I'd like to refer to Layer2 nodes as "overseer nodes" (even though
>>>>>> I understand, only one of them can be an overseer at a time). I'd like to
>>>>>> have -Dnode.roles=!data,overseer
>>>>>> * I'd like to refer to Layer3 nodes as "query nodes", with
>>>>>> -Dnode.roles=!data,query
>>>>>>
>>>>>> ^ This seems very practical from an operational standpoint.
>>>>>>
>>>>>> So, for the proposal, let's say "data" is a special role which is
>>>>>> assumed by default, and is enabled on all nodes unless there's a !data.
>>>>>> It is presumed that data nodes can also serve queries directly, so
>>>>>> adding a "query" to those nodes is meaningless (also because there's no
>>>>>> practical benefit to stopping a data node from receiving a query, so a
>>>>>> "!query" role would not be useful).
>>>>>>
>>>>>> "query" role on nodes that don't host data really refers to a special
>>>>>> capability for lightweight, stateless nodes. I don't want to add a
>>>>>> "!query" on dedicated overseer nodes, and hence I don't want to assume
>>>>>> that "query" is implicitly available on any node even if the role isn't
>>>>>> specified.
>>>>>>
>>>>>> "overseer" role is complicated, since it is already defined and we
>>>>>> don't have the opportunity to define it the right way. I'd hate having to
>>>>>> put a "!overseer" on every data node on startup in order to have a few
>>>>>> dedicated overseers.
>>>>>>
>>>>>> In short, in this SIP, I just wish to implement the concept of node
>>>>>> roles and their handling. How individual roles are leveraged can be up
>>>>>> to every new role's implementation.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 27, 2021 at 9:54 PM Gus Heck <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> In other words, roles are all "positive", but their consequences
>>>>>>>> are only negative (rejecting when the matching positive role is not
>>>>>>>> present).
>>>>>>>>
>>>>>>> Yeah right. To do something the machine needs the role.
>>>>>>>
>>>>>>>
>>>>>>>> We can also consider no role defined = all roles allowed. Will make
>>>>>>>> things simpler.
>>>>>>>>
>>>>>>>
>>>>>>> In terms of the startup command, yes. Internally we should have all
>>>>>>> roles explicitly assigned when no roles are specified at startup so
>>>>>>> that the code doesn't have a million if checks for the empty case.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How do we expect the roles to be used?
>>>>>>>>> One way I see is a node refusing to do anything related to a role
>>>>>>>>> it doesn't have.
>>>>>>>>> For example if a node does not have role "data", any attempt to
>>>>>>>>> create a core on it would fail.
>>>>>>>>> A node not having the role "query", will refuse to have anything
>>>>>>>>> to do with handling a query etc.
>>>>>>>>> Then it would be up to other code to make sure only the
>>>>>>>>> appropriate nodes are requested to do any type of action.
>>>>>>>>> So for example any replica placement code plugin would have to
>>>>>>>>> restrict the set of candidate nodes for a new replica placement to
>>>>>>>>> those having "data". Otherwise the call would fail, and there should
>>>>>>>>> be nothing the replica placement code can do about it.
>>>>>>>>>
>>>>>>>>> Similarly, the "overseer" role would limit the nodes that
>>>>>>>>> participate in the Overseer election. The Overseer election code
>>>>>>>>> would have to remove (or not add) all non-qualifying nodes from the
>>>>>>>>> election, and we should expect a node without role "overseer" to
>>>>>>>>> refuse to start the Overseer machinery if asked to...
>>>>>>>>>
>>>>>>>>> Trying to make the use case clear regarding how roles are used.
>>>>>>>>> Ilan
>>>>>>>>>
>>>>>>>>> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Gus,
>>>>>>>>>>>
>>>>>>>>>>> > I think that we should expand/edit your list of roles to be
>>>>>>>>>>>
>>>>>>>>>>> The list can be expanded as and when more isolation and features
>>>>>>>>>>> are needed. I only listed those roles for which we already have
>>>>>>>>>>> functionality or for which it is under development.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well, all of those roles (except zookeeper) are things nodes do
>>>>>>>>>> today. As it stands they are all doing all of them. What we add
>>>>>>>>>> support for as we move forward is starting without a given role,
>>>>>>>>>> and we add the zookeeper role when that feature is ready.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> > I would like to recommend that the roles be all positive ("Can
>>>>>>>>>>> do this") and nodes with no role at all are ineligible for all
>>>>>>>>>>> activities.
>>>>>>>>>>>
>>>>>>>>>>> It comes down to the defaults and backcompat. If we want all
>>>>>>>>>>> Solr nodes to be able to host data replicas by default (without
>>>>>>>>>>> the user explicitly specifying role=data), then we need a way to
>>>>>>>>>>> unset this role. The most reasonable way sounded like a "!data".
>>>>>>>>>>> We can do away with !data if we mandate that each and every data
>>>>>>>>>>> node have the role "data" explicitly defined for it, which breaks
>>>>>>>>>>> backcompat and is also cumbersome for those who don't want to use
>>>>>>>>>>> these special roles.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Not sure I understand, which of the roles I mentioned (other than
>>>>>>>>>> zookeeper, which I expect is intended as different from our current
>>>>>>>>>> embedded zk) is NOT currently supported by a single cloud node
>>>>>>>>>> brought up as shown in our tutorials/docs? I'm certainly not
>>>>>>>>>> proposing that the default change to nothing. The default is all
>>>>>>>>>> roles, unless you specify roles at startup.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> > I also suggest that these roles each have a node in zookeeper
>>>>>>>>>>> listing the current member nodes (as child nodes) so that code that
>>>>>>>>>>> wants to find a node with an appropriate role does not need to scan
>>>>>>>>>>> the list of all nodes parsing something to discover which nodes
>>>>>>>>>>> apply and also does not have to parse json to do it.
>>>>>>>>>>>
>>>>>>>>>>> /roles.json exists today; it has the role as key and a list of
>>>>>>>>>>> nodes as value. In the next major version, we can change the format
>>>>>>>>>>> of that file and use the node as key and a list of roles as value.
>>>>>>>>>>> Or, maybe we can go for adding the roles to the data for each item
>>>>>>>>>>> in the list of live_nodes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> I'm not finding anything in our documentation about roles.json so
>>>>>>>>>> I think it's an internal implementation detail, which reduces back
>>>>>>>>>> compat concerns. ADDROLE/REMOVEROLE don't accept json or anything
>>>>>>>>>> like that and could be made to work with zk nodes too.
>>>>>>>>>>
>>>>>>>>>> The fact that some precursor work was done without a SIP (or
>>>>>>>>>> before SIPs existed) should not hamstring our design once a SIP that
>>>>>>>>>> clearly covers the same topic is under consideration. By their
>>>>>>>>>> nature SIPs are non-trivial and often will include compatibility
>>>>>>>>>> breaks. Good news is I don't think I see one here, just a code
>>>>>>>>>> change to transition to a different zk backend. I think that it's
>>>>>>>>>> probably a mistake to consider our zookeeper data a public API and
>>>>>>>>>> we should be moving away from that or at the very least segregating
>>>>>>>>>> clearly what in zk is long term reliable. Ideally our v1/v2 APIs
>>>>>>>>>> should be the public API through which information about the cluster
>>>>>>>>>> is obtained. Programming directly against zk is kind of like a
>>>>>>>>>> custom build of solr: sometimes useful and appropriate, but
>>>>>>>>>> maintenance is your concern. Code plugging into solr should in
>>>>>>>>>> theory be written against an internal information java api, and
>>>>>>>>>> zookeeper should not be touched directly. (I know this is not in a
>>>>>>>>>> good state or at least wasn't last time I looked closely, but it
>>>>>>>>>> should be where we are heading).
>>>>>>>>>>
>>>>>>>>>>> > any code seeking to transition a node
>>>>>>>>>>>
>>>>>>>>>>> We considered this situation and realized that it is very risky
>>>>>>>>>>> to have nodes change roles while they are up and running. Better to
>>>>>>>>>>> assign fixed roles upon startup.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I agree that concurrency is hard. I definitely think startup time
>>>>>>>>>> assignments should be involved here. I'm not thinking that every
>>>>>>>>>> transition must be supported. As a starting point it would be fine
>>>>>>>>>> if none were. Having something suddenly become zookeeper is probably
>>>>>>>>>> tricky to support (see discussion in that thread regarding nodes not
>>>>>>>>>> actually participating until they have a partner to join with them
>>>>>>>>>> to avoid even numbered clusters), but I think the design should not
>>>>>>>>>> preclude the possibility of nodes becoming eligible for some roles
>>>>>>>>>> or withdrawing from some roles, and treatment of roles should be
>>>>>>>>>> consistent. In some cases someone may decide it's worth the work of
>>>>>>>>>> handling the concurrency concerns; it's best if they don't have to
>>>>>>>>>> break back compat, or hack their code around the assumption that it
>>>>>>>>>> wouldn't happen, in order to do it.
>>>>>>>>>>
>>>>>>>>>> Taking the zookeeper case as an example, it very much might be
>>>>>>>>>> desirable to have the possibility to heal the zk cluster by promoting
>>>>>>>>>> another node (configured as eligible for zk) to active zk duty if
>>>>>>>>>> one of the current zk nodes has been down long enough (say on prem
>>>>>>>>>> hardware, motherboard pops a capacitor, server gone for a week while
>>>>>>>>>> new hardware is purchased, built and configured). Especially if the
>>>>>>>>>> down node didn't hold data or other nodes had sufficient replicas
>>>>>>>>>> and the cluster is still answering queries just fine.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > I know of a case that would benefit from having separate
>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which would
>>>>>>>>>>> be deployed to a number of CPU heavy boxes (which might add more in
>>>>>>>>>>> prep for bulk indexing, and remove them when bulk was done), data
>>>>>>>>>>> could then be hosted on cheaper nodes....
>>>>>>>>>>>
>>>>>>>>>>> This is the main motivation behind this work. SOLR-15715 needs
>>>>>>>>>>> this, and hence it would be good to get this in as soon as possible.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think we can incrementally work towards configurability for all
>>>>>>>>>> of these roles. The current default state is that a node has all
>>>>>>>>>> roles and the incremental progress is to enable removing a role from
>>>>>>>>>> a node. This, I think, is why it might be good to:
>>>>>>>>>>
>>>>>>>>>> A) Determine the set of roles our current solr nodes are
>>>>>>>>>> performing (that might be removed in some scenario) and document
>>>>>>>>>> this by assigning these roles as default-on when this SIP goes live.
>>>>>>>>>> B) Figure out what the process of adding something entirely new
>>>>>>>>>> that we haven't yet thought of with its own role would look like.
>>>>>>>>>>
>>>>>>>>>> I think it would be great if we not only satisfied the current
>>>>>>>>>> need but determined how we expect this to change over time.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ishan
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The SIP looks like a good start, and I was already thinking of
>>>>>>>>>>>> something very similar to this as a follow on to my attempts to
>>>>>>>>>>>> split the uber filter (SolrDispatchFilter) into servlets such that
>>>>>>>>>>>> roles determine what servlets are deployed, but I would like to
>>>>>>>>>>>> recommend that the roles be all positive ("Can do this") and nodes
>>>>>>>>>>>> with no role at all are ineligible for all activities (just like
>>>>>>>>>>>> standard role permissioning systems). This will make it much more
>>>>>>>>>>>> familiar and easy to think about. Therefore there would be no need
>>>>>>>>>>>> for a role such as !data which I presume was meant to mean "no
>>>>>>>>>>>> data on this node"... rather just don't give the "data" role to
>>>>>>>>>>>> the node.
>>>>>>>>>>>>
>>>>>>>>>>>> Additional node roles I think should exist:
>>>>>>>>>>>>
>>>>>>>>>>>> I think that we should expand/edit your list of roles to be
>>>>>>>>>>>>
>>>>>>>>>>>>    - QUERY - accepts and analyzes queries up to the point of
>>>>>>>>>>>>    actually consulting the lucene index (useful if you have a very
>>>>>>>>>>>>    heavy analysis phase)
>>>>>>>>>>>>    - UPDATE - accepts update requests, and performs update
>>>>>>>>>>>>    functionality prior to and including
>>>>>>>>>>>>    DistributedUpdateProcessorFactory (useful if you have a very
>>>>>>>>>>>>    heavy analysis phase)
>>>>>>>>>>>>    - ADMIN - accepts admin/management commands
>>>>>>>>>>>>    - UI - hosts an admin ui
>>>>>>>>>>>>    - ZOOKEEPER - hosts embedded zookeeper
>>>>>>>>>>>>    - OVERSEER - performs overseer related functionality
>>>>>>>>>>>>    (though IIRC there's a proposal to eliminate overseer that
>>>>>>>>>>>>    might eliminate this)
>>>>>>>>>>>>    - DATA - nodes where there is a lucene index and matching
>>>>>>>>>>>>    against the analyzed results of a query may be conducted to
>>>>>>>>>>>>    generate a response, also performs update steps that come after
>>>>>>>>>>>>    DistributedUpdateProcessorFactory
>>>>>>>>>>>>
>>>>>>>>>>>> I also suggest that these roles each have a node in zookeeper
>>>>>>>>>>>> listing the current member nodes (as child nodes) so that code
>>>>>>>>>>>> that wants to find a node with an appropriate role does not need
>>>>>>>>>>>> to scan the list of all nodes parsing something to discover which
>>>>>>>>>>>> nodes apply and also does not have to parse json to do it. I think
>>>>>>>>>>>> this will be particularly key for zookeeper nodes which might be 3
>>>>>>>>>>>> out of 100 or more nodes. Similar to how we track live nodes. I
>>>>>>>>>>>> think we should have a nodes.json too that tracks what roles a
>>>>>>>>>>>> node is ALLOWED to take (as opposed to which roles it is currently
>>>>>>>>>>>> servicing).
>>>>>>>>>>>>
>>>>>>>>>>>> So running code consults the zookeeper role list of nodes, and
>>>>>>>>>>>> any code seeking to transition a node (an admin operation with
>>>>>>>>>>>> much lower performance requirements) consults the json data in the
>>>>>>>>>>>> nodes.json node, parses it, finds the node in question and checks
>>>>>>>>>>>> what it's eligible for (this will correspond to which
>>>>>>>>>>>> servlets/apps have been loaded).
>>>>>>>>>>>>
>>>>>>>>>>>> I know of a case that would benefit from having separate
>>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which
>>>>>>>>>>>> would be deployed to a number of CPU heavy boxes (which might add
>>>>>>>>>>>> more in prep for bulk indexing, and remove them when bulk was
>>>>>>>>>>>> done), data could then be hosted on cheaper nodes....
>>>>>>>>>>>>
>>>>>>>>>>>> Also maybe think about how this relates to NRT/TLOG/PULL which
>>>>>>>>>>>> are also maybe role like
>>>>>>>>>>>>
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> -Gus
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>>>>>>
>>>>>>>>>>>>> We also wish to add first class support for Query nodes that
>>>>>>>>>>>>> are used to process user queries by forwarding them to data
>>>>>>>>>>>>> nodes, merging/aggregating the results and presenting them to
>>>>>>>>>>>>> users. This concept exists as a first class citizen in most
>>>>>>>>>>>>> other search engines. This is a chance for Solr to catch up.
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ishan / Noble / Hitesh
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
