Re: First class support for node roles

Ishan Chattopadhyaya Fri, 29 Oct 2021 02:49:20 -0700

> I'll introduce a change to the SIP document, unless there are
> objections/questions/concerns. WDYT?
I've made the change to the document. Feedback on this is welcome.


On Fri, Oct 29, 2021 at 2:52 PM Ishan Chattopadhyaya <
[email protected]> wrote:

> It seems to me, after going through this thread, that the role "query" is
> misleading for what we're planning to introduce in SOLR-15715. We're
> essentially introducing a functionality for performing, what we initially
> called, "query aggregations". The actual queries run on data nodes anyway,
> just that the first point of entry for such distributed queries will be a
> separate node with this extra functionality. Towards that, I feel we should
> call a node with such capability as a "coordinator" node (instead of "query
> node"). It fits nicely with the mental model of ElasticSearch as well:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node
> .
>
> Proposing that if a node has a role "coordinator", then that node is
> already assumed to have no data replicas on it. If a node starts with roles
> "coordinator,data" both, then the startup should fail with a message saying
> a coordinator node should not host data on it. A coordinator node, though,
> can have other roles like "zookeeper" or "overseer" etc.
>
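For illustration, here is a minimal Java sketch of the startup check proposed in the paragraph above: a node whose roles include both "coordinator" and "data" fails to start. The class and method names (CoordinatorRoleCheck, parseRoles, validate) are hypothetical and are not existing Solr APIs; the comma-separated input format is assumed from the role strings used in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Solr: sketches the proposed startup check. */
final class CoordinatorRoleCheck {

  /** Parses a comma-separated role list such as "coordinator,overseer". */
  static Set<String> parseRoles(String rolesProperty) {
    Set<String> roles = new LinkedHashSet<>();
    if (rolesProperty == null) {
      return roles;
    }
    for (String token : rolesProperty.split(",")) {
      String role = token.trim().toLowerCase(Locale.ROOT);
      if (!role.isEmpty()) {
        roles.add(role);
      }
    }
    return roles;
  }

  /** Throws if "coordinator" is combined with "data"; other combinations pass. */
  static void validate(Set<String> roles) {
    if (roles.contains("coordinator") && roles.contains("data")) {
      throw new IllegalStateException(
          "A coordinator node should not host data on it");
    }
  }
}
```

Under this sketch, "coordinator,overseer" passes validation while "coordinator,data" is rejected at startup.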
> I'll introduce a change to the SIP document, unless there are
> objections/questions/concerns. WDYT?
>
>
>
> On Fri, Oct 29, 2021 at 1:54 PM Ilan Ginzburg <[email protected]> wrote:
>
>> If we make collections role-aware for example (replicas of that
>> collection can only be placed on nodes with a specific role, in addition to
>> the other role based constraints), the set of roles should be user
>> extensible and not fixed.
>>
>> If collections are not role aware, the constraints introduced by roles
>> apply to all collections equally which might be insufficient if a user
>> needs for example a heavily used collection to only be placed on more
>> powerful nodes.
>>
>> Ilan
>>
>> On Thu 28 Oct 2021 at 07:59, Gus Heck <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Oct 27, 2021 at 3:34 PM Houston Putman <[email protected]>
>>> wrote:
>>>
>>>> I don't think it's unreasonable to want to have nodes that don't accept
>>>>> queries. This is just ishan's use case.
>>>>
>>>>
>>>> Maybe I am misunderstanding, and it does deal with your last point
>>>> about inter-node communication, but Peer-sync uses queries when doing
>>>> replication between replicas. If a node doesn't have queries enabled, then
>>>> it's possible to break peer sync. There are other options to make sure
>>>> certain replicas aren't queried (shards.preference).
>>>> For the separation of update/query traffic, I understand having compute
>>>> nodes that deal with pre-replica commands, such as managing distributed
>>>> queries or pre-processing documents in the URP chain. But for actual
>>>> non-distrib queries and final update requests, the only way to actually
>>>> separate this traffic is using PULL/TLOG replicas, because otherwise (with
>>>> NRT) all update requests are still going to the query nodes, just the same
>>>> as non-query nodes that are hosting those replicas. (and shard leadership
>>>> could go to any "data" node, since I imagine we wouldn't filter out the
>>>> "query" nodes...) The shards.preference option makes it easy to send
>>>> queries to only PULL replicas in this scenario.
>>>> That's why I saw this more as a "compute" role rather than "query".
>>>>
>>>
>>> Yeah for internal stuff we still need the ability to query so we might
>>> need to accommodate that, but you may not have noticed the bit where I
>>> mentioned Query nodes doing the parsing/analysis of the query and then
>>> sending a fully parsed (possibly serialized lucene objects) query to the
>>> data node. So data nodes would only speak a single lucene level query
>>> language and not parse queries or analyze text. In theory, with that, one
>>> could even have something like elastic reduce a request to lucene objects
>>> and then get an answer from a solr data node (for simple cases without need
>>> to find shards via zookeeper etc). Certainly there are many details and
>>> corner cases to think about there.
>>>
>>>
>>>>
>>>> Definitely not what I would like. If I'm going to try to segregate data
>>>>> nodes out to certain nodes, I don't want them appearing elsewhere just
>>>>> cause something went down or filled up. Nor would I want updates to
>>>>> suddenly start bogging down my query nodes....
>>>>>
>>>>
>>>> I guess it depends on what you are optimizing for. Maybe there can be
>>>> an option for this, like -DnonLenientRoles=data,update or something like
>>>> that.
>>>>
>>>
>>> A possibility
>>>
>>>
>>>>
>>>> On Wed, Oct 27, 2021 at 3:03 PM Gus Heck <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 27, 2021 at 2:44 PM Houston Putman <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> As for the "query" role, let's name it something better like
>>>>>> "compute", since data nodes are always going to be "querying".
>>>>>>
>>>>>
>>>>> I don't think it's unreasonable to want to have nodes that don't
>>>>> accept queries. This is just ishan's use case.
>>>>>
>>>>>
>>>>>>  if no live nodes have roles=overseer (or roles=all), then we should
>>>>>> just select any node to be overseer. This should be the same for compute,
>>>>>> data, etc.
>>>>>>
>>>>>
>>>>> Definitely not what I would like. If I'm going to try to segregate
>>>>> data nodes out to certain nodes, I don't want them appearing elsewhere just
>>>>> cause something went down or filled up. Nor would I want updates to
>>>>> suddenly start bogging down my query nodes....
>>>>>
>>>>>
>>>>>>
>>>>>> So, for the proposal, lets say "data" is a special role which is
>>>>>>> assumed by default, and is enabled on all nodes unless there's a !data.
>>>>>>>
>>>>>>
>>>>>> Instead of  this, maybe we have role groups. Such as
>>>>>> admin~=overseer,zk or worker~=compute,data,updateProcessing
>>>>>>
>>>>>
>>>>> Role groups sound like a next level feature to be built on top once
>>>>> we figure out how roles work independently.
>>>>>
>>>>>
>>>>>>
>>>>>> As for the suggested Roles, I'm not sure ADMIN or UI really fit,
>>>>>> since there is another option to disable the UI for a solr node, and
>>>>>> various ADMIN commands have to be accepted across other node roles. (Data
>>>>>> nodes require the Collections API, same with the overseer.)
>>>>>>
>>>>>
>>>>> I admit I'm angling towards a world in which enabling and disabling
>>>>> broad features is done in one way in one place... As for admin there might
>>>>> be a distinction between commands issued between nodes and from the outside
>>>>> world... I have this other idea about inter-node communication that's even
>>>>> less baked that I won't distract with here ;)
>>>>>
>>>>>
>>>>>> - Houston
>>>>>>
>>>>>> On Wed, Oct 27, 2021 at 1:34 PM Ishan Chattopadhyaya <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> bq. In other words, roles are all "positive", but their consequences
>>>>>>> are only negative (rejecting when the matching positive role is not
>>>>>>> present).
>>>>>>>
>>>>>>> Essentially, yes. A node that doesn't specify any role should be
>>>>>>> able to do everything.
>>>>>>>
>>>>>>> Let me just take a brief detour and mention our thoughts on the
>>>>>>> "query" role. While all data nodes can also be used for querying, our idea
>>>>>>> was to create a layer of nodes that have some special mechanism to be able
>>>>>>> to proxy/forward queries to data nodes (let's call it "pseudo cores" or
>>>>>>> "synthetic cores" or "proxy cores"). Our thought was that any node that has
>>>>>>> "query,!data" role would enable this special mode on startup (whereby
>>>>>>> requests are served by these special pseudo cores). We'll discuss this
>>>>>>> in detail in that issue.
>>>>>>>
>>>>>>> Back to the main subject here.
>>>>>>>
>>>>>>> Let's take a practical scenario:
>>>>>>> * Layer1: Organization has about 100 nodes, each node has many data
>>>>>>> replicas
>>>>>>> * Layer2: To manage such a large cluster reliably, they keep aside
>>>>>>> 4-5 dedicated overseer nodes.
>>>>>>> * Layer3: Since query aggregations/coordination can potentially be
>>>>>>> expensive, they keep aside 5-10 query nodes.
>>>>>>>
>>>>>>> My preference would be as follows:
>>>>>>> * I'd like to refer to Layer1 nodes as the "data nodes" and hence
>>>>>>> get either no role defined for them or -Dnode.roles=data.
>>>>>>> * I'd like to refer to Layer2 nodes as "overseer nodes" (even though
>>>>>>> I understand, only one of them can be an overseer at a time). I'd like to
>>>>>>> have -Dnode.roles=!data,overseer
>>>>>>> * I'd like to refer to Layer3 nodes as "query nodes", with
>>>>>>> -Dnode.roles=!data,query
>>>>>>>
>>>>>>> ^ This seems very practical from an operational standpoint.
>>>>>>>
>>>>>>> So, for the proposal, let's say "data" is a special role which is
>>>>>>> assumed by default, and is enabled on all nodes unless there's a !data. It
>>>>>>> is presumed that data nodes can also serve queries directly, so adding a
>>>>>>> "query" to those nodes is meaningless (also because there's no practical
>>>>>>> benefit to stopping a data node from receiving a query for "!query" role to
>>>>>>> be useful).
>>>>>>>
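As a rough illustration of the rule just described ("data" is assumed by default and only a "!data" entry removes it), here is a minimal Java sketch. The class name DefaultDataRoles and the method effectiveRoles are hypothetical, and letting a negation win when a role is both listed and negated is an assumption of this sketch, not something the proposal specifies.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Solr: resolves a -Dnode.roles style value. */
final class DefaultDataRoles {

  /** Resolves e.g. "!data,overseer" to the effective role set. */
  static Set<String> effectiveRoles(String rolesProperty) {
    Set<String> roles = new LinkedHashSet<>();
    roles.add("data"); // "data" is on by default, per the proposal in this message
    if (rolesProperty == null || rolesProperty.trim().isEmpty()) {
      return roles;
    }
    Set<String> negated = new LinkedHashSet<>();
    for (String token : rolesProperty.split(",")) {
      String t = token.trim().toLowerCase(Locale.ROOT);
      if (t.isEmpty()) {
        continue;
      }
      if (t.startsWith("!")) {
        negated.add(t.substring(1)); // "!data" switches the default off
      } else {
        roles.add(t);
      }
    }
    roles.removeAll(negated); // assumption: a negation overrides a positive entry
    return roles;
  }
}
```

With this sketch, no value at all yields only "data", "!data,overseer" yields only "overseer", and "!data,query" yields only "query", matching the three layers in the scenario above.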
>>>>>>> "query" role on nodes that don't host data really refers to a
>>>>>>> special capability for lightweight, stateless nodes. I don't want to add a
>>>>>>> "!query" on dedicated overseer nodes, and hence I don't want to assume that
>>>>>>> "query" is implicitly available on any node even if the role isn't specified.
>>>>>>>
>>>>>>> "overseer" role is complicated, since it is already defined and we
>>>>>>> don't have the opportunity to define it the right way. I'd hate having to
>>>>>>> put a "!overseer" on every data node on startup in order to have a few
>>>>>>> dedicated overseers.
>>>>>>>
>>>>>>> In short, in this SIP, I just wish to implement the concept of node roles
>>>>>>> and their handling. How individual roles are leveraged can be up to every new
>>>>>>> role's implementation.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 27, 2021 at 9:54 PM Gus Heck <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> In other words, roles are all "positive", but their consequences
>>>>>>>>> are only negative (rejecting when the matching positive role is not
>>>>>>>>> present).
>>>>>>>>>
>>>>>>>>> Yeah right. to do something the machine needs the role
>>>>>>>>
>>>>>>>>
>>>>>>>>> We can also consider no role defined = all roles allowed. Will
>>>>>>>>> make things simpler.
>>>>>>>>>
>>>>>>>>
>>>>>>>> in terms of startup command yes. Internally we should have all
>>>>>>>> explicitly assigned when no roles are specified at startup so that the code
>>>>>>>> doesn't have a million if checks for the empty case
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How do we expect the roles to be used?
>>>>>>>>>> One way I see is a node refusing to do anything related to a role
>>>>>>>>>> it doesn't have.
>>>>>>>>>> For example if a node does not have role "data", any attempt to
>>>>>>>>>> create a core on it would fail.
>>>>>>>>>> A node not having the role "query", will refuse to have anything
>>>>>>>>>> to do with handling a query etc.
>>>>>>>>>> Then it would be up to other code to make sure only the
>>>>>>>>>> appropriate nodes are requested to do any type of action.
>>>>>>>>>> So for example any replica placement code plugin would have to
>>>>>>>>>> restrict the set of candidate nodes for a new replica placement to those
>>>>>>>>>> having "data". Otherwise the call would fail, and there should be nothing
>>>>>>>>>> the replica placement code can do about it.
>>>>>>>>>>
>>>>>>>>>> Similarly, the "overseer" role would limit the nodes that
>>>>>>>>>> participate in the Overseer election. The Overseer election code would have
>>>>>>>>>> to remove (or not add) all non qualifying nodes from the election, and we
>>>>>>>>>> should expect a node without role "overseer" to refuse to start the
>>>>>>>>>> Overseer machinery if asked to...
>>>>>>>>>>
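The filtering described in the two paragraphs above (replica placement restricted to nodes with "data", Overseer election restricted to nodes with "overseer") could be sketched in Java roughly as follows. RoleFilter and nodesWithRole are hypothetical names, and the in-memory map of node name to roles is an assumption made only for illustration; it does not reflect how Solr stores or reads roles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Solr: selects nodes eligible for a role. */
final class RoleFilter {

  /**
   * Returns the names of nodes whose role set contains requiredRole,
   * in the iteration order of the supplied map.
   */
  static List<String> nodesWithRole(Map<String, Set<String>> rolesByNode, String requiredRole) {
    List<String> eligible = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : rolesByNode.entrySet()) {
      if (entry.getValue().contains(requiredRole)) {
        eligible.add(entry.getKey());
      }
    }
    return eligible;
  }
}
```

A placement plugin would call nodesWithRole(cluster, "data") before choosing a target, and election code would call nodesWithRole(cluster, "overseer") before adding candidates. The node itself would still refuse any request for a role it lacks.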
>>>>>>>>>> Trying to make the use case clear regarding how roles are used.
>>>>>>>>>> Ilan
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Gus,
>>>>>>>>>>>>
>>>>>>>>>>>> > I think that we should expand/edit your list of roles to be
>>>>>>>>>>>>
>>>>>>>>>>>> The list can be expanded as and when more isolation and
>>>>>>>>>>>> features are needed. I only listed those roles that we already 
>>>>>>>>>>>> have a
>>>>>>>>>>>> functionality for or is under development.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Well all of those roles (except zookeeper) are things nodes do
>>>>>>>>>>> today. As it stands they are all doing all of them. What we add 
>>>>>>>>>>> support for
>>>>>>>>>>> as we move forward is starting without a role, and add the 
>>>>>>>>>>> zookeeper role
>>>>>>>>>>> when that feature is ready.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> > I would like to recommend that the roles be all positive
>>>>>>>>>>>> ("Can do this") and nodes with no role at all are ineligible for 
>>>>>>>>>>>> all
>>>>>>>>>>>> activities.
>>>>>>>>>>>>
>>>>>>>>>>>> It comes down to the defaults and backcompat. If we want all
>>>>>>>>>>>> Solr nodes to be able to host data replicas by default (without 
>>>>>>>>>>>> user
>>>>>>>>>>>> explicitly specifying role=data), then we need a way to unset this 
>>>>>>>>>>>> role.
>>>>>>>>>>>> The most reasonable way sounded like a "!data". We can do away 
>>>>>>>>>>>> with !data
>>>>>>>>>>>> if we mandate each and every data node have the role "data" 
>>>>>>>>>>>> explicitly
>>>>>>>>>>>> defined for it, which breaks backcompat and also is cumbersome to 
>>>>>>>>>>>> use for
>>>>>>>>>>>> those who don't want to use these special roles.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Not sure I understand, which of the roles I mentioned (other
>>>>>>>>>>> than zookeeper, which I expect is intended as different from our 
>>>>>>>>>>> current
>>>>>>>>>>> embedded zk) is NOT currently supported by a single cloud node 
>>>>>>>>>>> brought up
>>>>>>>>>>> as shown in our tutorials/docs? I'm certainly not proposing that the
>>>>>>>>>>> default change to nothing. The default is all roles, unless you 
>>>>>>>>>>> specify
>>>>>>>>>>> roles at startup.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> > I also suggest that these roles each have a node in zookeeper
>>>>>>>>>>>> listing the current member nodes (as child nodes) so that code 
>>>>>>>>>>>> that wants
>>>>>>>>>>>> to find a node with an appropriate role does not need to scan the 
>>>>>>>>>>>> list of
>>>>>>>>>>>> all nodes parsing something to discover which nodes apply and also 
>>>>>>>>>>>> does not
>>>>>>>>>>>> have to parse json to do it.
>>>>>>>>>>>>
>>>>>>>>>>>> /roles.json exists today, it has role as key and list of nodes
>>>>>>>>>>>> as value. In the next major version, we can change the format of that file
>>>>>>>>>>>> and use key as node, value as list of roles. Or, maybe we can go for adding
>>>>>>>>>>>> the roles to the data for each item in the list of live_nodes.
>>>>>>>>>>>>
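To make the two layouts concrete, here is a small Java sketch that converts a role-to-nodes mapping (role as key, list of nodes as value) into the node-to-roles form suggested for a later version. RoleMapInverter is a hypothetical name, and the maps stand in for already-parsed data; the actual on-disk layout of /roles.json is not shown in this thread and is not assumed here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper, not part of Solr: flips role->nodes into node->roles. */
final class RoleMapInverter {

  /** Builds a sorted node->roles map from a role->nodes map. */
  static Map<String, Set<String>> invert(Map<String, List<String>> nodesByRole) {
    Map<String, Set<String>> rolesByNode = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : nodesByRole.entrySet()) {
      for (String node : entry.getValue()) {
        // Create the role set for this node on first sight, then record the role.
        rolesByNode.computeIfAbsent(node, k -> new TreeSet<>()).add(entry.getKey());
      }
    }
    return rolesByNode;
  }
}
```

The node-keyed form answers "what can this node do?" with one lookup, while the role-keyed form answers "which nodes can do X?" with one lookup, which is the trade-off being discussed here.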
>>>>>>>>>>>>
>>>>>>>>>>> I'm not finding anything in our documentation about roles.json
>>>>>>>>>>> so I think it's an internal implementation detail, which reduces 
>>>>>>>>>>> back
>>>>>>>>>>> compat concerns. ADDROLE/REMOVEROLE don't accept json or anything 
>>>>>>>>>>> like that
>>>>>>>>>>> and could be made to work with zk nodes too.
>>>>>>>>>>>
>>>>>>>>>>> The fact that some precursor work was done without a SIP (or
>>>>>>>>>>> before SIPs existed) should not hamstring our design once a SIP that
>>>>>>>>>>> clearly covers the same topic is under consideration. By their 
>>>>>>>>>>> nature SIP's
>>>>>>>>>>> are non-trivial and often will include compatibility breaks. Good 
>>>>>>>>>>> news is I
>>>>>>>>>>> don't think I see one here, just a code change to transition to a 
>>>>>>>>>>> different
>>>>>>>>>>> zk backend. I think that it's probably a mistake to consider our 
>>>>>>>>>>> zookeeper
>>>>>>>>>>> data a public API and we should be moving away from that or at the 
>>>>>>>>>>> very
>>>>>>>>>>> least segregating clearly what in zk is long term reliable. Ideally 
>>>>>>>>>>> our
>>>>>>>>>>> v1/v2 api's should be the public api through which information 
>>>>>>>>>>> about the
>>>>>>>>>>> cluster is obtained. Programming directly against zk is kind of 
>>>>>>>>>>> like a
>>>>>>>>>>> custom build of solr. Sometimes useful and appropriate, but 
>>>>>>>>>>> maintenance is
>>>>>>>>>>> your concern. For code plugging into solr, it should in theory be 
>>>>>>>>>>> against
>>>>>>>>>>> an internal information java api, and zookeeper should not be 
>>>>>>>>>>> touched
>>>>>>>>>>> directly. (I know this is not in a good state or at least wasn't 
>>>>>>>>>>> last time
>>>>>>>>>>> I looked closely, but it should be where we are heading).
>>>>>>>>>>>
>>>>>>>>>>> > any code seeking to transition a node
>>>>>>>>>>>>
>>>>>>>>>>>> We considered this situation and realized that it is very risky
>>>>>>>>>>>> to have nodes change roles while they are up and running. Better 
>>>>>>>>>>>> to assign
>>>>>>>>>>>> fixed roles upon startup.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I agree that concurrency is hard. I definitely think startup
>>>>>>>>>>> time assignments should be involved here. I'm not thinking that 
>>>>>>>>>>> every
>>>>>>>>>>> transition must be supported. As a starting point it would be fine 
>>>>>>>>>>> if none
>>>>>>>>>>> were. Having something suddenly become zookeeper is probably tricky 
>>>>>>>>>>> to
>>>>>>>>>>> support (see discussion in that thread regarding nodes not actually
>>>>>>>>>>> participating until they have a partner to join with them to avoid 
>>>>>>>>>>> even
>>>>>>>>>>> numbered clusters), but I think the design should not preclude the
>>>>>>>>>>> possibility of nodes becoming eligible for some roles or 
>>>>>>>>>>> withdrawing from
>>>>>>>>>>> some roles, and treatment of roles should be consistent. In some 
>>>>>>>>>>> cases
>>>>>>>>>>> someone may decide it's worth the work of handling the concurrency
>>>>>>>>>>> concerns, best if they don't have to break back compat or hack 
>>>>>>>>>>> their code
>>>>>>>>>>> around the assumption it wouldn't happen to do it.
>>>>>>>>>>>
>>>>>>>>>>> Taking the zookeeper case as an example, it very much might be
>>>>>>>>>>> desirable to have the possibility to heal the zk cluster by 
>>>>>>>>>>> promoting
>>>>>>>>>>> another node (configured as eligible for zk) to active zk duty if 
>>>>>>>>>>> one of
>>>>>>>>>>> the current zk nodes has been down long enough (say on prem 
>>>>>>>>>>> hardware,
>>>>>>>>>>> motherboard pops a capacitor, server gone for a week while new 
>>>>>>>>>>> hardware is
>>>>>>>>>>> purchased, built and configured). Especially if the down node 
>>>>>>>>>>> didn't hold
>>>>>>>>>>> data or other nodes had sufficient replicas and the cluster is still
>>>>>>>>>>> answering queries just fine.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> > I know of a case that would benefit from having separate
>>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which 
>>>>>>>>>>>> would be
>>>>>>>>>>>> deployed to a number of CPU heavy boxes (which might add more in 
>>>>>>>>>>>> prep for
>>>>>>>>>>>> bulk indexing, and remove them when bulk was done), data could 
>>>>>>>>>>>> then be
>>>>>>>>>>>> hosted on cheaper nodes....
>>>>>>>>>>>>
>>>>>>>>>>>> This is the main motivation behind this work. SOLR-15715 needs
>>>>>>>>>>>> this, and hence it would be good to get this in as soon as 
>>>>>>>>>>>> possible.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we can incrementally work towards configurability for
>>>>>>>>>>> all of these roles. The current default state is that a node has all roles
>>>>>>>>>>> and the incremental progress is to enable removing a role from a node. This
>>>>>>>>>>> I think is why it might be good to:
>>>>>>>>>>>
>>>>>>>>>>> A) Determine the set of roles our current solr nodes are
>>>>>>>>>>> performing (that might be removed in some scenario) and document this by
>>>>>>>>>>> assigning these roles as default-on when this SIP goes live.
>>>>>>>>>>> B) Figure out what the process of adding something entirely new
>>>>>>>>>>> that we haven't yet thought of with its own role would look like.
>>>>>>>>>>>
>>>>>>>>>>> I think it would be great if we not only satisfied the current
>>>>>>>>>>> need but determined how we expect this to change over time.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ishan
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The SIP looks like a good start, and I was already thinking of
>>>>>>>>>>>>> something very similar to this as a follow on to my attempts to 
>>>>>>>>>>>>> split the
>>>>>>>>>>>>> uber filter (SolrDispatchFilter) into servlets such that roles 
>>>>>>>>>>>>> determine
>>>>>>>>>>>>> what servlets are deployed, but I would like to recommend that 
>>>>>>>>>>>>> the roles be
>>>>>>>>>>>>> all positive ("Can do this") and nodes with no role at all are 
>>>>>>>>>>>>> ineligible
>>>>>>>>>>>>> for all activities. (just like standard role permissioning 
>>>>>>>>>>>>> systems). This
>>>>>>>>>>>>> will make it much more familiar and easy to think about. 
>>>>>>>>>>>>> Therefore there
>>>>>>>>>>>>> would be no need for a role such as !data which I presume was 
>>>>>>>>>>>>> meant to mean
>>>>>>>>>>>>> "no data on this node"... rather just don't give the "data" role 
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additional node roles I think should exist:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that we should expand/edit your list of roles to be
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - QUERY - accepts and analyzes queries up to the point of
>>>>>>>>>>>>>    actually consulting the lucene index (useful if you have a very heavy
>>>>>>>>>>>>>    analysis phase)
>>>>>>>>>>>>>    - UPDATE - accepts update requests, and performs update
>>>>>>>>>>>>>    functionality prior to and including DistributedUpdateProcessorFactory
>>>>>>>>>>>>>    (useful if you have a very heavy analysis phase)
>>>>>>>>>>>>>    - ADMIN - accepts admin/management commands
>>>>>>>>>>>>>    - UI - hosts an admin ui
>>>>>>>>>>>>>    - ZOOKEEPER - hosts embedded zookeeper
>>>>>>>>>>>>>    - OVERSEER - performs overseer related functionality
>>>>>>>>>>>>>    (though IIRC there's a proposal to eliminate overseer that might eliminate
>>>>>>>>>>>>>    this)
>>>>>>>>>>>>>    - DATA - nodes where there is a lucene index and matching
>>>>>>>>>>>>>    against the analyzed results of a query may be conducted to generate a
>>>>>>>>>>>>>    response, also performs update steps that come after
>>>>>>>>>>>>>    DistributedUpdateProcessorFactory
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also suggest that these roles each have a node in zookeeper
>>>>>>>>>>>>> listing the current member nodes (as child nodes) so that code 
>>>>>>>>>>>>> that wants
>>>>>>>>>>>>> to find a node with an appropriate role does not need to scan the 
>>>>>>>>>>>>> list of
>>>>>>>>>>>>> all nodes parsing something to discover which nodes apply and 
>>>>>>>>>>>>> also does not
>>>>>>>>>>>>> have to parse json to do it. I think this will be particularly 
>>>>>>>>>>>>> key for
>>>>>>>>>>>>> zookeeper nodes which might be 3 out of 100 or more nodes. 
>>>>>>>>>>>>> Similar to how
>>>>>>>>>>>>> we track live nodes. I think we should have a nodes.json too that tracks
>>>>>>>>>>>>> what roles a node is ALLOWED to take (as opposed to which roles it is
>>>>>>>>>>>>> currently servicing)
>>>>>>>>>>>>>
>>>>>>>>>>>>> So running code consults the zookeeper role list of nodes, and
>>>>>>>>>>>>> any code seeking to transition a node (an admin operation with 
>>>>>>>>>>>>> much lower
>>>>>>>>>>>>> performance requirements) consults the json data in the 
>>>>>>>>>>>>> nodes.json node,
>>>>>>>>>>>>> parses it, finds the node in question and checks what it's 
>>>>>>>>>>>>> eligible for
>>>>>>>>>>>>> (this will correspond to which servlets/apps have been loaded).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know of a case that would benefit from having separate
>>>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which 
>>>>>>>>>>>>> would be
>>>>>>>>>>>>> deployed to a number of CPU heavy boxes (which might add more in 
>>>>>>>>>>>>> prep for
>>>>>>>>>>>>> bulk indexing, and remove them when bulk was done), data could 
>>>>>>>>>>>>> then be
>>>>>>>>>>>>> hosted on cheaper nodes....
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also maybe think about how this relates to NRT/TLOG/PULL which
>>>>>>>>>>>>> are also maybe role like
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Gus
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's an SIP for introducing the concept of node roles:
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We also wish to add first class support for Query nodes that
>>>>>>>>>>>>>> are used to process user queries by forwarding to data nodes,
>>>>>>>>>>>>>> merging/aggregating them and presenting to users. This concept 
>>>>>>>>>>>>>> exists as
>>>>>>>>>>>>>> first class citizens in most other search engines. This is a 
>>>>>>>>>>>>>> chance for
>>>>>>>>>>>>>> Solr to catch up.
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Ishan / Noble / Hitesh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
