> I'll introduce a change to the SIP document, unless there are
> objections/questions/concerns. WDYT?

I've made the change to the document. Feedback on this is welcome.
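
To make the proposed startup rule concrete, here is a rough sketch of how a role string could be parsed and validated. It is only an illustration under assumptions: the function name, the set of known roles, and the error messages are hypothetical and not Solr's actual code. The "!data" negation and the "data is assumed by default" rule come from earlier in this thread, and the "coordinator,data" failure is the rule proposed below.

```python
# Hypothetical sketch: names and the role set are illustrative, not Solr's API.
KNOWN_ROLES = {"data", "overseer", "coordinator", "zookeeper"}


def parse_node_roles(spec: str = "") -> frozenset:
    """Parse a comma-separated role string such as "!data,overseer".

    Assumed rules, taken from the discussion in this thread:
      * "data" is on by default unless negated with "!data";
      * a "coordinator" node is assumed to host no data replicas;
      * "coordinator" combined with an explicit "data" fails at startup.
    """
    tokens = [t.strip() for t in spec.split(",") if t.strip()]
    enabled = {t for t in tokens if not t.startswith("!")}
    disabled = {t[1:] for t in tokens if t.startswith("!")}

    unknown = (enabled | disabled) - KNOWN_ROLES
    if unknown:
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    if enabled & disabled:
        raise ValueError(f"role(s) both set and negated: {sorted(enabled & disabled)}")
    if "coordinator" in enabled and "data" in enabled:
        raise ValueError("a coordinator node should not host data on it")

    # "data" is implied unless it is negated or the node is a coordinator.
    if "data" not in disabled and "coordinator" not in enabled:
        enabled.add("data")
    return frozenset(enabled)
```

With this sketch, an empty role string yields a plain data node, "!data,overseer" yields a dedicated overseer, "coordinator,overseer" is accepted without the data role, and "coordinator,data" raises an error at startup.
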
On Fri, Oct 29, 2021 at 2:52 PM Ishan Chattopadhyaya <[email protected]> wrote:

> It seems to me, after going through this thread, that the role "query" is
> misleading for what we're planning to introduce in SOLR-15715. We're
> essentially introducing a functionality for performing, what we initially
> called, "query aggregations". The actual queries run on data nodes anyway,
> just that the first point of entry for such distributed queries will be a
> separate node with this extra functionality. Towards that, I feel we should
> call a node with such capability as a "coordinator" node (instead of "query
> node"). It fits nicely with the mental model of ElasticSearch as well:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node
>
> Proposing that if a node has a role "coordinator", then that node is
> already assumed to have no data replicas on it. If a node starts with roles
> "coordinator,data" both, then the startup should fail with a message saying
> a coordinator node should not host data on it. A coordinator node, though,
> can have other roles like "zookeeper" or "overseer" etc.
>
> I'll introduce a change to the SIP document, unless there are
> objections/questions/concerns. WDYT?
>
> On Fri, Oct 29, 2021 at 1:54 PM Ilan Ginzburg <[email protected]> wrote:
>
>> If we make collections role-aware for example (replicas of that
>> collection can only be placed on nodes with a specific role, in addition to
>> the other role based constraints), the set of roles should be user
>> extensible and not fixed.
>>
>> If collections are not role aware, the constraints introduced by roles
>> apply to all collections equally which might be insufficient if a user
>> needs for example a heavily used collection to only be placed on more
>> powerful nodes.
>> >> Ilan >> >> On Thu 28 Oct 2021 at 07:59, Gus Heck <[email protected]> wrote: >> >>> >>> >>> On Wed, Oct 27, 2021 at 3:34 PM Houston Putman <[email protected]> >>> wrote: >>> >>>> I don't think it's unreasonable to want to have nodes that don't accept >>>>> queries. This is just ishan's use case. >>>> >>>> >>>> Maybe I am misunderstanding, and it does deal with your last point >>>> about inter-node communication, but Peer-sync uses queries when doing >>>> replication between replicas. If a node doesn't have queries enabled, then >>>> it's possible to break peer sync. There are other options to make sure >>>> certain replicas aren't queried (shards.preference). >>>> For the separation of update/query traffic, I understand having compute >>>> nodes that deal with pre-replica commands, such as managing distributed >>>> queries or pre-processing documents in the URP chain. But for actual >>>> non-distrib queries and final update requests, the only way to actually >>>> separate this traffic is using PULL/TLOG replicas, because otherwise (with >>>> NRT) all update requests are still going to the query nodes, just the same >>>> as non-query nodes that are hosting those replicas. (and shard leadership >>>> could go to any "data" node, since I imagine we wouldn't filter out the >>>> "query" nodes...) The shards.preference option makes it easy to send >>>> queries to only PULL replicas in this scenario. >>>> That's why I saw this more as a "compute" role rather than "query". >>>> >>> >>> Yeah for internal stuff we still need the ability to query so we might >>> need to accommodate that that, but you may not have noticed the bit where I >>> mentioned Query nodes doing the parsing/analysis of the query and then >>> sending a fully parsed (possibly serialized lucene objects) query to the >>> data node. So data nodes would only speak a single lucene level query >>> language and not parse queries or analyze text. 
In theory, with that, one >>> could even have something like elastic reduce a request to lucene objects >>> and then get an answer from a solr data node (for simple cases without need >>> to find shards via zookeeper etc) certainly many details and corner cases >>> to think about there. >>> >>> >>>> >>>> Definitely not what I would like. If I'm going to try to segregate data >>>>> nodes out to certain nodes, I don't want them appearing elsewhere just >>>>> cause something went down or filled up. Nor would I want updates to >>>>> suddenly start bogging down my query nodes.... >>>>> >>>> >>>> I guess it depends on what you are optimizing for. Maybe there can be >>>> an option for this. like -DnonLenientRoles=data,update or something like >>>> that. >>>> >>> >>> A possibility >>> >>> >>>> >>>> On Wed, Oct 27, 2021 at 3:03 PM Gus Heck <[email protected]> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Oct 27, 2021 at 2:44 PM Houston Putman < >>>>> [email protected]> wrote: >>>>> >>>>>> As for the "query" role, let's name it something better like >>>>>> "compute", since data nodes are always going to be "querying". >>>>>> >>>>> >>>>> I don't think it's unreasonable to want to have nodes that don't >>>>> accept queries. This is just ishan's use case. >>>>> >>>>> >>>>>> if no live nodes have roles=overseer (or roles=all), then we should >>>>>> just select any node to be overseer. This should be the same for compute, >>>>>> data, etc. >>>>>> >>>>> >>>>> Definitely not what I would like. If I'm going to try to segregate >>>>> data nodes out to certain nodes, I don't want them appearing elsewhere >>>>> just >>>>> cause something went down or filled up. Nor would I want updates to >>>>> suddenly start bogging down my query nodes.... >>>>> >>>>> >>>>>> >>>>>> So, for the proposal, lets say "data" is a special role which is >>>>>>> assumed by default, and is enabled on all nodes unless there's a !data. >>>>>>> >>>>>> >>>>>> Instead of this, maybe we have role groups. 
Such as >>>>>> admin~=overseer,zk or worker~=compute,data,updateProcessing >>>>>> >>>>> >>>>> Roll groups sounds like a next level feature to be built on top once >>>>> we figure out how roles work independently. >>>>> >>>>> >>>>>> >>>>>> As for the suggested Roles, I'm not sure ADMIN or UI really fit, >>>>>> since there is another option to disable the UI for a solr node, and >>>>>> various ADMIN commands have to be accepted across other node roles. (Data >>>>>> nodes require the Collections API, same with the overseer.) >>>>>> >>>>> >>>>> I admit I'm angling towards a world in which enabling and disabling >>>>> broad features is done in one way in one place... As for admin there might >>>>> be a distinction between commands issued between nodes and from the >>>>> outside >>>>> world... I have this other idea about inter-node communication that's even >>>>> less baked that I wont distract with here ;) >>>>> >>>>> >>>>>> - Houston >>>>>> >>>>>> On Wed, Oct 27, 2021 at 1:34 PM Ishan Chattopadhyaya < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> bq. In other words, roles are all "positive", but their consequences >>>>>>> are only negative (rejecting when the matching positive role is not >>>>>>> present). >>>>>>> >>>>>>> Essentially, yes. A node that doesn't specify any role should be >>>>>>> able to do everything. >>>>>>> >>>>>>> Let me just take a brief detour and mention our thoughts on the >>>>>>> "query" role. While all data nodes can also be used for querying, our >>>>>>> idea >>>>>>> was to create a layer of nodes that have some special mechanism to be >>>>>>> able >>>>>>> to proxy/forward queries to data nodes (lets call it "pseudo cores" or >>>>>>> "synthetic cores" or "proxy cores". Our thought was that any node that >>>>>>> has >>>>>>> "query,!data" role would enable this special mode on startup (whereby >>>>>>> requests are served by these special pseudo cores). We'll discuss about >>>>>>> this in detail in that issue. 
>>>>>>> >>>>>>> Back to the main subject here. >>>>>>> >>>>>>> Lets take a practical scenario: >>>>>>> * Layer1: Organization has about 100 nodes, each node has many data >>>>>>> replicas >>>>>>> * Layer2: To manage such a large cluster reliably, they keep aside >>>>>>> 4-5 dedicated overseer nodes. >>>>>>> * Layer3: Since query aggregations/coordination can potentially be >>>>>>> expensive, they keep aside 5-10 query nodes. >>>>>>> >>>>>>> My preference would be as follows: >>>>>>> * I'd like to refer to Layer1 nodes as the "data nodes" and hence >>>>>>> get either no role defined for them or -Dnode.roles=data. >>>>>>> * I'd like to refer to Layer2 nodes as "overseer nodes" (even though >>>>>>> I understand, only one of them can be an overseer at a time). I'd like >>>>>>> to >>>>>>> have -Dnode.roles=!data,overseer >>>>>>> * I'd like to refer to Layer3 nodes as "query nodes", with >>>>>>> -Dnode.roles=!data,query >>>>>>> >>>>>>> ^ This seems very practical from operational standpoint. >>>>>>> >>>>>>> So, for the proposal, lets say "data" is a special role which is >>>>>>> assumed by default, and is enabled on all nodes unless there's a !data. >>>>>>> It >>>>>>> is presumed that data nodes can also serve queries directly, so adding a >>>>>>> "query" to those nodes is meaningless (also because there's no practical >>>>>>> benefit to stopping a data node from receiving a query for "!query" >>>>>>> role to >>>>>>> be useful). >>>>>>> >>>>>>> "query" role on nodes that don't host data really refers to a >>>>>>> special capability for lightweight, stateless nodes. I don't want to >>>>>>> add a >>>>>>> "!query" on dedicated overseer nodes, and hence I don't want to assume >>>>>>> that >>>>>>> "query" is implicitly avaiable on any node even if the role isn't >>>>>>> specified. >>>>>>> >>>>>>> "overseer" role is complicated, since it is already defined and we >>>>>>> don't have the opportunity to define it the right way. 
I'd hate having >>>>>>> to >>>>>>> put a "!overseer" on every data node on startup in order to have a few >>>>>>> dedicated overseers. >>>>>>> >>>>>>> In short, in this SIP, I just wish to implement the concept of nodes >>>>>>> and its handling. How individual roles are leveraged can be up to every >>>>>>> new >>>>>>> role's implementation. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Oct 27, 2021 at 9:54 PM Gus Heck <[email protected]> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> In other words, roles are all "positive", but their consequences >>>>>>>>> are only negative (rejecting when the matching positive role is not >>>>>>>>> present). >>>>>>>>> >>>>>>>>> Yeah right. to do something the machine needs the role >>>>>>>> >>>>>>>> >>>>>>>>> We can also consider no role defined = all roles allowed. Will >>>>>>>>> make things simpler. >>>>>>>>> >>>>>>>> >>>>>>>> in terms of startup command yes. Internally we should have all >>>>>>>> explicitly assigned when no roles are specified at startup so that the >>>>>>>> code >>>>>>>> doesn't have a million if checks for the empty case >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Oct 27, 2021 at 6:14 PM Ilan Ginzburg <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> How do we expect the roles to be used? >>>>>>>>>> One way I see is a node refusing to do anything related to a role >>>>>>>>>> it doesn't have. >>>>>>>>>> For example if a node does not have role "data", any attempt to >>>>>>>>>> create a core on it would fail. >>>>>>>>>> A node not having the role "query", will refuse to have anything >>>>>>>>>> to do with handling a query etc. >>>>>>>>>> Then it would be up to other code to make sure only the >>>>>>>>>> appropriate nodes are requested to do any type of action. >>>>>>>>>> So for example any replica placement code plugin would have to >>>>>>>>>> restrict the set of candidate nodes for a new replica placement to >>>>>>>>>> those >>>>>>>>>> having "data". 
Otherwise the call would fail, and there should be >>>>>>>>>> nothing >>>>>>>>>> the replica placement code can do about it. >>>>>>>>>> >>>>>>>>>> Similarly, the "overseer" role would limit the nodes that >>>>>>>>>> participate in the Overseer election. The Overseer election code >>>>>>>>>> would have >>>>>>>>>> to remove (or not add) all non qualifying nodes from the election, >>>>>>>>>> and we >>>>>>>>>> should expect a node without role "overseer" to refuse to start the >>>>>>>>>> Overseer machinery if asked to... >>>>>>>>>> >>>>>>>>>> Trying to make the use case clear regarding how roles are used. >>>>>>>>>> Ilan >>>>>>>>>> >>>>>>>>>> On Wed, Oct 27, 2021 at 5:47 PM Gus Heck <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 27, 2021 at 9:55 AM Ishan Chattopadhyaya < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Gus, >>>>>>>>>>>> >>>>>>>>>>>> > I think that we should expand/edit your list of roles to be >>>>>>>>>>>> >>>>>>>>>>>> The list can be expanded as and when more isolation and >>>>>>>>>>>> features are needed. I only listed those roles that we already >>>>>>>>>>>> have a >>>>>>>>>>>> functionality for or is under development. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Well all of those roles (except zookeeper) are things nodes do >>>>>>>>>>> today. As it stands they are all doing all of them. What we add >>>>>>>>>>> support for >>>>>>>>>>> as we move forward is starting without a role, and add the >>>>>>>>>>> zookeeper role >>>>>>>>>>> when that feature is ready. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> > I would like to recommend that the roles be all positive >>>>>>>>>>>> ("Can do this") and nodes with no role at all are ineligible for >>>>>>>>>>>> all >>>>>>>>>>>> activities. >>>>>>>>>>>> >>>>>>>>>>>> It comes down to the defaults and backcompat. 
If we want all >>>>>>>>>>>> Solr nodes to be able to host data replicas by default (without >>>>>>>>>>>> user >>>>>>>>>>>> explicitly specifying role=data), then we need a way to unset this >>>>>>>>>>>> role. >>>>>>>>>>>> The most reasonable way sounded like a "!data". We can do away >>>>>>>>>>>> with !data >>>>>>>>>>>> if we mandate each and every data node have the role "data" >>>>>>>>>>>> explicitly >>>>>>>>>>>> defined for it, which breaks backcompat and also is cumbersome to >>>>>>>>>>>> use for >>>>>>>>>>>> those who don't want to use these special roles. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Not sure I understand, which of the roles I mentioned (other >>>>>>>>>>> than zookeeper, which I expect is intended as different from our >>>>>>>>>>> current >>>>>>>>>>> embedded zk) is NOT currently supported by a single cloud node >>>>>>>>>>> brought up >>>>>>>>>>> as shown in our tutorials/docs? I'm certainly not proposing that the >>>>>>>>>>> default change to nothing. The default is all roles, unless you >>>>>>>>>>> specify >>>>>>>>>>> roles at startup. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> > I also suggest that these roles each have a node in zookeeper >>>>>>>>>>>> listing the current member nodes (as child nodes) so that code >>>>>>>>>>>> that wants >>>>>>>>>>>> to find a node with an appropriate role does not need to scan the >>>>>>>>>>>> list of >>>>>>>>>>>> all nodes parsing something to discover which nodes apply and also >>>>>>>>>>>> does not >>>>>>>>>>>> have to parse json to do it. >>>>>>>>>>>> >>>>>>>>>>>> /roles.json exists today, it has role as key and list of nodes >>>>>>>>>>>> as value. In the next major version, we can change the format of >>>>>>>>>>>> that file >>>>>>>>>>>> and use key as node, value as list of roles. Or, maybe we can go >>>>>>>>>>>> for adding >>>>>>>>>>>> the roles to the data for each item in the list of live_nodes. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I'm not finding anything in our documentation about roles.json >>>>>>>>>>> so I think it's an internal implementation detail, which reduces >>>>>>>>>>> back >>>>>>>>>>> compat concerns. ADDROLE/REMOVEROLE don't accept json or anything >>>>>>>>>>> like that >>>>>>>>>>> and could be made to work with zk nodes too. >>>>>>>>>>> >>>>>>>>>>> The fact that some precursor work was done without a SIP (or >>>>>>>>>>> before SIPs existed) should not hamstring our design once a SIP that >>>>>>>>>>> clearly covers the same topic is under consideration. By their >>>>>>>>>>> nature SIP's >>>>>>>>>>> are non-trivial and often will include compatibility breaks. Good >>>>>>>>>>> news is I >>>>>>>>>>> don't think I see one here, just a code change to transition to a >>>>>>>>>>> different >>>>>>>>>>> zk backend. I think that it's probably a mistake to consider our >>>>>>>>>>> zookeeper >>>>>>>>>>> data a public API and we should be moving away from that or at the >>>>>>>>>>> very >>>>>>>>>>> least segregating clearly what in zk is long term reliable. Ideally >>>>>>>>>>> our >>>>>>>>>>> v1/v2 api's should be the public api through which information >>>>>>>>>>> about the >>>>>>>>>>> cluster is obtained. Programming directly against zk is kind of >>>>>>>>>>> like a >>>>>>>>>>> custom build of solr. Sometimes useful and appropriate, but >>>>>>>>>>> maintenance is >>>>>>>>>>> your concern. For code plugging into solr, it should in theory be >>>>>>>>>>> against >>>>>>>>>>> an internal information java api, and zookeeper should not be >>>>>>>>>>> touched >>>>>>>>>>> directly. (I know this is not in a good state or at least wasn't >>>>>>>>>>> last time >>>>>>>>>>> I looked closely, but it should be where we are heading). >>>>>>>>>>> >>>>>>>>>>> > any code seeking to transition a node >>>>>>>>>>>> >>>>>>>>>>>> We considered this situation and realized that it is very risky >>>>>>>>>>>> to have nodes change roles while they are up and running. 
Better >>>>>>>>>>>> to assign >>>>>>>>>>>> fixed roles upon startup. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I agree that concurrency is hard. I definitely think startup >>>>>>>>>>> time assignments should be involved here. I'm not thinking that >>>>>>>>>>> every >>>>>>>>>>> transition must be supported. As a starting point it would be fine >>>>>>>>>>> if none >>>>>>>>>>> were. Having something suddenly become zookeeper is probably tricky >>>>>>>>>>> to >>>>>>>>>>> support (see discussion in that thread regarding nodes not actually >>>>>>>>>>> participating until they have a partner to join with them to avoid >>>>>>>>>>> even >>>>>>>>>>> numbered clusters), but I think the design should not preclude the >>>>>>>>>>> possibility of nodes becoming eligible for some roles or >>>>>>>>>>> withdrawing from >>>>>>>>>>> some roles, and treatment of roles should be consistent. In some >>>>>>>>>>> cases >>>>>>>>>>> someone may decide it's worth the work of handling the concurrency >>>>>>>>>>> concerns, best if they don't have to break back compat or hack >>>>>>>>>>> their code >>>>>>>>>>> around the assumption it wouldn't happen to do it. >>>>>>>>>>> >>>>>>>>>>> Taking the zookeeper case as an example, it very much might be >>>>>>>>>>> desirable to have the possibility to heal the zk cluster by >>>>>>>>>>> promoting >>>>>>>>>>> another node (configured as eligible for zk) to active zk duty if >>>>>>>>>>> one of >>>>>>>>>>> the current zk nodes has been down long enough (say on prem >>>>>>>>>>> hardware, >>>>>>>>>>> motherboard pops a capacitor, server gone for a week while new >>>>>>>>>>> hardware is >>>>>>>>>>> purchased, built and configured). Especially if the down node >>>>>>>>>>> didn't hold >>>>>>>>>>> data or other nodes had sufficient replicas and the cluster is still >>>>>>>>>>> answering queries just fine. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> > I know of a case that would benefit from having separate >>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which >>>>>>>>>>>> would be >>>>>>>>>>>> deployed to a number of CPU heavy boxes (which might add more in >>>>>>>>>>>> prep for >>>>>>>>>>>> bulk indexing, and remove them when bulk was done), data could >>>>>>>>>>>> then be >>>>>>>>>>>> hosted on cheaper nodes.... >>>>>>>>>>>> >>>>>>>>>>>> This is the main motivation behind this work. SOLR-15715 needs >>>>>>>>>>>> this, and hence it would be good to get this in as soon as >>>>>>>>>>>> possible. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think we can incrementally work towards configurability for >>>>>>>>>>> all of these roles. The current default state is that a node has >>>>>>>>>>> all roles >>>>>>>>>>> and the incremental progress is to enable removing a role from a >>>>>>>>>>> node. This >>>>>>>>>>> I think is why it might be good to to >>>>>>>>>>> >>>>>>>>>>> A) Determine the set of roles our current solr nodes are >>>>>>>>>>> performing (that might be removed in some scenario) and document >>>>>>>>>>> this via >>>>>>>>>>> assigning these roles as default on as this SIP goes live. >>>>>>>>>>> B) Figure out what the process of adding something entirely new >>>>>>>>>>> that we haven't yet thought of with its own role would look like. >>>>>>>>>>> >>>>>>>>>>> I think it would be great if we not only satisfied the current >>>>>>>>>>> need but determined how we expect this to change over time. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Ishan >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Oct 27, 2021 at 6:32 PM Gus Heck <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The SIP looks like a good start, and I was already thinking of >>>>>>>>>>>>> something very similar to this as a follow on to my attempts to >>>>>>>>>>>>> split the >>>>>>>>>>>>> uber filter (SolrDispatchFilter) into servlets such that roles >>>>>>>>>>>>> determine >>>>>>>>>>>>> what servlets are deployed, but I would like to recommend that >>>>>>>>>>>>> the roles be >>>>>>>>>>>>> all positive ("Can do this") and nodes with no role at all are >>>>>>>>>>>>> ineligible >>>>>>>>>>>>> for all activities. (just like standard role permissioning >>>>>>>>>>>>> systems). This >>>>>>>>>>>>> will make it much more familiar and easy to think about. >>>>>>>>>>>>> Therefore there >>>>>>>>>>>>> would be no need for a role such as !data which I presume was >>>>>>>>>>>>> meant to mean >>>>>>>>>>>>> "no data on this node"... rather just don't give the "data" role >>>>>>>>>>>>> to the >>>>>>>>>>>>> node. 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additional node roles I think should exist:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that we should expand/edit your list of roles to be
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - QUERY - accepts and analyzes queries up to the point of
>>>>>>>>>>>>>    actually consulting the lucene index (useful if you have a very
>>>>>>>>>>>>>    heavy analysis phase)
>>>>>>>>>>>>>    - UPDATE - accepts update requests, and performs update
>>>>>>>>>>>>>    functionality prior to and including DistributedUpdateProcessorFactory
>>>>>>>>>>>>>    (useful if you have a very heavy analysis phase)
>>>>>>>>>>>>>    - ADMIN - accepts admin/management commands
>>>>>>>>>>>>>    - UI - hosts an admin ui
>>>>>>>>>>>>>    - ZOOKEEPER - hosts embedded zookeeper
>>>>>>>>>>>>>    - OVERSEER - performs overseer related functionality
>>>>>>>>>>>>>    (though IIRC there's a proposal to eliminate overseer that might
>>>>>>>>>>>>>    eliminate this)
>>>>>>>>>>>>>    - DATA - nodes where there is a lucene index and matching
>>>>>>>>>>>>>    against the analyzed results of a query may be conducted to
>>>>>>>>>>>>>    generate a response, also performs update steps that come after
>>>>>>>>>>>>>    DistributedUpdateProcessorFactory
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also suggest that these roles each have a node in zookeeper
>>>>>>>>>>>>> listing the current member nodes (as child nodes) so that code that wants
>>>>>>>>>>>>> to find a node with an appropriate role does not need to scan the list of
>>>>>>>>>>>>> all nodes parsing something to discover which nodes apply and also does not
>>>>>>>>>>>>> have to parse json to do it. I think this will be particularly key for
>>>>>>>>>>>>> zookeeper nodes which might be 3 out of 100 or more nodes. Similar to how
>>>>>>>>>>>>> we track live nodes.
I think we should have a nodes.json too that >>>>>>>>>>>>> tracks >>>>>>>>>>>>> what roles a node is ALLOWED to take (as opposed to which roles it >>>>>>>>>>>>> currently servicing) >>>>>>>>>>>>> >>>>>>>>>>>>> So running code consults the zookeeper role list of nodes, and >>>>>>>>>>>>> any code seeking to transition a node (an admin operation with >>>>>>>>>>>>> much lower >>>>>>>>>>>>> performance requirements) consults the json data in the >>>>>>>>>>>>> nodes.json node, >>>>>>>>>>>>> parses it, finds the node in question and checks what it's >>>>>>>>>>>>> eligible for >>>>>>>>>>>>> (this will correspond to which servlets/apps have been loaded). >>>>>>>>>>>>> >>>>>>>>>>>>> I know of a case that would benefit from having separate >>>>>>>>>>>>> Query/Update nodes that handle a heavy analysis process which >>>>>>>>>>>>> would be >>>>>>>>>>>>> deployed to a number of CPU heavy boxes (which might add more in >>>>>>>>>>>>> prep for >>>>>>>>>>>>> bulk indexing, and remove them when bulk was done), data could >>>>>>>>>>>>> then be >>>>>>>>>>>>> hosted on cheaper nodes.... >>>>>>>>>>>>> >>>>>>>>>>>>> Also maybe think about how this relates to NRT/TLOG/PULL which >>>>>>>>>>>>> are also maybe role like >>>>>>>>>>>>> >>>>>>>>>>>>> WDYT? >>>>>>>>>>>>> >>>>>>>>>>>>> -Gus >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 27, 2021 at 3:17 AM Ishan Chattopadhyaya < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here's an SIP for introducing the concept of node roles: >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15694 >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/SOLR/SIP-15+Node+roles >>>>>>>>>>>>>> >>>>>>>>>>>>>> We also wish to add first class support for Query nodes that >>>>>>>>>>>>>> are used to process user queries by forwarding to data nodes, >>>>>>>>>>>>>> merging/aggregating them and presenting to users. 
This concept >>>>>>>>>>>>>> exists as >>>>>>>>>>>>>> first class citizens in most other search engines. This is a >>>>>>>>>>>>>> chance for >>>>>>>>>>>>>> Solr to catch up. >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SOLR-15715 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Ishan / Noble / Hitesh >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> http://www.needhamsoftware.com (work) >>>>>>>>>>>>> http://www.the111shift.com (play) >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> http://www.needhamsoftware.com (work) >>>>>>>>>>> http://www.the111shift.com (play) >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> http://www.needhamsoftware.com (work) >>>>>>>> http://www.the111shift.com (play) >>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> http://www.needhamsoftware.com (work) >>>>> http://www.the111shift.com (play) >>>>> >>>> >>> >>> -- >>> http://www.needhamsoftware.com (work) >>> http://www.the111shift.com (play) >>> >>
