Re: Approach for a new Autoscaling framework

Jan Høydahl Fri, 24 Jul 2020 13:40:28 -0700

> Not clear to me what type of "alternative proposal" you're thinking of Jan


That would be the responsibility of Noble and others who have concerns to 
detail - and to try to convince other peers.
It’s hard for me as a spectator to know whether to agree with Noble without a 
clear picture of what the alternative API or approach would look like.
I’m often a fan of loosely typed APIs since they tend to cause less boilerplate 
code, but strong typing may indeed be a sound choice in this API.

Jan Høydahl

> On 24 Jul 2020, at 01:44, Ilan Ginzburg <[email protected]> wrote:
> 
> 
> In my opinion we have to (and therefore will) ship at least a basic prod 
> ready implementation on top of the API that does simple things (not sure 
> about rack, but for example balance cores and disk size without co locating 
> replicas of same shard on same node).
> Without such an implementation, I suspect adoption will be low. Moreover, 
> it's always a lot more friendly to start coding from a working example than 
> from scratch.
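A minimal sketch of the kind of basic placement described above, under stated assumptions: it balances on core count only (disk size is ignored) and never puts two replicas of the same shard on one node. The class name, method name and parameters are hypothetical and do not come from Solr or from the SOLR-14613 PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from Solr or the PR.
final class SimplePlacementSketch {

    /**
     * Picks nodes for new replicas of one shard.
     * coresPerNode: current core count per node name.
     * nodesHostingShard: nodes that already hold a replica of this shard.
     */
    static List<String> placeReplicas(Map<String, Integer> coresPerNode,
                                      Set<String> nodesHostingShard,
                                      int replicasToPlace) {
        Map<String, Integer> cores = new HashMap<>(coresPerNode); // working copy
        Set<String> excluded = new HashSet<>(nodesHostingShard);
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicasToPlace; i++) {
            String best = null;
            // Sorted iteration makes tie-breaking deterministic.
            for (String node : new TreeSet<>(cores.keySet())) {
                if (excluded.contains(node)) {
                    continue; // never co-locate replicas of the same shard
                }
                if (best == null || cores.get(node) < cores.get(best)) {
                    best = node;
                }
            }
            if (best == null) {
                throw new IllegalStateException("Not enough eligible nodes");
            }
            chosen.add(best);
            excluded.add(best);
            cores.merge(best, 1, Integer::sum); // count the newly placed core
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Integer> cores = new HashMap<>();
        cores.put("nodeA", 3);
        cores.put("nodeB", 1);
        cores.put("nodeC", 2);
        Set<String> hosting = new HashSet<>();
        hosting.add("nodeB"); // nodeB already holds a replica of the shard
        System.out.println(placeReplicas(cores, hosting, 2));
    }
}
```

A greedy least-loaded choice is only one possible strategy; a production-ready default would presumably also weigh disk size, as the message suggests.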
> 
> Not clear to me what type of "alternative proposal" you're thinking of Jan. 
> Alternative API proposal? Alternative approach to replace Autoscaling?
> 
> Ilan
> 
>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <[email protected]> wrote:
>> Important discussion indeed.
>> 
>> I don’t have time to dive deep into the PR or make up my mind whether there 
>> is a simpler and more future proof way of designing these APIs. But I 
>> understand that autoscaling is a complex beast and it is important we get it 
>> right.
>> 
>> One question regarding having to write code vs config. Is the plan to ship 
>> some very simple light weight default placement rules ootb that gives 80% of 
>> users what they need with simple config, or would every user need to write 
>> code to e.g. spread replicas across hosts/racks? I’d be interested in seeing 
>> an alternative proposal laid out, perhaps not in code but with a design that 
>> can be compared and discussed.
>> 
>> Jan Høydahl
>> 
>>>> On 23 Jul 2020, at 17:53, Houston Putman <[email protected]> wrote:
>>>> 
>>> 
>>> I think this is a valid thing to discuss on the dev list, since this isn't 
>>> just about code comments.
>>> It seems to me that Ilan wants to discuss the philosophy around how to 
>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>> This is broad and affects much more than just the Autoscaling framework. 
>>> 
>>> As a community & product, we have so far agreed that Solr should be lighter 
>>> weight and additional features should live in plugins that are managed 
>>> separately from Solr itself.
>>> At that point we need to think about the lifetime and support of these 
>>> plugins. People love to refactor stuff in the solr core, which before 
>>> plugins wasn't a large issue.
>>> However if we are now intending for many customers to rely on plugins, then 
>>> we need to come up with standards and guarantees so that these plugins 
>>> don't:
>>> - Stall people from upgrading Solr (minor or major versions)
>>> - Hinder the development of Solr Core
>>> - Cause us more headaches trying to keep multiple repos of plugins up to date 
>>>   with recent versions of Solr
>>> 
>>> I am not completely sure where I stand right now, but this is definitely 
>>> something that we should be thinking about when migrating all of this 
>>> functionality to plugins.
>>> 
>>> - Houston
>>> 
>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <[email protected]> 
>>>> wrote:
>>>> I think we should move the discussion back to the PR because it has more 
>>>> context and inline comments are possible. Having this discussion in 4 
>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>> 
>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <[email protected]> wrote:
>>>>> [I’m moving a discussion from the PR for SOLR-14613 to the dev list for a 
>>>>> wider audience. This is about replacing the now (in master) gone 
>>>>> Autoscaling framework with a way for clients to write their customized 
>>>>> placement code]
>>>>> 
>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>> Anybody interested in the future of Autoscaling (not only those I 
>>>>> cc'ed), please do read it and provide feedback. Decisions with major 
>>>>> impact have to be made now.
>>>>> 
>>>>> Thanks Noble for your feedback.
>>>>> I believe it is important that we are aligned on what we build here, esp. 
>>>>> at the early defining stages (now).
>>>>> 
>>>>> Let me try to elaborate on your concerns and provide in general the 
>>>>> rationale behind the approach.
>>>>> 
>>>>> > Anyone who wishes to implement this should not require to learn a lot 
>>>>> > before even getting started
>>>>> For somebody who knows Solr (what is a Node, Collection, Shard, Replica) 
>>>>> and basic notions related to Autoscaling (getting variables representing 
>>>>> current state to make decisions), there’s not much to learn. The 
>>>>> framework uses the same concepts, often with the same names.
>>>>> 
>>>>> > I don't believe we should have a set of interfaces that duplicate 
>>>>> > existing classes just for this functionality.
>>>>> Where appropriate we can have existing classes be the implementations for 
>>>>> these interfaces and be passed to the plugins, that would be perfectly 
>>>>> ok. The proposal doesn’t include implementations at this stage, therefore 
>>>>> there’s no duplication, or not yet... (we must get the interfaces right 
>>>>> and agreed upon before implementation). If some interface methods in the 
>>>>> proposal have a different name from equivalent methods in internal 
>>>>> classes we plan to use, of course let's rename one or the other.
>>>>> 
>>>>> Existing internal abstractions are most of the time concrete classes and 
>>>>> not interfaces (Replica, Slice, DocCollection, ClusterState). Making 
>>>>> these visible to contrib code living elsewhere makes future 
>>>>> refactoring hard, and contrib code will most likely end up reaching into 
>>>>> methods it shouldn’t be using. If we define a clean set of interfaces for 
>>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to 
>>>>> other internal Solr classes, but would do everything possible to keep 
>>>>> the API backward compatible so existing plugins can be recompiled without 
>>>>> change.
>>>>> 
>>>>> > 24 interfaces to do this is definitely over engineering
>>>>> I don’t consider the number of classes or interfaces a metric of 
>>>>> complexity or of engineering quality. There are sample plugin 
>>>>> implementations to serve as a base for plugin writers (and for us 
>>>>> defining this framework) and I believe the process is relatively simple. 
>>>>> Trying to do the same things with existing Solr classes might prove a lot 
>>>>> harder (but might be worth the effort for comparison purposes, to make 
>>>>> sure we agree on the approach). For example, getting sister replicas of a 
>>>>> given replica in the proposed API is: replica.getShard().getReplicas(). 
>>>>> Doing so with the internal classes likely involves getting the 
>>>>> DocCollection and Slice names from the Replica, then getting the 
>>>>> DocCollection from the cluster state, getting the Slice from it by name 
>>>>> and finally calling getReplicas() on the Slice. I consider the role of 
>>>>> this new framework to be making life as easy as possible for writing 
>>>>> placement code and the like, making it easy for us to maintain, making it 
>>>>> easy to write a simulation engine (should be at least an order of 
>>>>> magnitude simpler than the previous one), etc.
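The navigation described above can be sketched roughly as follows. Only the call chain replica.getShard().getReplicas() is taken from the message; the interface shapes, the in-memory implementations and the sisters() helper are assumptions made for illustration and are not the interfaces from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual interfaces proposed in SOLR-14613.
final class SisterReplicasSketch {

    /** Plugin-facing view of a shard (assumed shape). */
    interface Shard {
        String getName();
        List<Replica> getReplicas();
    }

    /** Plugin-facing view of a replica (assumed shape). */
    interface Replica {
        String getName();
        Shard getShard();
    }

    /** Tiny in-memory shard so the example can run on its own. */
    static final class SimpleShard implements Shard {
        private final String name;
        private final List<Replica> replicas = new ArrayList<>();

        SimpleShard(String name) { this.name = name; }

        @Override public String getName() { return name; }

        @Override public List<Replica> getReplicas() {
            return new ArrayList<>(replicas); // defensive copy
        }

        Replica addReplica(String replicaName) {
            Replica replica = new SimpleReplica(replicaName, this);
            replicas.add(replica);
            return replica;
        }
    }

    static final class SimpleReplica implements Replica {
        private final String name;
        private final Shard shard;

        SimpleReplica(String name, Shard shard) {
            this.name = name;
            this.shard = shard;
        }

        @Override public String getName() { return name; }
        @Override public Shard getShard() { return shard; }
    }

    /** All replicas of the same shard, excluding the given one (a choice made here). */
    static List<Replica> sisters(Replica replica) {
        List<Replica> result = new ArrayList<>();
        for (Replica candidate : replica.getShard().getReplicas()) {
            if (candidate != replica) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleShard shard = new SimpleShard("shard1");
        Replica first = shard.addReplica("replica1");
        shard.addReplica("replica2");
        shard.addReplica("replica3");
        for (Replica sister : sisters(first)) {
            System.out.println(sister.getName());
        }
    }
}
```

Because each replica holds a direct reference to its shard, no lookup by name through the cluster state is needed, which is the simplification the message argues for.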
>>>>> 
>>>>> An example regarding readability and number of interfaces: rather than 
>>>>> defining an enum with runtime annotation for building its instances 
>>>>> (Variable.Type) and then very generic access methods, the proposal 
>>>>> defines a specific interface for each “variable type” (called 
>>>>> properties). Rather than concatenating strings to specify the data to 
>>>>> return from a remote node (based on snitches, see doc), the proposal is 
>>>>> explicit and strongly typed (for example, when getting a specific system 
>>>>> property from a node). This definitely does increase the number of 
>>>>> interfaces, but reduces IMO the effort to code to these abstractions and 
>>>>> provides a lot more compile time and IDE assistance.
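The trade-off described above can be illustrated with a small sketch contrasting string-keyed access with one type per kind of property. Everything here is hypothetical: the PropertyKey interface, the two key classes and the raw map keys ("sysprop.zone", "freedisk") are assumptions for illustration and are not the interfaces from the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names and map keys are illustrative assumptions.
final class TypedPropertySketch {

    /** Loosely typed access: the caller concatenates a key and casts the result. */
    static Object untyped(Map<String, Object> nodeData, String prefix, String name) {
        return nodeData.get(prefix + name);
    }

    /** Strongly typed access: one small type per kind of property. */
    interface PropertyKey<T> {
        Optional<T> read(Map<String, Object> nodeData);
    }

    /** Reads a named system property as a String. */
    static final class SystemPropertyKey implements PropertyKey<String> {
        private final String propertyName;

        SystemPropertyKey(String propertyName) { this.propertyName = propertyName; }

        @Override public Optional<String> read(Map<String, Object> nodeData) {
            Object value = nodeData.get("sysprop." + propertyName);
            return value instanceof String ? Optional.of((String) value) : Optional.empty();
        }
    }

    /** Reads free disk space as a Long. */
    static final class FreeDiskKey implements PropertyKey<Long> {
        @Override public Optional<Long> read(Map<String, Object> nodeData) {
            Object value = nodeData.get("freedisk");
            return value instanceof Number
                ? Optional.of(((Number) value).longValue())
                : Optional.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> nodeData = new HashMap<>();
        nodeData.put("sysprop.zone", "eu-west");
        nodeData.put("freedisk", 120L);

        // Untyped: the key format and the cast are the caller's problem.
        String zoneUntyped = (String) untyped(nodeData, "sysprop.", "zone");

        // Typed: the compiler knows the result types, no casts at the call site.
        String zone = new SystemPropertyKey("zone").read(nodeData).orElse("unknown");
        long freeDisk = new FreeDiskKey().read(nodeData).orElse(0L);

        System.out.println(zoneUntyped + " " + zone + " " + freeDisk);
    }
}
```

The typed variant costs two extra classes, but the string concatenation and casting move out of plugin code and into the framework, which is where the compile-time and IDE assistance comes from.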
>>>>> 
>>>>> The goal is to hide all the boilerplate code and machinery (and, to a 
>>>>> point, complexity) in the implementations of these interfaces rather than 
>>>>> have each plugin writer deal with the same problems.
>>>>> 
>>>>> We’re moving from something that was complex and hard to read and debug 
>>>>> yet functionally extremely rich, to something simpler for us, more 
>>>>> demanding for users (write code rather than policy config if there's a 
>>>>> need for new behavior) but that should not be less "expressive" in any 
>>>>> significant way. One could even imagine reimplementing the former 
>>>>> Autoscaling config Domain Specific Language on top of these APIs (maybe as 
>>>>> a summer internship project :)
>>>>> 
>>>>> > This is a common mistake that we all do. When we design a feature we 
>>>>> > think that is the most important thing.
>>>>> If by "most important thing" you mean investing the best reasonable 
>>>>> effort to do things right then yes.
>>>>> If you mean trying to make a minor feature look more important and 
>>>>> inflated than it is, I disagree.
>>>>> As a personal note, replica placement is not the aspect of SolrCloud I'm 
>>>>> most interested in, but the first bottleneck we hit when pushing the 
>>>>> scale of SolrCloud. I approach this with a state of mind "let's do it 
>>>>> right and get it out of the way" to move to topics I really want to work 
>>>>> on (around distribution in SolrCloud and the role of Overseer). 
>>>>> Implementing Autoscaling in a way that simplifies future refactorings (or 
>>>>> that does not make them harder than they already are) is therefore very 
>>>>> high on my priority list, to support modest changes (Slice to Shard 
>>>>> renaming) and more ambitious ones (replacing Zookeeper, removing 
>>>>> Overseer, you name it).
>>>>> 
>>>>> Thanks for reading, again sorry for the long email, but I hope this helps 
>>>>> (at least helps the discussion),
>>>>> Ilan
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <[email protected]> wrote:
>>>>>> I don't believe we should have a set of interfaces that duplicate 
>>>>>> existing classes just for this functionality. This is a common mistake 
>>>>>> that we all do. When we design a feature we think that is the most 
>>>>>> important thing. We end up over designing and over engineering things. 
>>>>>> This feature will remain a tiny part of Solr. Anyone who wishes to 
>>>>>> implement this should not require to learn a lot before even getting 
>>>>>> started. Let's try to have a minimal set of interfaces so that people 
>>>>>> who try to implement them do not have a huge learning curve.
>>>>>> 
>>>>>> Let's try to understand the requirement
>>>>>> 
>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>> - The implementation wants to know what is the current state of the 
>>>>>>   cluster so that it can make those decisions
>>>>>> 
>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>> 
>>>>>> 
>>>>> 
>>>>>
