A further addendum to the points I replied to Radu about earlier:

>    1. These single node commands would be less disruptive than an overall
>    REBALANCESHARDS command.
>    They target a single node (or a list of nodes) instead of a whole
>    cluster, so we would likely see less movement using these
>    individual commands versus the full REBALANCE.
>
> I'm not opposed to a REBALANCE command, but I think the logic would be
> quite complicated.
>
> If REBALANCESHARDS sounds like a good idea, I'm thinking it could be per
>> collection or for the whole cluster, I'm not sure what's best.
>>
>
> This is a good question as well. I think for the operator the best would
> be a whole cluster command, though people managing
> it themselves would probably want a per-collection option as well.
>

When starting to implement UTILIZENODE, I found that the logic was
extremely similar to BALANCEREPLICAS. So we might as well go the extra inch
and implement the more powerful command instead of limiting ourselves
to this single use case.
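For context, REPLACENODE is already part of the Collections API today. A rough sketch of building a call to it (the host and node names below are placeholders, and the exact parameter set may vary by Solr version):

```python
from urllib.parse import urlencode

def replace_node_url(base_url, source_node, target_node=None):
    """Build a URL for Solr's existing REPLACENODE Collections API command.

    If target_node is omitted, Solr chooses destination nodes itself,
    which is the behavior the autoscaling flow would rely on.
    """
    params = {"action": "REPLACENODE", "sourceNode": source_node}
    if target_node:
        params["targetNode"] = target_node
    return f"{base_url}/admin/collections?{urlencode(params)}"

# Placeholder host and node name:
url = replace_node_url("http://localhost:8983/solr", "oldnode:8983_solr")
```

Since REPLACENODE can pick target nodes on its own, scale-down only needs the operator to call it for each vacating node.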

The child ticket tracking BalanceReplicas can be found here:
https://issues.apache.org/jira/browse/SOLR-16806

As for per-collection versus whole-cluster scope, the only filter for now
is a list of nodes. This node filter is optional and defaults to all live
data nodes. The command takes the list of replicas on those nodes and
balances them across those same nodes.

In the future this API could definitely take a collection, but I see that
as a further improvement rather than a necessity for the initial
implementation.

- Houston

On Thu, Apr 20, 2023 at 4:12 PM Houston Putman <hous...@apache.org> wrote:

> Do you think we'll need a way to mark the nodes as "vacating" or something
>> so that they don't receive new replicas (i.e. for newly concurrently
>> created collections) while the REPLACENODE is being processed?
>>
>
> Great question. This would be an awesome addition. Kind of similar to the
> "shutting down" node label proposal,
> to not send requests to nodes that are about to be shut down.
>
> I'm not sure it's required for the feature to go out, but it should 100%
> be included in the proposal.
>
> - Houston
>
> On Thu, Apr 20, 2023 at 3:59 PM Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
>> Thanks for working on this SIP Houston, I'm looking forward to using it!
>> Do you think we'll need a way to mark the nodes as "vacating" or something
>> so that they don't receive new replicas (i.e. for newly concurrently
>> created collections) while the REPLACENODE is being processed?
>>
>> On Thu, Apr 20, 2023 at 11:10 AM Houston Putman <hous...@apache.org>
>> wrote:
>>
>> > Hey, sorry for the delay y'all!
>> >
>> > *I'll first reply to Radu!*
>> >
>> > I think it tackles one of the three common use-cases that I've seen for
>> > > autoscaling
>> >
>> >
>> > I completely agree, this is just the beginning. I think the first use
>> case
>> > would be amazing, and I see
>> > it as an extension of this feature. I have some ideas on how to
>> implement
>> > it, but I think implementing
>> > the rebalancing first is a good goal. The timeseries stuff would be
>> great,
>> > and we could integrate with
>> > Solr's built in timeboxed aliasing. But I think that would probably be a
>> > third goal across the 3 different use cases.
>> > In my opinion at least.
>> >
>> > Thanks for bringing them up, I completely agree that it's important to
>> list
>> > the various options and state
>> > which ones we are choosing to move forward with.
>> >
>> > With regards to UTILIZENODE&REPLACENODE, I think they will work OK but I
>> > > wonder if a general REBALANCESHARDS command will work better? Or maybe
>> > it's
>> > > just because I'm thinking of Elasticsearch/OpenSearch. But it seems
>> like
>> > a
>> > > more "general" approach.
>> > >
>> >
>> > This is a good question. I'm not that familiar with
>> > ElasticSearch/OpenSearch, but for me there are two good reasons
>> > to go with UTILIZENODE and REPLACENODE.
>> >
>> >    1. REPLACENODE already exists in Solr, so the feature would be
>> >    compatible with older Solr versions (Just scale down, not scale up).
>> >    2. These single node commands would be less disruptive than an
>> overall
>> >    REBALANCESHARDS command.
>> >    They target a single node (or a list of nodes) instead of a whole
>> >    cluster, so we would likely see less movement using these
>> >    individual commands versus the full REBALANCE.
>> >
>> > I'm not opposed to a REBALANCE command, but I think the logic would be
>> > quite complicated.
>> >
>> > If REBALANCESHARDS sounds like a good idea, I'm thinking it could be per
>> > > collection or for the whole cluster, I'm not sure what's best.
>> > >
>> >
>> > This is a good question as well. I think for the operator the best
>> would be
>> > a whole cluster command, though people managing
>> > it themselves would probably want a per-collection option as well.
>> >
>> > *Now for Jason's questions!*
>> >
>> > Does this SIP include one or more OOTB implementations for the
>> > > UtilizeSelectionRequest interface?  What heuristics might those work
>> > > off of?
>> > >
>> >
>> > I was imagining that each of the existing PlacementPlugins would also
>> > implement this UtilizeSelectionRequest interface.
>> > So "Affinity", "Random", "Simple" and "MinimizeCores" (The last two are
>> > practically the same, not sure why we have both...)
>> >
>> >  What does "managing it themselves" mean in the "Solr Operator
>> >
>> > Interfaces" section.  (Used in reference to those who set
>> > > autoscaleReplicas.hpa.create to "false")  Is that flag just meant to
>> > > control CRUD operations on the HPA itself, or does it also govern the
>> > > UTILIZENODE/REPLACENODE calls that the operator might make on the
>> > > user's behalf?
>> >
>> >
>> > So I think it would just govern the HPA CRUD operations, if we do want
>> to
>> > have the flag at all.
>> > The UTILIZENODE/REPLACENODE calls would be governed by
>> > "autoscaleReplicas.utilizeNodesOnScaleUp" and
>> > "autoscaleReplicas.vacateNodesOnScaleDown".
>> >
>> > 3. Will the operator only drain nodes prior to pod-shutdown when the
>> > > HPA is in use?  Or might it do that even for users who aren't using an
>> > > HPA as a response to statefulset size changes, etc.  (And is that in
>> > > or out of scope for this SIP.)
>> > >
>> >
>> > So currently (regardless of HPAs) the operator drains nodes prior to
>> > pod-shutdown if ephemeral data is used.
>> > This is not done for nodes that use persistent data, because the data is
>> > persistent across pod restarts.
>> > As for statefulset size changes, that is why I made the
>> > "vacateNodesOnScaleDown" option separate from the HPA.
>> > It would drain nodes regardless of who/what made the solr cluster scale
>> > down (be it manually or via the HPA).
>> > The idea is that leaving replicas on "scaled down" nodes makes very
>> little
>> > sense, because they will be unavailable
>> > and unlike a pod restart, they are not expected to come back online
>> anytime
>> > soon.
>> >
>> > So to answer more directly, the "vacateNodesOnScaleDown" option, if
>> enabled
>> > (which I think it should be by default),
>> > would be used on any statefulset size downgrade.
>> >
>> >
>> > I'll try to update the SIP to clear up some of these questions!
>> >
>> > - Houston
>> >
>> > On Tue, Apr 11, 2023 at 7:42 AM Jason Gerlowski <gerlowsk...@gmail.com>
>> > wrote:
>> >
>> > > Hey Houston,
>> > >
>> > > Thanks for putting this together.  It's a really cool direction for
>> > > the operator.
>> > >
>> > > A few quick questions about the proposal:
>> > >
>> > > 1. Does this SIP include one or more OOTB implementations for the
>> > > UtilizeSelectionRequest interface?  What heuristics might those work
>> > > off of?
>> > >
>> > > 2. What does "managing it themselves" mean in the "Solr Operator
>> > > Interfaces" section.  (Used in reference to those who set
>> > > autoscaleReplicas.hpa.create to "false")  Is that flag just meant to
>> > > control CRUD operations on the HPA itself, or does it also govern the
>> > > UTILIZENODE/REPLACENODE calls that the operator might make on the
>> > > user's behalf?
>> > >
>> > > 3. Will the operator only drain nodes prior to pod-shutdown when the
>> > > HPA is in use?  Or might it do that even for users who aren't using an
>> > > HPA as a response to statefulset size changes, etc.  (And is that in
>> > > or out of scope for this SIP.)
>> > >
>> > > Hopefully those questions make sense and don't betray I'm out of my
>> depth
>> > > haha!
>> > >
>> > > Thanks in advance for clarifying!
>> > >
>> > > Best,
>> > >
>> > > Jason
>> > >
>> > > On Wed, Apr 5, 2023 at 9:39 AM Radu Gheorghe <
>> radu.gheor...@sematext.com
>> > >
>> > > wrote:
>> > > >
>> > > > Hi Houston,
>> > > >
>> > > > Thanks a lot for putting this together! I'd like to help with Solr
>> > > > Operator. Though I have limited availability in the following two
>> > months,
>> > > > maybe I can still be useful with a few things.
>> > > >
>> > > > Some comments regarding the SIP:
>> > > > - I think that in general it sounds like a good plan. I don't want
>> to
>> > get
>> > > > in the way instead of helping :)
>> > > > - I think it tackles one of the three common use-cases that I've
>> seen
>> > for
>> > > > autoscaling:
>> > > > 1) *AutoAddReplicas*: mostly for enterprise search, some people
>> want to
>> > > > expand on query throughput. Combining that with autoscaling sounds
>> very
>> > > > appealing.
>> > > > 2) *Rotate indices on autoscaling events*, which should work well
>> for
>> > > > time-series data. This is what we presented last year at BBuzz and
>> > > KubeCon
>> > > > for Elasticsearch/OpenSearch
>> > > > <https://sematext.com/blog/kubernetes-elasticsearch-autoscaling/>.
>> The
>> > > gzip
>> > > > -9 version of it is that you'll probably want to create a new index
>> > with
>> > > > the right number of shards after scaling out (or back in) to ensure
>> > that
>> > > > the write workload (which tends to be dominant) is evenly balanced.
>> You
>> > > may
>> > > > or may not want to rebalance previous shards, based on how often
>> you go
>> > > > back and forth.
>> > > > 3) *Rebalance existing shards as you add/remove nodes*. Which is
>> what
>> > > this
>> > > > SIP tackles, if I'm getting it right.
>> > > >
>> > > > If I understand correctly, these three don't exclude each other, so
>> I
>> > > > wouldn't bother changing this SIP to account for the other
>> use-cases,
>> > > but I
>> > > > think it's nice to have them in mind or discuss them in case anyone
>> has
>> > > any
>> > > > ideas.
>> > > >
>> > > > With regards to UTILIZENODE&REPLACENODE, I think they will work OK
>> but
>> > I
>> > > > wonder if a general REBALANCESHARDS command will work better? Or
>> maybe
>> > > it's
>> > > > just because I'm thinking of Elasticsearch/OpenSearch. But it seems
>> > like
>> > > a
>> > > > more "general" approach.
>> > > >
>> > > > If REBALANCESHARDS sounds like a good idea, I'm thinking it could be
>> > per
>> > > > collection or for the whole cluster, I'm not sure what's best. My
>> > initial
>> > > > thought is that per cluster is what we need, but on the other hand
>> per
>> > > > collection is easier to implement (just assign shards of that
>> > collection,
>> > > > and if the number of shards doesn't divide by the number of nodes,
>> just
>> > > > assign to the node with fewer replicas or maybe piggyback replica
>> > > placement
>> > > > plugins?) and it's easier to stop/recover when something goes wrong.
>> > > Plus,
>> > > > it's more opinionated in the sense that you'll want to have the
>> current
>> > > > (and future) number of nodes be a divisor of your number of shards.
>> And
>> > > > then maybe the Operator could have some config options on the steps
>> > that
>> > > > you'll want to take. For example, I know I have 12 shards in total
>> per
>> > > > collection, I want 2,3,4,6 and 12-node configurations.
>> > > >
>> > > > Please let me know if you have any thoughts/questions/reactions :)
>> > > >
>> > > > Best regards,
>> > > > Radu
>> > > > --
>> > > > Elasticsearch/OpenSearch & Solr Consulting, Production Support &
>> > Training
>> > > > Sematext Cloud - Full Stack Observability
>> > > > https://sematext.com/ <http://sematext.com/>
>> > > >
>> > > >
>> > > > On Thu, Mar 30, 2023 at 10:54 PM Houston Putman <hous...@apache.org
>> >
>> > > wrote:
>> > > >
>> > > > > Hello everyone,
>> > > > >
>> > > > > This is kind of a long-time coming, but I've finally created a SIP
>> > for
>> > > > > autoscaling Solr Nodes on Kubernetes using the Solr Operator.
>> > > > >
>> > > > >
>> > > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-17%3A+Node+Autoscaling+via+Kubernetes
>> > > > >
>> > > > > There are still some details that need to be ironed out, but
>> > hopefully
>> > > we
>> > > > > can finalize everything relatively soon and try to get this out in
>> > Solr
>> > > > > 9.3/9.4 and the Solr Operator v0.8.0.
>> > > > >
>> > > > > I've talked with quite a few people about this, so hopefully we
>> can
>> > > get a
>> > > > > good amount of turn-out to get this implemented! And if anyone is
>> > > > > interested in helping with the Solr Operator parts, I'd be very
>> happy
>> > > to
>> > > > > mentor. It's not going to be the most straightforward code, but
>> you
>> > > will
>> > > > > definitely be ramped up on contributing to the operator by the
>> end!
>> > > > >
>> > > > > Please let me know if I can answer any questions!
>> > > > >
>> > > > > - Houston
>> > > > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
>> > > For additional commands, e-mail: dev-h...@solr.apache.org
>> > >
>> > >
>> >
>>
>
