Inline reply

On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:

> On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
>
> > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <mallik.v.ar...@gmail.com>
> > wrote:
> >
> >> We use FavoredStochasticBalancer, which by description says the same
> thing
> >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be
> >>
> >>
> >
> > Other concerns:
> >
> >  * Hard-coded triplet of nodes that will inevitably rot as machines come
> > and go (Are there tools for remediation?)
>

It doesn't really rot, if you consider that the balancer is responsible for
assigning regions:

1. Every time a region is assigned to a particular regionserver, the balancer
reassigns this triplet, so there is no scope for rot (the same logic applies
to WALs as well). On compaction, HDFS blocks are pulled back if any have
spilled over. (A rough sketch of this recomputation follows the list.)

2. We use hostnames only, so machines that come and go reappear under the
same hostnames rather than as new nodes.
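
For illustration, here is a minimal, self-contained Java sketch of the idea in
point 1: recomputing the three-node favored set from the RSGroup's current
live hosts whenever a region is (re)assigned. The class, method, selection
rule, and hostnames are hypothetical stand-ins for explanation only, not the
actual balancer code.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: shows why a favored-node triplet derived from the
// RSGroup's live membership at assignment time cannot go stale.
public class FavoredTripletSketch {

  // Picks a 3-host favored set for a region: the hosting regionserver's host
  // first, then the next two live hosts in the same RSGroup. (The real
  // balancer also weighs racks and load; this rule is simplified.)
  static List<String> recomputeFavoredNodes(String hostingHost, List<String> liveGroupHosts) {
    List<String> favored = new ArrayList<>();
    favored.add(hostingHost);
    for (String host : liveGroupHosts) {
      if (favored.size() == 3) {
        break;
      }
      if (!host.equals(hostingHost)) {
        favored.add(host);
      }
    }
    // In the deployment described above, this triplet would be persisted to
    // hbase:meta and handed to HDFS when store files are created.
    return favored;
  }

  public static void main(String[] args) {
    List<String> group = Arrays.asList("rs1.example.com", "rs2.example.com",
        "rs3.example.com", "rs4.example.com");
    System.out.println(recomputeFavoredNodes("rs2.example.com", group));
  }
}

Because the triplet is derived from the group's live membership at assignment
time, a decommissioned host simply stops appearing in new assignments, which
is why the pinning does not rot.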

A couple of outstanding problems, though:

1. We couldn't increase the replication factor beyond 3. That has been fine
so far for our use cases, but we have thought about fixing it.

2. The balancer doesn't understand the favored-nodes construct, so a
perfectly balanced FN distribution among the rsgroup datanodes isn't
possible; some variance, on the order of a 10-20% difference, is expected
(a rough sketch of such an imbalance score follows).
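
To make "some variance is expected" concrete, here is a hedged, standalone
sketch of an imbalance score in the spirit of the custom cost functions
mentioned later in this thread. It is not our production cost function; the
class and method names are made up for illustration.

// Illustrative only: scores favored-node imbalance within one RSGroup.
public class FnImbalanceCostSketch {

  // Returns a value in [0, 1]: 0 when every datanode in the group carries the
  // same number of favored-node references, 1 when everything sits on one host.
  static double fnImbalanceCost(int[] favoredNodeCountsPerHost) {
    int n = favoredNodeCountsPerHost.length;
    if (n <= 1) {
      return 0.0; // a single host (or none) cannot be imbalanced
    }
    long total = 0;
    int max = 0;
    for (int c : favoredNodeCountsPerHost) {
      total += c;
      max = Math.max(max, c);
    }
    if (total == 0) {
      return 0.0;
    }
    double mean = (double) total / n;
    // Gap between the most loaded host and the mean, scaled by the worst-case
    // gap (everything piled on one host), so the result falls in [0, 1].
    return (max - mean) / (total - mean);
  }

  public static void main(String[] args) {
    // Three hosts carrying 10, 12, and 14 favored-node references: mild skew.
    System.out.println(fnImbalanceCost(new int[] { 10, 12, 14 }));
  }
}

A score like this can never be driven to exactly zero once region counts and
group sizes don't divide evenly, which is where the 10-20% expectation comes
from.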


> >  * A workaround for a facility that belongs in the NN
>

Probably; you can argue it both ways. HBase is the owner of the data, so
HBase has the authority to dictate where a particular region replica sits.
The benefits follow from that: data locality stays close to 1, rack awareness
aligns naturally with this strategy, and so on.

Moreover, HDFS already exposes data pinning (favored nodes) for clients to
make use of, doesn't it?
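
For reference, this is roughly what that client-side pinning looks like:
DistributedFileSystem has a create() overload that takes a favoredNodes array
as a best-effort placement hint. The path, hostnames, and datanode port below
are placeholders, and the cast assumes fs.defaultFS points at HDFS.

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodesWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hint HDFS to place the block replicas on these datanodes (best effort).
    InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
      new InetSocketAddress("rs1.example.com", 9866), // datanode transfer port (placeholder)
      new InetSocketAddress("rs2.example.com", 9866),
      new InetSocketAddress("rs3.example.com", 9866)
    };

    try (FSDataOutputStream out = dfs.create(
        new Path("/hbase/data/ns/table/region/cf/hfile-placeholder"),
        FsPermission.getFileDefault(),
        true,                                      // overwrite
        conf.getInt("io.file.buffer.size", 4096),  // buffer size
        (short) 3,                                 // replication
        dfs.getDefaultBlockSize(),
        null,                                      // no progressable
        favoredNodes)) {
      out.write("pinned block placement".getBytes(StandardCharsets.UTF_8));
    }
  }
}

This is the same mechanism the favored-node balancer relies on when the
regionserver writes store files and WALs.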


> >  * Opaque in operation
>

We haven't yet looked at wrapping these operations with metrics; doing so
would make them less opaque, on top of the reasons mentioned in the point
above.


> >  * My understanding was that the feature was never finished; in
> particular
> > the balancer wasn't properly wired up (Happy to be incorrect here).
> >
> >
> One more concern was that the feature was dead/unused. You seem to refute
> this notion of mine.
> S
>

We have been using this for more than a year on HBase 2.1 for highly critical
workloads at our company, and for several years before that on HBase 1.2,
with rsgroups backported from master at the time (2017-18 ish).

It has been very smooth operationally on HBase 2.1.


>
>
> >
> >
> >> Going a step back.
> >>
> >> Did we ever consider giving a thought towards truly multi-tenant hbase?
> >>
> >
> > Always.
> >
> >
> >> Where each rsgroup has a group of datanodes and namespace tables data
> >> created under that particular rsgroup would sit on those datanodes only?
> >> We
> >> have attempted to do that and have largely been very successful running
> >> clusters of hundreds of terabytes with hundreds of
> >> regionservers(datanodes)
> >> per cluster.
> >>
> >>
> > So isolation of load by node? (I believe this is where the rsgroup
> feature
> > came from originally; the desire for a deploy like you describe above.
> > IIUC, its what Thiru and crew run).
> >
> >
> >
> >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> >> contributed by Thiruvel Thirumoolan -->
> >> https://issues.apache.org/jira/browse/HBASE-15533
> >>
> >> On each balance operation, while the region is moved around (or while
> >> creating table), favored nodes are assigned based on the rsgroup that
> >> region is pinned to. And hence data is pinned to those datanodes only
> >> (Pinning favored nodes is best effort from the hdfs side, but there are
> >> only a few exception scenarios where data will be spilled over and they
> >> recover after a major compaction).
> >>
> >>
> > Sounds like you have studied this deploy in operation. Write it up? Blog
> > post on hbase.apache.org?
> >
>

Definitely, we will write this up.


> >
> >
> >> 2. We have introduced several balancer cost functions to restore things
> to
> >> normalcy (multi tenancy with fn pinning) such as when a node is dead, or
> >> when fn's are imbalanced within the same rsgroup, etc.
> >>
> >> 3. We had diverse workloads under the same cluster and WAL isolation
> >> became
> >> a requirement and we went ahead with similar philosophy mentioned in
> line
> >> 1. Where WAL's are created with FN pinning so that they are tied to
> >> datanodes belonging to the same rsgroup. Some discussion seems to have
> >> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
> >>
> >> There are several other enhancements we have worked on with respect to
> >> rsgroup aware export snapshot, rsaware regionmover, rsaware cluster
> >> replication, etc.
> >>
> >> For above use cases, we would be needing fn information on hbase:meta.
> >>
> >> If the use case seems to be a fit for how we would want hbase to be
> taken
> >> forward as one of the supported use cases, willing to contribute our
> >> changes back to the community. (I was anyway planning to initiate this
> >> discussion)
> >>
> >
> > Contribs always welcome.
>

Happy to see our thoughts are aligned. We will prepare a plan for these
contributions.


> >
> > Thanks Mallikarjun,
> > S
> >
> >
> >
> >>
> >> To strengthen the above use case. Here is what one of our multi tenant
> >> cluster looks like
> >>
> >> RSGroups(Tenants): 21 (With tenant isolation)
> >> Regionservers: 275
> >> Regions Hosted: 6k
> >> Tables Hosted: 87
> >> Capacity: 250 TB (100TB used)
> >>
> >> ---
> >> Mallikarjun
> >>
> >>
> >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> >> wrote:
> >>
> >> > As you all know, we always want to reduce the size of the hbase-server
> >> > module. This time we want to separate the balancer related code to
> >> another
> >> > sub module.
> >> >
> >> > The design doc:
> >> >
> >> >
> >>
> https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> >> >
> >> > You can see the bottom of the design doc, favor node balancer is a
> >> problem,
> >> > as it stores the favor node information in hbase:meta. Stack mentioned
> >> that
> >> > the feature is already dead, maybe we could just purge it from our
> code
> >> > base.
> >> >
> >> > So here we want to know if there are still some users in the community
> >> who
> >> > still use favor node balancer. Please share your experience and
> whether
> >> you
> >> > still want to use it.
> >> >
> >> > Thanks.
> >> >
> >>
> >
>
