Isn't the limit supposed to control how big the region is?

On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) <[email protected]> wrote:

> I think one of the goals of limiting the store file size is for compaction.
> As long as we just do compactions per family, what is the actual problem if
> the whole region is too big?
>
> Wellington Chevreuil <[email protected]> wrote on Wed, Jun 24, 2020
> at 10:56 PM:
>
> > The expected behaviour for the property is well documented, so renaming
> > and deprecation would rather be a separate task. HBASE-24530 should concern
> > itself with making IncreasingToUpperBoundRegionSplitPolicy respect what
> > the hbase.hregion.max.filesize and MAX_FILESIZE table level descriptor
> > documentation mandates, as well as being consistent with other split
> > policies' behaviour in relation to these properties.
> >
> > On Wed, Jun 24, 2020 at 08:00, Anoop John <[email protected]> wrote:
> >
> > > If we are also going to change (correct) hbase.hregion.max.filesize to
> > > hbase.hregion.max.size (via a proper deprecation cycle) along with this
> > > change, I am good.
> > >
> > > Anoop
> > >
> > > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey <[email protected]> wrote:
> > >
> > > > Let's fix via approach #3. Get it done for the next minor versions, and
> > > > then if folks aren't sure about the principle of least surprise we can
> > > > talk about whether it goes into maintenance releases.
> > > >
> > > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell <[email protected]> wrote:
> > > >
> > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > violating those configs.
> > > > >
> > > > > Thank you for pointing this out. I feel even more strongly now this is a
> > > > > bug.
> > > > > I vote for #3.
> > > > >
> > > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > >
> > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > *hbase.hregion.max.size*.
> > > > > > >
> > > > > >
> > > > > > The description for hbase.hregion.max.filesize is very clear, stating that
> > > > > > it's the sum of all hfiles in the region that should not exceed this
> > > > > > property value. And we do not always use *hbase.hregion.max.filesize* to
> > > > > > determine the limit, but a MAX_FILESIZE table level descriptor whose
> > > > > > description reads as below, in the TableDescriptorBuilder javadoc:
> > > > > >
> > > > > >   /**
> > > > > >    * Returns the maximum size upto which a region can grow to after which a
> > > > > >    * region split is triggered. The region size is represented by the size of
> > > > > >    * the biggest store file in that region.
> > > > > >    *
> > > > > >    * @return max hregion size for table, -1 if not set.
> > > > > >    */
> > > > > >
> > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > violating those configs.
> > > > > >
> > > > > > Do we have a consensus on applying #3 for all active branches? If so, I
> > > > > > would instruct HBASE-24530 to proceed as such.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <[email protected]> wrote:
> > > > > >
> > > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation
> > > > > > > and I don’t see one as more clear than the other, other than to imply
> > > > > > > something about file level measures being the determining factor. It
> > > > > > > doesn’t convey more semantics beyond that, i.e. whether one file trips
> > > > > > > the limit or the combined sizes of all files trip the limit. We can fix
> > > > > > > that with clarifying documentation. While doing so we also have an
> > > > > > > opportunity to fix something if our consensus is the current policy is
> > > > > > > not the usual user expectation.
> > > > > > >
> > > > > > > So how suboptimal is it? Does a compatibility concern make sense if we
> > > > > > > think this is just broken? Perhaps we can address all concerns by making
> > > > > > > the change in next minor releases and then do those minor releases soon.
> > > > > > >
> > > > > > >
> > > > > > > > On Jun 20, 2020, at 11:06 PM, Anoop John <[email protected]> wrote:
> > > > > > > >
> > > > > > > > I have a concern if we do #3 for all minor versions. That will be a
> > > > > > > > major split behaviour change and can affect so much for tables with
> > > > > > > > many CFs. If one adjusted the pre-splits so as to avoid further region
> > > > > > > > splits, that calculation might go wrong once they migrate to new minor
> > > > > > > > versions with this change, right?
> > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > > > > > > > filesize is applicable at CF level. So even this config name will
> > > > > > > > create confusion once we change the calculation to consider size at
> > > > > > > > region level (sum of sizes at CFs).
> > > > > > > >
> > > > > > > > Anoop
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> Given that SteppingSplitPolicy is the default region split policy,
> > > > > > > >> removal of IncreasingToUpperBoundRegionSplitPolicy is going to make
> > > > > > > >> things more complex for master branch if we follow #2.
> > > > > > > >> Hence, I believe we should better go with #3 for all.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On 2020/06/19 17:52:27, Viraj Jasani <[email protected]> wrote:
> > > > > > > >>> Can we do a mix of #2 and #3, i.e. remove
> > > > > > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> > > > > > > >>> branch-2 and all active release branches? If it breaks any compatibility
> > > > > > > >>> rules, then we can go with #3 for all.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> On 2020/06/19 17:33:14, Andrew Purtell <[email protected]> wrote:
> > > > > > > >>>> I vote for #3, and it should be applied to all active code lines.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > > > > > > >>>> [email protected]> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > > > > > >>>>> observed that IncreasingToUpperBoundRegionSplitPolicy
> > > > > > > >>>>> compares hbase.hregion.max.filesize against individual stores within a
> > > > > > > >>>>> region when deciding whether to split a region or not. For tables having
> > > > > > > >>>>> multiple families, this can lead to regions much larger than what's
> > > > > > > >>>>> defined by hbase.hregion.max.filesize.
> > > > > > > >>>>>
> > > > > > > >>>>> The current proposal on HBASE-24530 is to add an extra policy that
> > > > > > > >>>>> actually compares the overall region size (combining all region store
> > > > > > > >>>>> sizes) against hbase.hregion.max.filesize, but I wonder if it really
> > > > > > > >>>>> makes sense to keep a policy with the current
> > > > > > > >>>>> IncreasingToUpperBoundRegionSplitPolicy behaviour. I would like to hear
> > > > > > > >>>>> folks' opinions on whether we should take any of the actions below:
> > > > > > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add
> > > > > > > >>>>> the new policy proposed on HBASE-24530;
> > > > > > > >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
> > > > > > > >>>>> from master branch;
> > > > > > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement
> > > > > > > >>>>> the logic of the new policy proposed on HBASE-24530.
> > > > > > > >>>>>
> > > > > > > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > > > > > > >>>>> behaviour is a bug, and I vote for #3.
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>> Best regards,
> > > > > > > >>>> Andrew
> > > > > > > >>>>
> > > > > > > >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > > > > >>>> decrepit hands
> > > > > > > >>>>   - A23, Crosstalk
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew
> > > > >
> > > > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > > decrepit hands
> > > > >    - A23, Crosstalk
> > > > >
> > > >
> > >
> >
>
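
To make the two interpretations discussed in this thread concrete, here is a
minimal sketch in plain Java. It is not the actual HBase split policy code:
the class name SplitCheckSketch and both method names are hypothetical, store
sizes are passed in as a simple list, and the sketch ignores the
"increasing to upper bound" threshold logic entirely. It only contrasts
checking each store against the limit with checking the sum of all stores.

```java
import java.util.List;

// Hypothetical illustration only; not the real IncreasingToUpperBoundRegionSplitPolicy.
final class SplitCheckSketch {

    // Per-store interpretation: split only when a single store
    // (column family) is larger than the configured limit.
    static boolean shouldSplitPerStore(List<Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    // Region-level interpretation (option #3 in the thread): split when the
    // combined size of all stores in the region is larger than the limit.
    static boolean shouldSplitPerRegion(List<Long> storeSizes, long maxFileSize) {
        long total = 0L;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long limit = 10L * gb; // assumed limit, chosen for the example
        // Three families of 8 GB each: 24 GB in the region, no single store over 10 GB.
        List<Long> stores = List.of(8L * gb, 8L * gb, 8L * gb);
        System.out.println("per-store: " + shouldSplitPerStore(stores, limit));   // prints "per-store: false"
        System.out.println("per-region: " + shouldSplitPerRegion(stores, limit)); // prints "per-region: true"
    }
}
```

With three 8 GB families and a 10 GB limit, the per-store check never fires
even though the region holds 24 GB, which is the case described for tables
with multiple families.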


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
