Let's fix via approach #3. Get it done for the next minor versions, and then
if folks aren't sure about the principle of least surprise we can talk about
whether it goes into maintenance releases.

On Tue, Jun 23, 2020, 13:07 Andrew Purtell <[email protected]> wrote:

> > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > violating those configs.
>
> Thank you for pointing this out. I feel even more strongly now this is a
> bug.
> I vote for #3.
>
> On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> [email protected]> wrote:
>
> > >
> > > The config name was/is hbase.hregion.max.*filesize* and never
> > > *hbase.hregion.max.size*.
> > >
> >
> > The description of hbase.hregion.max.filesize is very clear, stating that
> > it is the sum of all HFiles in the region that should not exceed this
> > property value. And we do not always use *hbase.hregion.max.filesize* to
> > determine the limit, but rather a MAX_FILESIZE table-level descriptor,
> > whose description reads as below in the TableDescriptorBuilder javadoc:
> >
> >   /**
> >    * Returns the maximum size upto which a region can grow to after which a
> >    * region split is triggered. The region size is represented by the size of
> >    * the biggest store file in that region.
> >    *
> >    * @return max hregion size for table, -1 if not set.
> >    */
> >
> > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > violating those configs.
> >
> > Do we have a consensus on applying #3 for all active branches? If so, I
> > would instruct HBASE-24530 to proceed as such.
> >
> >
> >
> > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <
> > [email protected]> wrote:
> >
> > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> > > don’t see one as more clear than the other, other than to imply something
> > > about file level measures being the determining factor. It doesn’t convey
> > > more semantics beyond that, ie one file trips the limit or the combined
> > > sizes of all files trips the limit. We can fix that with clarifying
> > > documentation. While doing so we also have an opportunity to fix something
> > > if our consensus is the current policy is not the usual user expectation.
> > >
> > > So how suboptimal is it? Does a compatibility concern make sense if we
> > > think this is just broken? Perhaps we can address all concerns by making
> > > the change in next minor releases and then do those minor releases soon.
> > >
> > >
> > > > On Jun 20, 2020, at 11:06 PM, Anoop John <[email protected]> wrote:
> > > >
> > > > I have a concern if we do #3 for all minor versions.  That will be a major
> > > > split behaviour change and can affect so much for tables with many CFs. If
> > > > one adjusted the pre splits so as to avoid further region splits, that calc
> > > > might go wrong once they migrate to new minor versions with this change
> > > > right?
> > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > *hbase.hregion.max.size*.  We will have HFiles at CF level and so a max
> > > > filesize is applicable at CF level.   So even this config name will create
> > > > confusion once we change the calc to consider size at region level (Sum of
> > > > sizes at CFs)
> > > >
> > > > Anoop
> > > >
> > > >
> > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <[email protected]> wrote:
> > > >>
> > > >> Given that SteppingSplitPolicy is the default region split policy, removal
> > > >> of IncreasingToUpperBoundRegionSplitPolicy is going to make things more
> > > >> complex for master branch if we follow #2.
> > > >> Hence, I believe we should better go with #3 for all.
> > > >>
> > > >>
> > > >>> On 2020/06/19 17:52:27, Viraj Jasani <[email protected]> wrote:
> > > >>> Can we do a mix of #2 and #3, i.e. remove
> > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> > > >>> branch-2 and all active release branches? If it breaks any compatibility
> > > >>> rules, then we can go with #3 for all.
> > > >>>
> > > >>>
> > > >>> On 2020/06/19 17:33:14, Andrew Purtell <[email protected]> wrote:
> > > >>>> I vote for #3, and it should be applied to all active code lines.
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > > >>>> [email protected]> wrote:
> > > >>>>
> > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > >>>>> observed that IncreasingToUpperBoundRegionSplitPolicy
> > > >>>>> compares hbase.hregion.max.filesize against individual stores within a
> > > >>>>> region when deciding whether to split a region or not. For tables having
> > > >>>>> multiple families, this can lead to regions much larger than what's
> > > >>>>> defined by hbase.hregion.max.filesize.
> > > >>>>>
> > > >>>>> The current proposal on HBASE-24530 is to add an extra policy that actually
> > > >>>>> compares the overall region size (combining the sizes of all region stores)
> > > >>>>> against hbase.hregion.max.filesize, but I wonder if it really makes sense
> > > >>>>> to keep a policy with the current IncreasingToUpperBoundRegionSplitPolicy
> > > >>>>> behaviour. I would like to hear folks' opinions on whether we should take
> > > >>>>> any of the actions below:
> > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the
> > > >>>>> new policy proposed on HBASE-24530;
> > > >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
> > > >>>>> from master branch;
> > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the
> > > >>>>> logic of the new policy proposed on HBASE-24530.
> > > >>>>>
> > > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > > >>>>> behaviour is a bug, and I vote for #3.
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Best regards,
> > > >>>> Andrew
> > > >>>>
> > > >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> > > >>>> decrepit hands
> > > >>>>   - A23, Crosstalk
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>
>
>
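
For illustration, the difference between the two behaviours discussed in this
thread can be sketched in plain Java. This is a simplified model under stated
assumptions, not the HBase implementation: the class and method names
(SplitCheckSketch, shouldSplitPerStore, shouldSplitRegionTotal) are
hypothetical, store sizes are passed in as plain numbers, and the 10 GiB limit
is an assumed value standing in for hbase.hregion.max.filesize. Any other
logic the real policy applies is left out.

```java
import java.util.List;

public class SplitCheckSketch {
    // Assumed limit standing in for hbase.hregion.max.filesize (10 GiB here).
    static final long MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

    // Behaviour described as the bug: each store (column family) is compared
    // against the limit on its own.
    public static boolean shouldSplitPerStore(List<Long> storeSizes, long limit) {
        for (long size : storeSizes) {
            if (size > limit) {
                return true;
            }
        }
        return false;
    }

    // Behaviour proposed as option #3: the combined size of all stores in the
    // region is compared against the limit.
    public static boolean shouldSplitRegionTotal(List<Long> storeSizes, long limit) {
        long total = 0;
        for (long size : storeSizes) {
            total += size;
        }
        return total > limit;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Hypothetical region with three column families of 8 GiB each.
        List<Long> storeSizes = List.of(8 * gib, 8 * gib, 8 * gib);
        System.out.println("per-store check:    "
            + shouldSplitPerStore(storeSizes, MAX_FILESIZE));
        System.out.println("region-total check: "
            + shouldSplitRegionTotal(storeSizes, MAX_FILESIZE));
    }
}
```

With three families of 8 GiB each, the per-store check does not ask for a
split even though the region holds 24 GiB, while the region-total check does.
That is the gap described above for tables with multiple families, and also
why pre-split calculations made against the old behaviour could change.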
