I think one of the goals of limiting the store file size is for compaction. As long as we just do compactions per family, what is the actual problem if the whole region is too big?
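
For concreteness, the two checks debated in the thread below can be sketched as follows. This is a minimal illustration under stated assumptions, not HBase code: the class and method names are hypothetical, the 10 GB limit and 8 GB store sizes are made-up numbers, and the sketch leaves out everything else the real split policy does.

```java
// Illustrative sketch only: these names are hypothetical and are not HBase APIs.
final class SplitCheckSketch {

    // Per-store check: split only when a single store (column family) exceeds the limit.
    static boolean shouldSplitPerStore(long maxFileSize, long[] storeSizes) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    // Region-level check: split when the combined size of all stores exceeds the limit.
    static boolean shouldSplitOnRegionTotal(long maxFileSize, long[] storeSizes) {
        long total = 0;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gb;                    // assumed limit for the example
        long[] storeSizes = {8 * gb, 8 * gb, 8 * gb};  // three families, 24 GB in total

        System.out.println("per-store check splits:    "
                + shouldSplitPerStore(maxFileSize, storeSizes));
        System.out.println("region-total check splits: "
                + shouldSplitOnRegionTotal(maxFileSize, storeSizes));
    }
}
```

With three 8 GB families and a 10 GB limit, the per-store check does not fire even though the region holds 24 GB, while the region-total check does.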

Wellington Chevreuil <[email protected]> wrote on Wed, Jun 24, 2020 at 10:56 PM:

> The expected behaviour for the property is well documented, so renaming and
> deprecation would rather be a separate task. HBASE-24530 should concern
> itself with making IncreasingToUpperBoundRegionSplitPolicy respect what
> hbase.hregion.max.filesize and MAX_FILESIZE table level descriptor
> documentation mandate, as well as being consistent with other split
> policies' behaviour in relation to these properties.
>
> On Wed, Jun 24, 2020 at 08:00, Anoop John <[email protected]> wrote:
>
> > If we are going to change (correct) hbase.hregion.max.filesize to
> > hbase.hregion.max.size (via a proper deprecation cycle) along with this
> > change, I am good.
> >
> > Anoop
> >
> > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey <[email protected]> wrote:
> >
> > > Let's fix via approach #3. Get it done for next minor versions and then if
> > > folks aren't sure about principle of least surprise we can talk about
> > > whether it goes into maintenance releases.
> > >
> > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell <[email protected]> wrote:
> > >
> > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > violating those configs.
> > > >
> > > > Thank you for pointing this out. I feel even more strongly now this is a
> > > > bug.
> > > > I vote for #3.
> > > >
> > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> > > > [email protected]> wrote:
> > > >
> > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > *hbase.hregion.max.size*.
> > > > >
> > > > > Description for hbase.hregion.max.filesize is very clear stating that it's
> > > > > the sum of all hfiles in the region that should not exceed this property
> > > > > value.
> > > > > And we do not always use *hbase.hregion.max.filesize* to determine the
> > > > > limit, but a MAX_FILESIZE table level descriptor whose description reads as
> > > > > below, on TableDescriptorBuilder javadoc:
> > > > >
> > > > >   /**
> > > > >    * Returns the maximum size upto which a region can grow to after which a
> > > > >    * region split is triggered. The region size is represented by the size of
> > > > >    * the biggest store file in that region.
> > > > >    *
> > > > >    * @return max hregion size for table, -1 if not set.
> > > > >    */
> > > > >
> > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating
> > > > > those configs.
> > > > >
> > > > > Do we have a consensus on applying #3 for all active branches? If so, I
> > > > > would instruct HBASE-24530 to proceed as such.
> > > > >
> > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> > > > > > don’t see one as more clear than the other, other than to imply something
> > > > > > about file level measures being the determining factor. It doesn’t convey
> > > > > > more semantics beyond that, i.e. one file trips the limit or the combined
> > > > > > sizes of all files trips the limit. We can fix that with clarifying
> > > > > > documentation. While doing so we also have an opportunity to fix something
> > > > > > if our consensus is the current policy is not the usual user expectation.
> > > > > >
> > > > > > So how suboptimal is it? Does a compatibility concern make sense if we
> > > > > > think this is just broken? Perhaps we can address all concerns by making
> > > > > > the change in next minor releases and then do those minor releases
> > > > > > soon.
> > > > > >
> > > > > > > On Jun 20, 2020, at 11:06 PM, Anoop John <[email protected]> wrote:
> > > > > > >
> > > > > > > I have a concern if we do #3 for all minor versions. That will be a major
> > > > > > > split behaviour change and can affect so much for tables with many CFs. If
> > > > > > > one adjusted the pre splits so as to avoid further region splits, that calc
> > > > > > > might go wrong once they migrate to new minor versions with this change
> > > > > > > right?
> > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > > > > > > filesize is applicable at CF level. So even this config name will create
> > > > > > > confusion once we change the calc to consider size at region level (Sum of
> > > > > > > sizes at CFs)
> > > > > > >
> > > > > > > Anoop
> > > > > > >
> > > > > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <[email protected]> wrote:
> > > > > > >>
> > > > > > >> Given that SteppingSplitPolicy is the default region split policy, removal
> > > > > > >> of IncreasingToUpperBoundRegionSplitPolicy is going to make things more
> > > > > > >> complex for master branch if we follow #2.
> > > > > > >> Hence, I believe we should better go with #3 for all.
> > > > > > >>
> > > > > > >>> On 2020/06/19 17:52:27, Viraj Jasani <[email protected]> wrote:
> > > > > > >>> Can we do a mix of #2 and #3 i.e remove
> > > > > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> > > > > > >>> branch-2 and all active release branches? If it breaks any compatibility
> > > > > > >>> rules, then we can go with #3 for all.
> > > > > > >>>
> > > > > > >>> On 2020/06/19 17:33:14, Andrew Purtell <[email protected]> wrote:
> > > > > > >>>> I vote for #3, and it should be applied to all active code lines.
> > > > > > >>>>
> > > > > > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <
> > > > > > >>>> [email protected]> wrote:
> > > > > > >>>>
> > > > > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > > > > >>>>> observed IncreasingToUpperBoundRegionSplitPolicy
> > > > > > >>>>> compares hbase.hregion.max.filesize against individual stores within a
> > > > > > >>>>> region when deciding whether to split a region or not. For tables having
> > > > > > >>>>> multiple families, this can lead to regions much larger than what's
> > > > > > >>>>> defined by hbase.hregion.max.filesize.
> > > > > > >>>>>
> > > > > > >>>>> Current proposal on HBASE-24530 is to add an extra policy that actually
> > > > > > >>>>> compares the overall region size (combining all region stores sizes)
> > > > > > >>>>> against hbase.hregion.max.filesize, but I wonder if it really makes sense
> > > > > > >>>>> to keep a policy with current IncreasingToUpperBoundRegionSplitPolicy
> > > > > > >>>>> behaviour. Would like to hear folks opinions if we should take any of the
> > > > > > >>>>> below actions?
> > > > > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the
> > > > > > >>>>> new policy proposed on HBASE-24530;
> > > > > > >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
> > > > > > >>>>> from master branch;
> > > > > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the
> > > > > > >>>>> logic of the new policy proposed on HBASE-24530;
> > > > > > >>>>>
> > > > > > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > > > > > >>>>> behaviour is a bug, and I vote for #3.
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>> Best regards,
> > > > > > >>>> Andrew
> > > > > > >>>>
> > > > > > >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > > > >>>> decrepit hands
> > > > > > >>>>    - A23, Crosstalk
> > > > > > >>>>
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > decrepit hands
> > > >    - A23, Crosstalk
> > > >
