It's supposed to be controlling how big the region is?

On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) wrote:

I think one of the goals of limiting the store file size relates to compaction. As long as we only do compactions per family, what is the actual problem if the whole region is too big?

On Wed, Jun 24, 2020 at 10:56 PM, Wellington Chevreuil wrote:

The expected behaviour for the property is well documented, so renaming and deprecation would be a separate task. HBASE-24530 should be concerned with making IncreasingToUpperBoundRegionSplitPolicy respect what the hbase.hregion.max.filesize and MAX_FILESIZE table-level descriptor documentation mandates, as well as making it consistent with the behaviour of the other split policies in relation to these properties.

On Wed, Jun 24, 2020 at 08:00, Anoop John wrote:

If we are also going to change (correct) hbase.hregion.max.filesize to hbase.hregion.max.size (via a proper deprecation cycle) along with this change, I am good.

Anoop

On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey wrote:

Let's fix via approach #3. Get it done for the next minor versions, and then if folks aren't sure about the principle of least surprise, we can talk about whether it goes into maintenance releases.

On Tue, Jun 23, 2020, 13:07 Andrew Purtell wrote:

> Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating those configs.

Thank you for pointing this out. I feel even more strongly now that this is a bug.
I vote for #3.

On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil wrote:

> The config name was/is hbase.hregion.max.*filesize* and never *hbase.hregion.max.size*.

The description of hbase.hregion.max.filesize clearly states that it is the sum of all HFiles in the region that should not exceed this property's value. Also, we don't always use *hbase.hregion.max.filesize* to determine the limit; sometimes it is the MAX_FILESIZE table-level descriptor, whose description in the TableDescriptorBuilder javadoc reads as below:

    /**
     * Returns the maximum size upto which a region can grow to after which a
     * region split is triggered. The region size is represented by the size of
     * the biggest store file in that region.
     *
     * @return max hregion size for table, -1 if not set.
     */

Current IncreasingToUpperBoundRegionSplitPolicy implementation is violating those configs.

Do we have a consensus on applying #3 for all active branches? If so, I would instruct HBASE-24530 to proceed accordingly.

On Sun, Jun 21, 2020 at 19:09, Andrew Purtell wrote:

‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation, and I don’t see one as clearer than the other, other than implying that file-level measures are the determining factor. It doesn’t convey any semantics beyond that, i.e. whether one file trips the limit or the combined size of all files trips the limit. We can fix that with clarifying documentation.
While doing so, we also have an opportunity to fix something if our consensus is that the current policy does not match what users usually expect.

So how suboptimal is it? Does a compatibility concern make sense if we think this is just broken? Perhaps we can address all concerns by making the change in the next minor releases and then doing those minor releases soon.

On Jun 20, 2020, at 11:06 PM, Anoop John wrote:

I have a concern if we do #3 for all minor versions. That would be a major change in split behaviour and could have a large effect on tables with many CFs. If someone adjusted their pre-splits to avoid further region splits, that calculation might go wrong once they migrate to a new minor version with this change, right?

The config name was/is hbase.hregion.max.*filesize* and never *hbase.hregion.max.size*. We have HFiles at CF level, and so a max filesize is applicable at CF level.
So even this config name will create confusion once we change the calculation to consider size at region level (the sum of sizes across CFs).

Anoop

On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani wrote:

Given that SteppingSplitPolicy is the default region split policy, removing IncreasingToUpperBoundRegionSplitPolicy is going to make things more complex for the master branch if we follow #2. Hence, I believe we had better go with #3 for all.

On 2020/06/19 17:52:27, Viraj Jasani wrote:

Can we do a mix of #2 and #3, i.e. remove IncreasingToUpperBoundRegionSplitPolicy from master and follow #3 for branch-2 and all active release branches? If that breaks any compatibility rules, then we can go with #3 for all.

On 2020/06/19 17:33:14, Andrew Purtell wrote:

I vote for #3, and it should be applied to all active code lines.

On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil wrote:

While going through the changes proposed on HBASE-24530, we observed that IncreasingToUpperBoundRegionSplitPolicy compares hbase.hregion.max.filesize against individual stores within a region when deciding whether to split a region or not.
For tables having multiple families, this can lead to regions much larger than what's defined by hbase.hregion.max.filesize.

The current proposal on HBASE-24530 is to add an extra policy that actually compares the overall region size (the combined size of all the region's stores) against hbase.hregion.max.filesize, but I wonder if it really makes sense to keep a policy with the current IncreasingToUpperBoundRegionSplitPolicy behaviour. I'd like to hear folks' opinions on whether we should take any of the actions below:

1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the new policy proposed on HBASE-24530;
2) Deprecate IncreasingToUpperBoundRegionSplitPolicy and remove it from the master branch;
3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the logic of the new policy proposed on HBASE-24530.

My view is that the current IncreasingToUpperBoundRegionSplitPolicy behaviour is a bug, and I vote for #3.

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
- A23, Crosstalk
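A minimal sketch of the distinction this thread turns on: comparing the size limit against each store individually (the IncreasingToUpperBoundRegionSplitPolicy behaviour Wellington describes) versus against the sum of all store sizes in the region (the behaviour proposed on HBASE-24530 and in option #3). This is not HBase code: the class and method names are hypothetical, and a single fixed limit stands in for hbase.hregion.max.filesize / MAX_FILESIZE. The real policy computes its threshold in its own way, which is not modelled here.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only, not HBase's split policy implementation.
 * Contrasts a per-store size check with a whole-region size check.
 */
public final class SplitCheckSketch {

    private SplitCheckSketch() {}

    /** Per-store check: split only if some single store is larger than the limit. */
    public static boolean splitPerStore(long[] storeSizes, long limit) {
        for (long size : storeSizes) {
            if (size > limit) {
                return true;
            }
        }
        return false;
    }

    /** Whole-region check: split if the combined size of all stores is larger than the limit. */
    public static boolean splitOnRegionTotal(long[] storeSizes, long limit) {
        long total = Arrays.stream(storeSizes).sum();
        return total > limit;
    }

    public static void main(String[] args) {
        final long gib = 1024L * 1024L * 1024L;
        // Assumed fixed limit standing in for hbase.hregion.max.filesize.
        long limit = 10L * gib;
        // One region with three column families (one store each) of 8 GiB.
        long[] storeSizes = {8L * gib, 8L * gib, 8L * gib};

        System.out.println("per-store check splits:    " + splitPerStore(storeSizes, limit));
        System.out.println("region-total check splits: " + splitOnRegionTotal(storeSizes, limit));
    }
}
```

With a 10 GiB limit and a region holding three 8 GiB column families, the per-store check does not split the region (no single store exceeds 10 GiB), while the region-total check does (24 GiB combined). This is the multi-family case in which, as Anoop points out, existing pre-split calculations could change.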
