> > The config name was/is hbase.hregion.max.*filesize* and never
> > *hbase.hregion.max.size*.

The description for hbase.hregion.max.filesize is very clear: it states that
the sum of all hfiles in the region should not exceed this property value.
Also, we do not always use *hbase.hregion.max.filesize* to determine the
limit. A table-level MAX_FILESIZE descriptor can apply instead, and its
description in the TableDescriptorBuilder javadoc reads as follows:

  /**
   * Returns the maximum size upto which a region can grow to after which a
   * region split is triggered. The region size is represented by the size of
   * the biggest store file in that region.
   *
   * @return max hregion size for table, -1 if not set.
   */

The current IncreasingToUpperBoundRegionSplitPolicy implementation violates
those configs. Do we have consensus on applying #3 to all active branches?
If so, I would ask for HBASE-24530 to proceed that way.

On Sun, Jun 21, 2020 at 19:09, Andrew Purtell <[email protected]> wrote:

> ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation and I
> don’t see one as more clear than the other, other than to imply something
> about file level measures being the determining factor. It doesn’t convey
> more semantics beyond that, i.e. one file trips the limit or the combined
> sizes of all files trips the limit. We can fix that with clarifying
> documentation. While doing so we also have an opportunity to fix something
> if our consensus is the current policy is not the usual user expectation.
>
> So how suboptimal is it? Does a compatibility concern make sense if we
> think this is just broken? Perhaps we can address all concerns by making
> the change in next minor releases and then do those minor releases soon.
>
> > On Jun 20, 2020, at 11:06 PM, Anoop John <[email protected]> wrote:
> >
> > I have a concern if we do #3 for all minor versions. That will be a major
> > split behaviour change and can affect so much for tables with many CFs.
> > If one adjusted the pre splits so as to avoid further region splits, that
> > calc might go wrong once they migrate to new minor versions with this
> > change right?
> > The config name was/is hbase.hregion.max.*filesize* and never
> > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > filesize is applicable at CF level. So even this config name will create
> > confusion once we change the calc to consider size at region level (Sum of
> > sizes at CFs)
> >
> > Anoop
> >
> >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <[email protected]> wrote:
> >>
> >> Given that SteppingSplitPolicy is the default region split policy, removal
> >> of IncreasingToUpperBoundRegionSplitPolicy is going to make things more
> >> complex for master branch if we follow #2.
> >> Hence, I believe we should better go with #3 for all.
> >>
> >>> On 2020/06/19 17:52:27, Viraj Jasani <[email protected]> wrote:
> >>> Can we do a mix of #2 and #3 i.e remove
> >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3 for
> >>> branch-2 and all active release branches? If it breaks any compatibility
> >>> rules, then we can go with #3 for all.
> >>>
> >>> On 2020/06/19 17:33:14, Andrew Purtell <[email protected]> wrote:
> >>>> I vote for #3, and it should be applied to all active code lines.
> >>>>
> >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil <[email protected]> wrote:
> >>>>
> >>>>> While going through the changes proposed on HBASE-24530, we
> >>>>> observed IncreasingToUpperBoundRegionSplitPolicy
> >>>>> compares hbase.hregion.max.filesize against individual stores within a
> >>>>> region when deciding whether to split a region or not. For tables having
> >>>>> multiple families, this can lead to regions much larger than what's
> >>>>> defined by hbase.hregion.max.filesize.
> >>>>>
> >>>>> Current proposal on HBASE-24530 is to add an extra policy that actually
> >>>>> compares the overall region size (combining all region stores sizes)
> >>>>> against hbase.hregion.max.filesize, but I wonder if it really makes sense
> >>>>> to keep a policy with current IncreasingToUpperBoundRegionSplitPolicy
> >>>>> behaviour. Would like to hear folks opinions if we should take any of the
> >>>>> below actions?
> >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just add the
> >>>>> new policy proposed on HBASE-24530;
> >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and remove it
> >>>>> from master branch;
> >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually implement the
> >>>>> logic of the new policy proposed on HBASE-24530;
> >>>>>
> >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> >>>>> behaviour is a bug, and I vote for #3.
> >>>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Andrew
> >>>>
> >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>> decrepit hands
> >>>>    - A23, Crosstalk
> >>>>
> >>>
> >>
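
For readers following the thread, the difference between the two behaviours under discussion can be illustrated with a minimal, self-contained sketch. This is not the HBase implementation: the class name SplitCheckSketch, both method names, and the strict "greater than" comparison are assumptions made for illustration, and the sketch leaves out everything else the real split policy takes into account. It only contrasts checking each store against the limit with checking the sum of all stores against the limit.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; names and comparison rule are assumptions, not HBase code. */
public class SplitCheckSketch {

    /** Per-store check: split only if a single store alone exceeds the limit. */
    public static boolean shouldSplitPerStore(List<Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    /** Region-level check: split once the combined size of all stores exceeds the limit. */
    public static boolean shouldSplitWholeRegion(List<Long> storeSizes, long maxFileSize) {
        long total = 0L;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long maxFileSize = 10L * gib;
        // Three column families, each holding 8 GiB: 24 GiB in the region overall.
        List<Long> stores = Arrays.asList(8L * gib, 8L * gib, 8L * gib);

        System.out.println(shouldSplitPerStore(stores, maxFileSize));    // prints "false"
        System.out.println(shouldSplitWholeRegion(stores, maxFileSize)); // prints "true"
    }
}
```

With three families of 8 GiB each and a 10 GiB limit, the per-store check never triggers a split even though the region holds 24 GiB, which is the situation described for multi-family tables above. It also shows the compatibility concern: a table pre-split under the per-store rule could start splitting after a switch to the region-level rule.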
