Hi Norbert,

To answer your question directly: the RegionSizeCalculator class is annotated with @InterfaceAudience.Private, which means there's a good chance that its implementation can be changed without the need for a deprecation cycle and user participation.
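For context, this is roughly what that marker looks like in the source; the exact package of the annotation varies by HBase branch, so treat this as a sketch rather than the literal declaration:

```java
// Sketch only: the annotation import differs between HBase branches
// (older branches use org.apache.hadoop.hbase.classification instead).
import org.apache.yetus.audience.InterfaceAudience;

// @InterfaceAudience.Private marks the class as internal: no compatibility
// guarantees are made to downstream users, so its internals (including the
// units stored in sizeMap) can change between releases without deprecation.
@InterfaceAudience.Private
public class RegionSizeCalculator {
  // internal mapping of region name -> size; free to change shape or units
}
```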
Curiously, I noticed that this `sizeMap` is accessed down in the method `long getRegionSize(byte[])`, whose javadoc explicitly states that the returned unit is bytes. A little investigation with git blame shows that the switch from returning values in bytes to values in megabytes came in through HBASE-16169 -- your proposed change is essentially the old implementation. For whatever reason, that approach was determined not to be scalable. So we could revert, but we'd need a new solution to the problem HBASE-16169 aimed to solve. (A small sketch of the truncation you describe is appended below your quoted message.)

I hope this helps.

Thanks,
Nick

On Tue, Oct 12, 2021 at 10:54 AM Norbert Kalmar <[email protected]> wrote:

> Hi All,
>
> There is a new optimization in Spark (SPARK-34809) where ignoreEmptySplits
> filters out all regions whose size is 0. They use a Hadoop library,
> getSize() in TableInputFormat.
>
> Drilling down, this will return bytes, but it converts the value from
> megabytes - meaning anything under 1 MB will come back as 0 bytes, i.e.
> empty.
> I did a quick PR I thought would help:
> https://github.com/apache/hbase/pull/3737
> But it turns out it's not as easy as requesting the size in bytes instead
> of MB from the Size class, as we set it in MB to begin with in
> RegionMetricsBuilder:
> -> setStoreFileSize(new Size(regionLoadPB.getStorefileSizeMB(),
> Size.Unit.MEGABYTE))
>
> I did some testing: after inserting a few kilobytes of data, calling
> list_regions does in fact give back size 0.
>
> My question is: is it okay to store the region size in bytes instead?
> I'm mainly asking because of backward compatibility concerns.
>
> Regards,
> Norbert
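For completeness, here is a minimal, self-contained illustration of the truncation described above. It assumes only HBase's public org.apache.hadoop.hbase.Size class; the hard-coded 0 is a stand-in for what regionLoadPB.getStorefileSizeMB() would report for a region holding less than 1 MB:

```java
import org.apache.hadoop.hbase.Size;

public class SubMegabyteTruncation {
  public static void main(String[] args) {
    // The server reports store file size as a whole number of megabytes,
    // so a region holding only a few hundred KB is reported as 0 MB.
    int storefileSizeMB = 0; // stand-in for regionLoadPB.getStorefileSizeMB()

    // This mirrors what RegionMetricsBuilder does: the sub-MB precision is
    // already gone by the time the Size object is constructed.
    Size storeFileSize = new Size(storefileSizeMB, Size.Unit.MEGABYTE);

    // Converting back to bytes cannot recover the lost precision, so a
    // downstream caller (e.g. Spark's ignoreEmptySplits check) sees 0
    // and treats the region as empty.
    System.out.println(storeFileSize.get(Size.Unit.BYTE)); // prints 0.0
  }
}
```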
