[
https://issues.apache.org/jira/browse/HBASE-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norbert Kalmár updated HBASE-26340:
-----------------------------------
Description:
We calculate region size in the mapreduce package by getting the size in MB
first and multiplying:
https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87
This will give a size of 0 until at least 1MB is reached (and it has an
unwanted rounding effect for larger regions as well).
Spark, for example, can be tuned for performance by eliminating 0-sized
regions, but that would also drop small regions which are not actually
empty. The Hadoop interface states the size is returned in bytes, and while
this is technically true due to the multiplication, we multiply by 0 until 1MB is reached.
I'm not sure why we get the size in MB units and not in bytes directly.
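A minimal sketch of the truncation, to make the rounding concrete. This is a simplified stand-alone illustration, not the actual HBase code: the real RegionSizeCalculator reads region metrics from the cluster, but the arithmetic pattern (truncate to whole MB, then scale back to bytes) is what produces the 0 result:

```java
// Illustrates the rounding issue: any region under 1MB reports size 0,
// and larger regions lose their sub-MB remainder.
public class RegionSizeRounding {
    static final long MEGABYTE = 1024L * 1024L;

    // Mimics the pattern in RegionSizeCalculator: the size is first
    // truncated to whole megabytes (integer division), then multiplied
    // back to bytes.
    static long sizeViaMb(long sizeInBytes) {
        long sizeInMb = sizeInBytes / MEGABYTE; // 0 for anything under 1MB
        return sizeInMb * MEGABYTE;
    }

    public static void main(String[] args) {
        System.out.println(sizeViaMb(512 * 1024));       // 0 -- region looks empty
        System.out.println(sizeViaMb(3 * MEGABYTE + 1)); // 3145728 -- remainder lost
    }
}
```

Computing the size in bytes directly would avoid both the 0-sized splits and the rounding loss.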
was:
We calculate region size in the mapreduce package by getting the size in MB
first and multiplying:
https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87
This will give a size of 0 until at least 1MB is reached (and it has an
unwanted rounding effect for larger regions as well).
Spark, for example, can be tuned for performance by eliminating 0-sized
regions, but that would also drop small regions which are not actually
empty. The Hadoop interface states the size is returned in bytes, and while
this is technically true due to the multiplication, we multiply by 0 until 1MB is reached.
I'm not sure why we get the size in MB units and not in bytes directly.
Should we fix this?
> TableSplit returns false size under 1MB
> ---------------------------------------
>
> Key: HBASE-26340
> URL: https://issues.apache.org/jira/browse/HBASE-26340
> Project: HBase
> Issue Type: Bug
> Reporter: Norbert Kalmár
> Assignee: Norbert Kalmár
> Priority: Major
>
> We calculate region size in the mapreduce package by getting the size in MB
> first and multiplying:
> https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87
> This will give a size of 0 until at least 1MB is reached (and it has
> an unwanted rounding effect for larger regions as well).
> Spark, for example, can be tuned for performance by eliminating
> 0-sized regions, but that would also drop small regions which are not
> actually empty. The Hadoop interface states the size is returned in bytes,
> and while this is technically true due to the multiplication, we multiply by 0 until 1MB
> is reached. I'm not sure why we get the size in MB units and not in bytes
> directly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)