[ 
https://issues.apache.org/jira/browse/HBASE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820061#comment-17820061
 ] 

Bryan Beaudreault commented on HBASE-28399:
-------------------------------------------

The other reason the value could be 0 is if the region happens to be in 
transition when the RegionSizeCalculator runs. In that case, the RegionLocator 
returns a null ServerName, and we do not fetch any region size from 
RegionMetrics. Currently this throws an NPE, but HBASE-28354 adds a null 
check, so it would return 0.
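
To illustrate the case above, here is a minimal, self-contained sketch. It does 
not use the HBase API: the class, the method names, and the plain String/Map 
types are all hypothetical stand-ins for RegionLocator and RegionMetrics. It 
only shows how skipping a null server leads to a reported size of 0.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the size lookup; not the HBase implementation.
public class NullServerSizeSketch {

    // regionToServer: region name -> hosting server, or null while in transition.
    // sizesByServer:  server name -> (region name -> store file size in bytes).
    static Map<String, Long> collectSizes(Map<String, String> regionToServer,
                                          Map<String, Map<String, Long>> sizesByServer) {
        Map<String, Long> sizes = new HashMap<>();
        for (Map.Entry<String, String> entry : regionToServer.entrySet()) {
            String server = entry.getValue();
            if (server == null) {
                // Null check: there is no server to ask, so no size is recorded.
                continue;
            }
            Map<String, Long> onServer =
                sizesByServer.getOrDefault(server, Collections.emptyMap());
            Long size = onServer.get(entry.getKey());
            if (size != null) {
                sizes.put(entry.getKey(), size);
            }
        }
        return sizes;
    }

    // A region with no recorded size is reported as 0 bytes.
    static long getRegionSize(Map<String, Long> sizes, String region) {
        return sizes.getOrDefault(region, 0L);
    }
}
```

With this shape, a region in transition cannot be told apart from a region 
that is really empty, since both report 0.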

> region size can be wrong from RegionSizeCalculator
> --------------------------------------------------
>
>                 Key: HBASE-28399
>                 URL: https://issues.apache.org/jira/browse/HBASE-28399
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 3.0.0-beta-1
>            Reporter: ruanhui
>            Assignee: ruanhui
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2
>
>
> The RegionSizeCalculator computes the region size in bytes with the 
> following expression:
> {code:java}
> private static final long MEGABYTE = 1024L * 1024L;
> long regionSizeBytes =
>   ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE; 
> {code}
> However, this loses precision, because the cast to long truncates the 
> fractional megabyte value. For example, the result of 
> {code:java}
> ((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE {code}
> is 0. This produces a TableInputSplit with a length of 0, even though the 
> split actually contains a small amount of data.
>
> Such a split is ignored when spark.hadoopRDD.ignoreEmptySplits is enabled, 
> so its data is skipped.
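
The truncation described in the issue can be reproduced without HBase. The 
sketch below is an approximation: the class and method names are hypothetical, 
and a plain double division stands in for Size.get(Size.Unit.MEGABYTE). It 
applies the same cast and multiplication as the quoted expression.

```java
// Hypothetical demo class; models the quoted expression with plain arithmetic.
public class RegionSizeTruncationDemo {
    private static final long MEGABYTE = 1024L * 1024L;

    // Bytes -> fractional megabytes (double), truncated by the cast to long,
    // then scaled back to bytes. Any remainder below 1 MB is dropped.
    static long viaMegabytes(long sizeInBytes) {
        double sizeInMegabytes = (double) sizeInBytes / MEGABYTE;
        return ((long) sizeInMegabytes) * MEGABYTE;
    }

    public static void main(String[] args) {
        long[] samples = {1L, MEGABYTE - 1, MEGABYTE, 3 * MEGABYTE / 2};
        for (long bytes : samples) {
            long computed = viaMegabytes(bytes);
            System.out.println(bytes + " bytes -> " + computed
                + " bytes (lost " + (bytes - computed) + ")");
        }
    }
}
```

Any region smaller than 1 MB comes out as 0 here, and larger regions are 
rounded down to a whole number of megabytes. One way to avoid this would be to 
read the size in bytes directly (for example with Size.Unit.BYTE), although 
this message does not say which fix was adopted.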



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
