ndimiduk commented on code in PR #5700:
URL: https://github.com/apache/hbase/pull/5700#discussion_r1505999650


##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java:
##########
@@ -82,8 +81,8 @@ private void init(RegionLocator regionLocator, Admin admin) throws IOException {
         regionLocator.getName())) {
 
         byte[] regionId = regionLoad.getRegionName();
-        long regionSizeBytes =
-          ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
+        long regionSizeBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE)
Review Comment:
   I think that Hadoop assumes these split sizes are in megabytes, so we follow suit.
   
   To protect against loss of precision, when the bytes-unit value is non-zero, we can apply a minimum of 1 MB. We should always apply this minimum when running against an online cluster.
   
   When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1 MB minimum here too. If not, we can permit the 0 to pass through and give the empty-split optimization a chance.
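   The clamping idea above can be sketched as follows. This is a hypothetical helper, not the actual patch: the class and method names (`RegionSizeFloor`, `withOneMbFloor`) are made up for illustration, and it assumes the caller has already obtained a byte-unit size (e.g. via `regionLoad.getStoreFileSize().get(Size.Unit.BYTE)`):
   
   ```java
   // Hypothetical sketch (not the HBase patch): clamp any non-zero region
   // size to a 1 MB floor so that small-but-nonempty regions are not rounded
   // down to zero when split sizes are later expressed in megabytes.
   public class RegionSizeFloor {
       static final long MEGABYTE = 1024L * 1024L;
   
       // sizeInBytes is assumed to come from a byte-unit region size metric.
       static long withOneMbFloor(long sizeInBytes) {
           if (sizeInBytes == 0L) {
               // Permit genuinely empty regions to pass through unchanged,
               // giving the empty-split optimization a chance.
               return 0L;
           }
           // Non-zero but sub-megabyte sizes are raised to the 1 MB minimum.
           return Math.max(sizeInBytes, MEGABYTE);
       }
   }
   ```
   
   The zero case is kept distinct deliberately: the floor only guards against precision loss for regions that actually hold data.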



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
