Re: [PR] HBASE-28399 region size can be wrong from RegionSizeCalculator [hbase]

via GitHub Wed, 28 Feb 2024 08:58:50 -0800


frostruan commented on code in PR #5700:
URL: https://github.com/apache/hbase/pull/5700#discussion_r1506288550



##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java:
##########
@@ -82,8 +81,8 @@ private void init(RegionLocator regionLocator, Admin admin) throws IOException {
         regionLocator.getName())) {
 
         byte[] regionId = regionLoad.getRegionName();
-        long regionSizeBytes =
-          ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
+        long regionSizeBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE)

Review Comment:
   > To protect against loss of precision, when the bytes-unit value is non-0, we can apply a minimum of 1mb. We should always add this minimum when running against an online cluster.
   
   Agreed. I believe this problem has already been solved in https://issues.apache.org/jira/browse/HBASE-26609
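
   A minimal sketch of the precision loss and of the proposed 1 MB floor, assuming the region size is available as a raw byte count. `RegionSizeSketch`, `truncatedToMb`, and `withMinimum` are hypothetical names used only for illustration; they are not part of HBase's `RegionSizeCalculator` or `RegionMetrics` API. `truncatedToMb` mirrors the removed line (read the size as fractional megabytes, cast to `long`, multiply back), and `withMinimum` applies the floor only when the size is non-zero.

```java
final class RegionSizeSketch {
  // 1 MB in bytes, the same factor the removed line multiplied by.
  static final long MEGABYTE = 1024L * 1024L;

  // Mirrors the removed line: express the size as (fractional) megabytes,
  // cast to long (which truncates toward zero), then multiply back to bytes.
  // Anything under 1 MB becomes 0; larger sizes lose their sub-megabyte remainder.
  static long truncatedToMb(long sizeBytes) {
    return (long) (sizeBytes / (double) MEGABYTE) * MEGABYTE;
  }

  // Hypothetical fix: keep byte precision, but report at least 1 MB whenever
  // the region holds any data; a genuinely empty region still reports 0.
  static long withMinimum(long sizeBytes) {
    return sizeBytes > 0 ? Math.max(sizeBytes, MEGABYTE) : 0L;
  }

  public static void main(String[] args) {
    long small = 300L * 1024L; // 300 KB of data
    System.out.println(truncatedToMb(small)); // 0
    System.out.println(withMinimum(small)); // 1048576
  }
}
```

   With the old truncation, a region under 1 MB reports 0 bytes and larger regions lose their remainder; the floor keeps byte precision while ensuring a non-empty region never looks empty.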
   
   > When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1mb minimum here too. If not, we can permit the 0 to pass through and give the empty split optimization a chance.
   
   For snapshots, the input split length is always 0:
   
   https://github.com/apache/hbase/blob/rel/3.0.0-beta-1/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L185
   
   Perhaps this should be increased to 1 MB as well?
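
   To illustrate why a constant 0 is a problem for snapshot splits, here is a hedged sketch that assumes an "empty split" optimization simply skips splits reporting a length of 0. `SnapshotSplitSketch`, `Split`, `reportedLength`, and `dropEmpty` are hypothetical and do not model `TableSnapshotInputFormatImpl` itself; they only show that if every snapshot split reports 0, a length-based check cannot tell a region that has data from an empty one, while a 1 MB floor keeps every split.

```java
import java.util.ArrayList;
import java.util.List;

final class SnapshotSplitSketch {
  static final long MEGABYTE = 1024L * 1024L;

  // Hypothetical stand-in for a snapshot input split; only the reported length matters here.
  static final class Split {
    final String regionName;
    final long length;

    Split(String regionName, long length) {
      this.regionName = regionName;
      this.length = length;
    }
  }

  // Hypothetical length policy: zeroLength=true mirrors the current behaviour
  // described in the review (always 0); otherwise a 1 MB floor is reported.
  static long reportedLength(boolean zeroLength) {
    return zeroLength ? 0L : MEGABYTE;
  }

  // Hypothetical empty-split pruning: keep only splits with a non-zero length.
  static List<Split> dropEmpty(List<Split> splits) {
    List<Split> kept = new ArrayList<>();
    for (Split s : splits) {
      if (s.length > 0) {
        kept.add(s);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Split> current = new ArrayList<>();
    List<Split> floored = new ArrayList<>();
    for (String region : new String[] {"r1", "r2", "r3"}) {
      current.add(new Split(region, reportedLength(true)));
      floored.add(new Split(region, reportedLength(false)));
    }
    System.out.println(dropEmpty(current).size()); // 0: every split looks empty
    System.out.println(dropEmpty(floored).size()); // 3: all splits survive
  }
}
```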
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

