frostruan commented on code in PR #5700:
URL: https://github.com/apache/hbase/pull/5700#discussion_r1506288550
##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java:
##########
@@ -82,8 +81,8 @@ private void init(RegionLocator regionLocator, Admin admin)
throws IOException {
regionLocator.getName())) {
byte[] regionId = regionLoad.getRegionName();
- long regionSizeBytes =
- ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
+ long regionSizeBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE)
Review Comment:
> To protect against loss of precision, when the bytes-unit value is non-0,
> we can apply a minimum of 1mb. We should always add this minimum when running
> against an online-cluster.
Agreed. I believe this problem has already been solved in
https://issues.apache.org/jira/browse/HBASE-26609
> When running against a snapshot, I'm not sure. MR over snapshots
> instantiates the region in the mapper process -- I assume that also reads the
> WAL and populates a memstore. In that case, we need the 1mb minimum here too.
> If not, we can permit the 0 to pass through and give the empty split
> optimization a chance.
The length of a snapshot input split is always 0:
https://github.com/apache/hbase/blob/rel/3.0.0-beta-1/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L185
I think maybe this should be increased to 1MB too?
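For reference, here is a minimal sketch of the minimum being discussed: a size of 0 passes through unchanged so the empty-split optimization still has a chance, while any non-zero size is raised to at least 1 MB. The class name `RegionSizeClamp` and the method `clampRegionSize` are hypothetical and only illustrate the idea; this is not the code in this PR or in HBASE-26609.

```java
public final class RegionSizeClamp {
  // 1 MB expressed in bytes (hypothetical constant for this sketch).
  public static final long MEGABYTE = 1024L * 1024L;

  private RegionSizeClamp() {
  }

  /**
   * Returns 0 for an empty region so that empty splits can still be skipped,
   * otherwise returns the size in bytes raised to a minimum of 1 MB.
   */
  public static long clampRegionSize(long sizeBytes) {
    if (sizeBytes <= 0) {
      return 0L;
    }
    return Math.max(sizeBytes, MEGABYTE);
  }

  public static void main(String[] args) {
    System.out.println(clampRegionSize(0L));            // prints 0
    System.out.println(clampRegionSize(512L));          // prints 1048576
    System.out.println(clampRegionSize(5L * MEGABYTE)); // prints 5242880
  }
}
```

Whether the snapshot path should use the same rule depends on whether the mapper-side region can hold memstore data, which is the open question above.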