[GitHub] [hbase] sandeepvinayak commented on a change in pull request #2591: HBASE-24859: Optimize in-memory representation of HBase map reduce table splits

GitBox Wed, 28 Oct 2020 09:45:11 -0700


sandeepvinayak commented on a change in pull request #2591:
URL: https://github.com/apache/hbase/pull/2591#discussion_r513599178




##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
##########
@@ -323,7 +323,7 @@ public boolean nextKeyValue() throws IOException, InterruptedException {
       }
       List<InputSplit> splits = new ArrayList<>(1);
       long regionSize = sizeCalculator.getRegionSize(regLoc.getRegionInfo().getRegionName());
-      TableSplit split = new TableSplit(tableName, scan,
+      TableSplit split = new TableSplit(tableName,

Review comment:
@saintstack that is correct! If you look at the JIRA description, there are
heap dump screenshots showing that the scan object can occupy a lot of memory
for tables with a large number of regions. This patch only fixes
TableInputFormat for the single-table case, where we don't use the scan object
from TableSplit because we read it directly from the MR job conf.
MultiTableInputFormat needs a similar fix, but with more code changes. I will
try to do that in a separate patch.
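
To illustrate the memory concern in the comment above, here is a minimal, self-contained sketch. It does not use the HBase API: `SketchSplit`, `buildSplits`, and `retainedScanChars` are hypothetical names, and the 2,000-character scan size and the 10,000-region count are assumed values for illustration only. It compares the scan data retained when every split carries its own serialized copy of the scan with the case where splits carry none and the scan is read once from the job configuration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitMemorySketch {

    // Hypothetical stand-in for a table split; this is NOT the real TableSplit class.
    static final class SketchSplit {
        final String startRow;
        final String endRow;
        final String serializedScan; // empty when the scan is read from the job conf instead

        SketchSplit(String startRow, String endRow, String serializedScan) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.serializedScan = serializedScan;
        }
    }

    // Builds one split per region. When embedScan is true, each split gets its own
    // copy of the serialized scan, imitating a scan that is serialized once per split.
    static List<SketchSplit> buildSplits(int regions, String serializedScan, boolean embedScan) {
        List<SketchSplit> splits = new ArrayList<>(regions);
        for (int i = 0; i < regions; i++) {
            String scanCopy = embedScan ? new String(serializedScan.toCharArray()) : "";
            splits.add(new SketchSplit("row" + i, "row" + (i + 1), scanCopy));
        }
        return splits;
    }

    // Sums the characters of scan data retained across all splits.
    static long retainedScanChars(List<SketchSplit> splits) {
        long total = 0;
        for (SketchSplit split : splits) {
            total += split.serializedScan.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed size of one serialized scan: 2,000 characters.
        String scan = new String(new char[2000]).replace('\0', 'x');
        int regions = 10_000; // assumed region count

        long withScan = retainedScanChars(buildSplits(regions, scan, true));
        long withoutScan = retainedScanChars(buildSplits(regions, scan, false));

        System.out.println("scan chars retained, scan embedded per split: " + withScan);
        System.out.println("scan chars retained, scan read from job conf: " + withoutScan);
    }
}
```

In this sketch the retained scan data grows linearly with the number of regions when the scan is embedded in each split, and stays at zero when it is not. The actual savings in HBase depend on the real scan size and region count shown in the heap dumps attached to the JIRA.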




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
