[ https://issues.apache.org/jira/browse/HBASE-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandeep Pal updated HBASE-24859:
--------------------------------
Description:
It has been observed that when a table has a very large number of regions, MR
jobs consume a lot of memory in the client. This is because region-level
information is kept in memory, and the memory-heavy object is TableSplit,
since each TableSplit carries a Scan object.
However, it looks like TableInputFormat for a single table does not need to
store the Scan object in the TableSplit: the copy in the split is not used, and
all splits are expected to have exactly the same Scan object. TableInputFormat
uses the Scan object directly from the MR conf.
was:
It has been observed that when a table has a very large number of regions, MR
jobs consume more memory in the client. This is because region-level
information is kept in memory, and the memory-heavy object is TableSplit,
since each TableSplit carries a Scan object.
We can optimize the memory consumption by not loading the region-level
information if the region is empty, based on a configuration.
With the default configuration, all TableSplits are kept in memory (no change
from the current behavior), but the configuration can be set so that the
map-reduce job ignores empty regions. The configuration can be set per MR job.
> Improve the storage cost for HBase map reduce table splits
> ----------------------------------------------------------
>
> Key: HBASE-24859
> URL: https://issues.apache.org/jira/browse/HBASE-24859
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Sandeep Pal
> Assignee: Sandeep Pal
> Priority: Major
> Attachments: Screen Shot 2020-08-26 at 8.44.34 AM.png, hbase-24859.png
>
>
> It has been observed that when a table has a very large number of regions,
> MR jobs consume a lot of memory in the client. This is because region-level
> information is kept in memory, and the memory-heavy object is TableSplit,
> since each TableSplit carries a Scan object.
> However, it looks like TableInputFormat for a single table does not need to
> store the Scan object in the TableSplit: the copy in the split is not used,
> and all splits are expected to have exactly the same Scan object.
> TableInputFormat uses the Scan object directly from the MR conf.
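To make the cost argument in the description concrete, here is a minimal,
self-contained sketch. It is not HBase code: SplitFootprintSketch, encodeSplit,
placeholderScan and totalBytes are hypothetical names, the row-key format and
the 2000-character scan size are arbitrary assumptions, and a plain string
stands in for a serialized Scan. It only models the idea that when every split
carries an identical scan, the total footprint grows by roughly
(number of regions) x (scan size), whereas keeping the scan once in the job
configuration avoids that growth.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical model, not HBase classes: a "split" is just its encoded bytes.
public class SplitFootprintSketch {

    // Builds a placeholder string standing in for a serialized scan (assumption).
    public static String placeholderScan(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    // Encodes one split: region boundaries, plus the scan only if one is given.
    public static byte[] encodeSplit(String startRow, String endRow, String scan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(startRow);
            out.writeUTF(endRow);
            if (scan != null) {
                out.writeUTF(scan); // the per-split copy the issue proposes to drop
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Sums the encoded size of one split per region.
    public static long totalBytes(int regions, String scan, boolean keepScanInSplit) {
        long total = 0;
        for (int i = 0; i < regions; i++) {
            String startRow = String.format(Locale.ROOT, "row-%08d", i);
            String endRow = String.format(Locale.ROOT, "row-%08d", i + 1);
            total += encodeSplit(startRow, endRow, keepScanInSplit ? scan : null).length;
        }
        return total;
    }

    public static void main(String[] args) {
        int regions = 100_000;               // assumed region count
        String scan = placeholderScan(2000); // assumed scan size
        long withScan = totalBytes(regions, scan, true);
        long withoutScan = totalBytes(regions, scan, false);
        System.out.println("splits carrying the scan: " + withScan + " bytes");
        System.out.println("splits without the scan:  " + withoutScan + " bytes");
        System.out.println("saved by sharing one scan: " + (withScan - withoutScan) + " bytes");
    }
}
```

The numbers it prints depend entirely on the assumed region count and scan
size; the point is only that the saving scales linearly with the number of
regions, which matches the observation that the problem shows up on tables
with many regions.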