[
https://issues.apache.org/jira/browse/HBASE-26398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil resolved HBASE-26398.
------------------------------------------
Resolution: Fixed
Thanks for the contribution [~stoty]. Merged into master, branch-2,
branch-2.4, branch-2.3 and branch-2.2.
> CellCounter fails for large tables filling up local disk
> --------------------------------------------------------
>
> Key: HBASE-26398
> URL: https://issues.apache.org/jira/browse/HBASE-26398
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 2.2.7, 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.8
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Minor
> Fix For: 2.5.0, 2.2.8, 3.0.0-alpha-2, 2.3.8, 2.4.9
>
>
> CellCounter dumps all cell coordinates into its output, which can become huge.
> The spill can fill the local disk on the reducer.
> CellCounter hardcodes *mapreduce.job.reduces* to *1*, so it is not possible
> to use multiple reducers to get around this.
> Fixing this is easy: if *mapreduce.job.reduces* is not hardcoded, it still
> defaults to 1, but can be overridden by the user.
> CellCounter also generates two extra records with constant keys for each
> cell, which have to be processed by the reducer.
> Even with multiple reducers, these (1/3 of the total records) will go to the
> same reducer, which can also fill up the disk.
> This can be fixed by adding a Combiner to the Mapper, which sums the counter
> records, thereby reducing the Mapper output records to 1/3 of their previous
> amount, which can be evenly distributed between the reducers.
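
The effect of the combiner described above can be illustrated with a small
plain-Java simulation. This is only a sketch under stated assumptions, not the
actual patch: it uses no Hadoop or HBase classes, and the class name, the two
constant key names and the cell-coordinate format are all hypothetical. In a
real job the summing step would presumably live in a Reducer subclass
registered with Job.setCombinerClass.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a summing combiner; no Hadoop classes are used.
class CombinerSketch {
    // Hypothetical constant keys standing in for the two per-cell counter records.
    static final String COUNTER_A = "counter-a";
    static final String COUNTER_B = "counter-b";

    /** Simulates one mapper's output: three records per cell. */
    static List<Map.Entry<String, Long>> mapOutput(int cells) {
        List<Map.Entry<String, Long>> records = new ArrayList<>();
        for (int i = 0; i < cells; i++) {
            // One record keyed by a unique (made-up) cell coordinate.
            records.add(new AbstractMap.SimpleImmutableEntry<>("row" + i + ";cf:q", 1L));
            // Two records with constant keys, emitted for every cell.
            records.add(new AbstractMap.SimpleImmutableEntry<>(COUNTER_A, 1L));
            records.add(new AbstractMap.SimpleImmutableEntry<>(COUNTER_B, 1L));
        }
        return records;
    }

    /** Combiner step: sum the values per key before they leave the mapper. */
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> summed = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : records) {
            summed.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return summed;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> raw = mapOutput(1000);
        Map<String, Long> combined = combine(raw);
        System.out.println("records before combiner: " + raw.size());      // 3000
        System.out.println("records after combiner:  " + combined.size()); // 1002
    }
}
```

With 1000 cells, the simulated mapper emits 3000 records, and the combiner
collapses the 2000 constant-key records into 2, leaving 1002. That is roughly
1/3 of the original volume, and the remaining records have distinct keys, so
they can be partitioned across several reducers.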