[jira] [Commented] (HBASE-26398) CellCounter fails for large tables filling up local disk

Hudson (Jira) Thu, 28 Oct 2021 19:37:21 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-26398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435746#comment-17435746
 ]


Hudson commented on HBASE-26398:
--------------------------------

Results for branch branch-2.4
        [build #226 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/226/]:
 (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/226/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/226/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/226/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/226/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> CellCounter fails for large tables filling up local disk
> --------------------------------------------------------
>
>                 Key: HBASE-26398
>                 URL: https://issues.apache.org/jira/browse/HBASE-26398
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 2.2.7, 2.5.0, 3.0.0-alpha-2, 2.3.7, 2.4.8
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Minor
>             Fix For: 2.5.0, 2.2.8, 3.0.0-alpha-2, 2.3.8, 2.4.9
>
>
> CellCounter dumps all cell coordinates into its output, which can become huge.
> The spill can fill the local disk on the reducer. 
> CellCounter hardcodes *mapreduce.job.reduces* to *1*, so it is not possible 
> to use multiple reducers to get around this.
> Fixing this is easy: if *mapreduce.job.reduces* is not hardcoded, it still 
> defaults to 1, but can be overridden by the user. 
> CellCounter also generates two extra records with constant keys for each 
> cell, which have to be processed by the reducer.
> Even with multiple reducers, these (1/3 of the total records) will go to the 
> same reducer, which can also fill up the disk.
> This can be fixed by adding a Combiner to the Mapper, which sums the counter 
> records, thereby reducing the Mapper output records to 1/3 of their previous 
> amount, which can be evenly distributed between the reducers.
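
The effect of the Combiner described above can be illustrated with a small plain-Java simulation. This is only a sketch, not the actual HBase patch: it uses no Hadoop classes, and the class name, method names, and the two constant keys (counter-a, counter-b) are hypothetical stand-ins for whatever CellCounter really emits. It assumes the mapper writes three (key, 1) records per cell, and that the combiner sums values per key before anything is shipped to a reducer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical constant keys; the real CellCounter key names may differ.
    static final String COUNTER_A = "counter-a";
    static final String COUNTER_B = "counter-b";

    /** Simulates mapper output: one record for the cell plus two constant-key records. */
    static List<Map.Entry<String, Long>> mapCells(List<String> cellCoordinates) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String cell : cellCoordinates) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(cell, 1L));
            out.add(new AbstractMap.SimpleImmutableEntry<>(COUNTER_A, 1L));
            out.add(new AbstractMap.SimpleImmutableEntry<>(COUNTER_B, 1L));
        }
        return out;
    }

    /** Simulates a summing combiner: records with the same key collapse into one. */
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : records) {
            sums.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("row1;cf:q1", "row1;cf:q2", "row2;cf:q1");
        List<Map.Entry<String, Long>> mapped = mapCells(cells);
        Map<String, Long> combined = combine(mapped);
        System.out.println("records before combine: " + mapped.size());
        System.out.println("records after combine:  " + combined.size());
        System.out.println("combined output:        " + combined);
    }
}
```

With N distinct cells in one mapper, the simulated output shrinks from 3N records to N + 2, which matches the roughly 1/3 reduction the description mentions. The remaining per-cell records have distinct keys, so they can be partitioned across several reducers.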



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
