[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170645#comment-15170645 ]
maghamravikiran edited comment on PHOENIX-2649 at 2/27/16 4:29 PM:
-------------------------------------------------------------------
It looks to me like the issue is in the code at [#1], where the mapper output
key, a TableRowkeyPair, is built from a table index and rowkey rather than a
table name and rowkey.
When the partitioner path is created during job setup [#2], we apparently use
a TableRowkeyPair built from the table name and rowkey of the table.
This mismatch seems to be the root cause of the issue: the
TotalOrderPartitioner ends up sending all mapper output to a single reducer.
1. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274
2. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707
The initial code drop of PHOENIX-2216 didn't introduce this issue.
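
To illustrate why such a mismatch would funnel everything to one reducer, here
is a minimal sketch. It is not the Phoenix or Hadoop code: plain strings stand
in for TableRowkeyPair, the table name "MY_TABLE", the index "0", the separator
and the row keys are all made up, and partitionFor only imitates the
binary-search lookup a total-order partitioner performs over its split points.

```java
import java.util.Arrays;

public class PartitionKeyMismatchSketch {

    // Hypothetical stand-in for a composite key: table identifier, a separator, row key.
    static String key(String tableId, String rowKey) {
        return tableId + '\u0000' + rowKey;
    }

    // Picks a partition by binary search over sorted split points:
    // partition i receives the keys that sort between split i-1 and split i.
    static int partitionFor(String[] sortedSplitPoints, String key) {
        int pos = Arrays.binarySearch(sortedSplitPoints, key) + 1;
        return pos < 0 ? -pos : pos;
    }

    public static void main(String[] args) {
        // Split points are built from the table NAME (assumed job-setup behaviour).
        String[] splits = { key("MY_TABLE", "g"), key("MY_TABLE", "p") };

        for (String rowKey : new String[] { "a", "h", "z" }) {
            int byName = partitionFor(splits, key("MY_TABLE", rowKey));
            // Keys built from a table INDEX all sort before every split point.
            int byIndex = partitionFor(splits, key("0", rowKey));
            System.out.println(rowKey + ": by name -> " + byName + ", by index -> " + byIndex);
        }
        // prints:
        // a: by name -> 0, by index -> 0
        // h: by name -> 1, by index -> 0
        // z: by name -> 2, by index -> 0
    }
}
```

In this toy setup, keys built the same way as the split points spread over
three partitions, while keys built from the index all land in partition 0.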
> GC/OOM during BulkLoad
> ----------------------
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch,
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap
> error during the Reduce phase. The problem is in the comparator for
> TableRowkeyPair. It expects that the serialized value was written using
> zero-compressed encoding, but at least in my case it was written in the
> regular way. So, when trying to obtain the lengths of the table name and row
> key, it always gets zero and reports that those byte arrays are equal. As a
> result, the reducer receives all data produced by the mappers in one reduce
> call and fails with OOM.
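
The encoding mismatch described above can be reproduced in isolation. The
sketch below is not the Phoenix comparator: it is a simplified, self-contained
reader for zero-compressed integers, modelled on the format used by Hadoop's
WritableUtils, plus a hypothetical comparator that trusts that format. All
class and method names, and the sample values "TABLE_A" and "TABLE_B", are
illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class VIntMismatchSketch {

    // "Regular" encoding: 4-byte big-endian length (as DataOutput.writeInt would write), then the bytes.
    static byte[] writeWithFixedIntLength(String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }

    // Zero-compressed encoding, restricted here to lengths that fit in a single byte.
    static byte[] writeWithVIntLength(String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 127) throw new IllegalArgumentException("sketch handles short values only");
        return ByteBuffer.allocate(1 + payload.length).put((byte) payload.length).put(payload).array();
    }

    // Number of bytes a zero-compressed integer occupies, derived from its first byte.
    static int vIntSize(byte first) {
        if (first >= -112) return 1;
        if (first < -120) return -119 - first;
        return -111 - first;
    }

    // Decodes a zero-compressed integer starting at the given offset.
    static int readVInt(byte[] bytes, int offset) {
        byte first = bytes[offset];
        int size = vIntSize(first);
        if (size == 1) return first;
        long value = 0;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xFF);
        }
        return (int) (first < -120 ? ~value : value);
    }

    // Hypothetical comparator: reads a zero-compressed length, then compares that many bytes.
    static int compareAssumingVInt(byte[] a, byte[] b) {
        int lenA = readVInt(a, 0), lenB = readVInt(b, 0);
        int startA = vIntSize(a[0]), startB = vIntSize(b[0]);
        for (int i = 0; i < Math.min(lenA, lenB); i++) {
            int diff = (a[startA + i] & 0xFF) - (b[startB + i] & 0xFF);
            if (diff != 0) return diff;
        }
        return lenA - lenB;
    }

    public static void main(String[] args) {
        byte[] a = writeWithFixedIntLength("TABLE_A");
        byte[] b = writeWithFixedIntLength("TABLE_B");
        // The first byte of a 4-byte length below 2^24 is 0x00, which decodes as the value 0.
        System.out.println(readVInt(a, 0));            // prints 0
        System.out.println(compareAssumingVInt(a, b)); // prints 0: distinct keys look equal

        // With matching encodings, the same comparator tells the keys apart.
        int matched = compareAssumingVInt(writeWithVIntLength("TABLE_A"), writeWithVIntLength("TABLE_B"));
        System.out.println(matched < 0);               // prints true
    }
}
```

If every key compares as equal in this way, a framework that groups sorted
keys by comparator would treat the whole mapper output as one group, which is
consistent with the single oversized reduce call reported here.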
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)