[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reopened PHOENIX-2649:
--------------------------------------

After loading my table with the index, I found that the solution is not 
correct. We still need to compare table names and row keys separately. 
Otherwise the length of the row key is taken into consideration, which can 
lead to unpredictable results. We also should not use BytesWritable to compare 
bytes; WritableComparator.compare is the right way. 
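
To illustrate the point about comparing the two fields separately, here is a 
minimal sketch. It is not the Phoenix code. The serialized layout (a 4-byte 
length followed by the bytes, for each field) and all names in it are 
assumptions made for the example, and the compareBytes helper only stands in 
for the raw byte comparison that Hadoop's WritableComparator provides. 

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TableRowkeyCompareSketch {

    // Lexicographic comparison of two byte ranges, bytes treated as unsigned.
    // Stand-in for the raw byte comparison offered by WritableComparator.
    static int compareBytes(byte[] a, int aOff, int aLen,
                            byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    // Hypothetical layout: [int len][table name][int len][row key].
    static byte[] serialize(String table, byte[] rowKey) {
        byte[] t = table.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + t.length + rowKey.length)
                .putInt(t.length).put(t)
                .putInt(rowKey.length).put(rowKey)
                .array();
    }

    // Compare table names first, then row keys, each over its own bytes only,
    // so the length prefixes never take part in the ordering.
    static int compare(byte[] a, byte[] b) {
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int aTableLen = ba.getInt(0);
        int bTableLen = bb.getInt(0);
        int c = compareBytes(a, 4, aTableLen, b, 4, bTableLen);
        if (c != 0) {
            return c;
        }
        int aKeyOff = 4 + aTableLen;
        int bKeyOff = 4 + bTableLen;
        int aKeyLen = ba.getInt(aKeyOff);
        int bKeyLen = bb.getInt(bKeyOff);
        return compareBytes(a, aKeyOff + 4, aKeyLen, b, bKeyOff + 4, bKeyLen);
    }
}
```

With this layout, comparing the whole serialized buffers in one pass would 
order the row key "b" before "ab" because its length prefix is smaller, while 
the field-by-field comparison orders "ab" first. 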

> GC/OOM during BulkLoad
> ----------------------
>
>                 Key: PHOENIX-2649
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2649
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>            Reporter: Sergey Soldatov
>            Assignee: maghamravikiran
>            Priority: Critical
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC 
> heap error during the Reduce phase. The problem is in the comparator for 
> TableRowkeyPair. It expects that the serialized value was written using 
> zero-compressed encoding, but at least in my case it was written in the 
> regular way. So, when trying to obtain the lengths of the table name and row 
> key, it always gets zero and reports that those byte arrays are equal. As a 
> result, the reducer receives all data produced by the mappers in one reduce 
> call and fails with OOM. 
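
The length mismatch described above can be reproduced in a small sketch. It 
assumes that "the regular way" means a fixed 4-byte big-endian length, which 
the description does not state, and the class and method names are 
hypothetical. The single-byte rule follows Hadoop's zero-compressed (vint) 
encoding, where a first byte in the range -112..127 is the value itself. 

```java
import java.nio.ByteBuffer;

final class VIntMisreadSketch {

    // In the zero-compressed encoding, a first byte >= -112 is the whole value.
    static boolean isSingleByteVInt(byte first) {
        return first >= -112;
    }

    // Reads a length the way a vint reader would, looking at the first byte.
    // Multi-byte vints are left out of this sketch.
    static int lengthReadAsVInt(byte[] buf) {
        byte first = buf[0];
        if (isSingleByteVInt(first)) {
            return first;
        }
        throw new IllegalArgumentException("multi-byte vint not handled here");
    }

    // Assumed "regular" encoding: a fixed 4-byte big-endian int.
    static byte[] fixedWidthLength(int len) {
        return ByteBuffer.allocate(4).putInt(len).array();
    }
}
```

Any realistic length written as a 4-byte int starts with a zero byte, so the 
vint reader sees length 0 for every field, which would make every pair of keys 
compare as equal. 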



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
