Nick Dimiduk created HBASE-7743:
-----------------------------------

             Summary: KeyValueSortReducer and PutSortReducer buffer entire 
value-groups in memory
                 Key: HBASE-7743
                 URL: https://issues.apache.org/jira/browse/HBASE-7743
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Nick Dimiduk


The mapreduce package provides two Reducer implementations, KeyValueSortReducer 
and PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in 
conjunction with HFileOutputFormat. Both implementations use a TreeSet to sort 
the values for each key, so every value in a group is held in memory at once. 
Either reducer can run out of memory (OOM) when rows are large.
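
To illustrate the pattern, here is a minimal sketch in plain Java. It is not 
the HBase source: the Cell class, the peakBuffered counter, and the reduce 
signature are hypothetical stand-ins, and no Hadoop types are used. It only 
shows why a TreeSet-based reduce holds a whole value-group in memory before 
emitting anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public class BufferingReduceSketch {
    /** Hypothetical stand-in for a cell: a column qualifier plus a value. */
    public static final class Cell {
        public final String qualifier;
        public final String value;

        public Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Largest number of cells held at once; illustration only.
    public static int peakBuffered = 0;

    /** Sorts one value-group by buffering all of it in a TreeSet. */
    public static List<Cell> reduce(Iterator<Cell> values) {
        TreeSet<Cell> sorted = new TreeSet<>(Comparator.comparing((Cell c) -> c.qualifier));
        while (values.hasNext()) {
            sorted.add(values.next());
            peakBuffered = Math.max(peakBuffered, sorted.size());
        }
        // Nothing can be emitted until the last value has been read.
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
            new Cell("c", "3"), new Cell("a", "1"), new Cell("b", "2"));
        for (Cell cell : reduce(row.iterator())) {
            System.out.println(cell.qualifier + "=" + cell.value);
        }
        System.out.println("peak buffered cells: " + peakBuffered);
    }
}
```

The peak number of buffered cells equals the size of the group, so memory use 
grows with the size of the row.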

A better solution would be to implement a secondary sort of the values. That 
way Hadoop sorts the records itself, spilling to disk when necessary.
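
A rough sketch of the secondary-sort idea follows, again in plain Java with 
no Hadoop dependency. All names here (CompositeKey, SORT_ORDER, partition, 
sameGroup, reduceStreaming) are hypothetical, and the in-memory List.sort call 
merely stands in for the framework's shuffle sort, which is the part that can 
spill to disk. The point is the split of responsibilities: sort on the full 
composite key, partition and group on the row alone, and let the reducer 
stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    /** Hypothetical composite key: the row plus the field to sort within it. */
    public static final class CompositeKey {
        public final String row;
        public final String qualifier;

        public CompositeKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Sort order: the full key, so records arrive ordered within each row.
    public static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.comparing((CompositeKey k) -> k.row).thenComparing(k -> k.qualifier);

    // Partition on the row only, so one reducer sees every cell of a row.
    public static int partition(CompositeKey key, int numPartitions) {
        return (key.row.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Group on the row only, so one reduce call covers the whole row.
    public static boolean sameGroup(CompositeKey a, CompositeKey b) {
        return a.row.equals(b.row);
    }

    /** Emits records as they arrive; only the previous key is retained. */
    public static List<String> reduceStreaming(List<CompositeKey> sortedInput) {
        List<String> out = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sortedInput) {
            if (previous == null || !sameGroup(previous, key)) {
                out.add("row " + key.row);
            }
            out.add("  " + key.qualifier);
            previous = key;
        }
        return out;
    }

    /** Simulates the shuffle with an in-memory sort, then streams the result. */
    public static List<String> run(List<CompositeKey> mapOutput) {
        List<CompositeKey> shuffled = new ArrayList<>(mapOutput);
        shuffled.sort(SORT_ORDER); // stand-in for the framework's sort
        return reduceStreaming(shuffled);
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey("r2", "b"),
            new CompositeKey("r1", "c"),
            new CompositeKey("r1", "a"));
        for (String line : run(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

Because the input already arrives in order, the reduce step needs no TreeSet 
and its working set stays constant regardless of row size.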
