Nick Dimiduk created HBASE-7743:
-----------------------------------
Summary: KeyValueSortReducer and PutSortReducer buffer entire
value-groups in memory
Key: HBASE-7743
URL: https://issues.apache.org/jira/browse/HBASE-7743
Project: HBase
Issue Type: Improvement
Components: mapreduce
Reporter: Nick Dimiduk
The mapreduce package provides two Reducer implementations, KeyValueSortReducer
and PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in
conjunction with HFileOutputFormat. Both implementations collect all values for
a key into a TreeSet in order to sort them, so the entire value group is held in
memory at once. As a result, either reducer can fail with an OutOfMemoryError
(OOM) when rows are large.
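To make the problem concrete, here is a minimal sketch of the buffering pattern
in plain Java. It is not the HBase source: the class name BufferingReducerSketch,
the String key/value types, and the "row/value" output format are assumptions
made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the buffering pattern; not the HBase implementation.
class BufferingReducerSketch {

    // Copies every value of one key group into a TreeSet, then emits the
    // values in sorted order. Peak memory grows with the size of the group.
    static List<String> reduce(String rowKey, Iterable<String> values) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String value : values) {
            sorted.add(value); // the whole group ends up on the heap
        }
        List<String> emitted = new ArrayList<>();
        for (String value : sorted) {
            emitted.add(rowKey + "/" + value);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Values arrive in arbitrary order and leave in sorted order.
        System.out.println(reduce("row1", Arrays.asList("q3", "q1", "q2")));
    }
}
```

Nothing can be emitted until the last value of the group has been added to the
set, which is why a single very wide row is enough to exhaust the heap.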
A better solution would be to implement a secondary sort of the values. That way
Hadoop sorts the records itself, spilling to disk when necessary, and the
reducer no longer has to buffer them.
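The idea of a secondary sort can be sketched without any Hadoop classes. The
simulation below is an assumption-laden illustration, not a proposed patch: the
names SecondarySortSketch, CompositeKey, SORT and GROUP are hypothetical, and
the in-memory list sort stands in for the framework's shuffle sort. In a real
job the corresponding hooks would be Job.setSortComparatorClass,
Job.setGroupingComparatorClass and Job.setPartitionerClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of secondary sort; plain Java, no Hadoop classes.
class SecondarySortSketch {

    // Composite key: the natural key (row) plus the field the values
    // should be ordered by (qualifier).
    static final class CompositeKey {
        final String row;
        final String qualifier;

        CompositeKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Sort comparator: order by row first, then by qualifier.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.row)
                  .thenComparing((CompositeKey k) -> k.qualifier);

    // Grouping comparator: keys with the same row belong to one reduce group.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.row);

    // Simulates the shuffle sort followed by a streaming reduce.
    static List<String> run(List<CompositeKey> mapOutput) {
        List<CompositeKey> shuffled = new ArrayList<>(mapOutput);
        shuffled.sort(SORT); // stand-in for the framework sort, which can spill to disk

        List<String> emitted = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : shuffled) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                emitted.add("group " + key.row); // a new reduce group starts here
            }
            // Records already arrive in order, so each one is emitted
            // immediately and only the previous key is retained.
            emitted.add(key.row + "/" + key.qualifier);
            previous = key;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey("row2", "b"),
            new CompositeKey("row1", "c"),
            new CompositeKey("row1", "a"),
            new CompositeKey("row2", "a"));
        System.out.println(run(mapOutput));
    }
}
```

The reduce loop keeps a reference to one previous key rather than a whole
group, so its memory use no longer depends on how many values a row has. A real
job would also need a partitioner that routes on the row alone, so that all
composite keys for a row reach the same reducer.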