[
https://issues.apache.org/jira/browse/HBASE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587512#comment-13587512
]
Nick Dimiduk commented on HBASE-7747:
-------------------------------------
The grammar of that comment is terrible! Someone might expect better of me.
What I mean to say is:
bq. TODO: There's nothing to say Puts (values) are keyed on rowkey. Thus the
map of put.getRow() to combined Put is necessary. Could use HeapSize to create
an upper bound on the memory size of the puts map and flush some portion of the
content. This is acceptable because Combiner is run an unspecified number of
times and is for optimization only.
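A minimal sketch of the combiner that TODO describes, under loud assumptions: SimplePut is a hypothetical stand-in for the HBase Put class, approxSize() is a crude character count playing the role HeapSize would play, and the method and class names are made up for illustration. For simplicity it flushes the whole map once the bound is exceeded rather than only a portion, and a later value for the same column overwrites the earlier one, which is not how HBase handles cell versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PutCombinerSketch {

    /** Hypothetical stand-in for an HBase Put: a row key plus its cells. */
    public static final class SimplePut {
        public final String row;
        public final Map<String, String> cells = new LinkedHashMap<>(); // column -> value

        public SimplePut(String row) {
            this.row = row;
        }

        public SimplePut add(String column, String value) {
            cells.put(column, value);
            return this;
        }

        /** Crude size estimate (character count); an assumption, not HeapSize. */
        public long approxSize() {
            long size = row.length();
            for (Map.Entry<String, String> e : cells.entrySet()) {
                size += e.getKey().length() + e.getValue().length();
            }
            return size;
        }
    }

    /**
     * Merges Puts that share a row. The incoming key is ignored because nothing
     * guarantees it is the rowkey, hence the map of row -> combined Put.
     * When the buffered size exceeds maxBytes the map is flushed to the output;
     * this may emit more than one Put per row, which is acceptable for a combiner.
     */
    public static List<SimplePut> combine(Iterable<SimplePut> values, long maxBytes) {
        Map<String, SimplePut> puts = new LinkedHashMap<>();
        List<SimplePut> out = new ArrayList<>();
        long buffered = 0;
        for (SimplePut p : values) {
            SimplePut merged = puts.get(p.row);
            long before = 0;
            if (merged == null) {
                merged = new SimplePut(p.row);
                puts.put(p.row, merged);
            } else {
                before = merged.approxSize();
            }
            merged.cells.putAll(p.cells);
            buffered += merged.approxSize() - before;
            if (buffered > maxBytes) {
                // Bound exceeded: emit what we have and start over.
                out.addAll(puts.values());
                puts.clear();
                buffered = 0;
            }
        }
        out.addAll(puts.values()); // emit whatever is still buffered
        return out;
    }

    public static void main(String[] args) {
        List<SimplePut> in = Arrays.asList(
                new SimplePut("r1").add("cf:a", "1"),
                new SimplePut("r2").add("cf:a", "2"),
                new SimplePut("r1").add("cf:b", "3"));
        for (SimplePut p : combine(in, 1024)) {
            System.out.println(p.row + " -> " + p.cells);
        }
    }
}
```

With a generous bound, the two Puts for r1 collapse into one; with a very small bound the combiner degrades to passing Puts through, which is still correct because the reducer or the table sees the same cells either way.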
We could apply a further constraint on this implementation by requiring that
Keys be the rowkey used in the Puts. In that case, the puts map is unnecessary.
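For comparison, a sketch of the constrained variant, assuming the framework has already grouped values by rowkey. Each Put is modeled here as a plain column -> value map, which is a hypothetical simplification and not the HBase Put API; combineForRow is likewise an illustrative name.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyedPutCombinerSketch {

    /**
     * Assumes every value passed in belongs to rowKey, so no row -> Put map
     * is needed: all cells are folded into one combined "Put" for that row.
     * Later values for the same column overwrite earlier ones (a simplification).
     */
    public static Map.Entry<String, Map<String, String>> combineForRow(
            String rowKey, Iterable<Map<String, String>> values) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cells : values) {
            merged.putAll(cells);
        }
        return new AbstractMap.SimpleEntry<>(rowKey, merged);
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("cf:a", "1");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("cf:b", "2");
        System.out.println(combineForRow("r1", Arrays.asList(a, b)));
    }
}
```

The trade-off is that the mapper must then emit the rowkey as its output key, which the unconstrained version does not require.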
The higher objective is for an MR job to create a single Put per row. This
avoids the row-level write contention you see when writing to a table with a
wide/sparse schema.
> Import tools should use a combiner to merge Puts
> ------------------------------------------------
>
> Key: HBASE-7747
> URL: https://issues.apache.org/jira/browse/HBASE-7747
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce, Performance
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Priority: Minor
> Fix For: 0.95.0
>
> Attachments:
> 0001-HBASE-7747-Import-use-a-Put-combiner-where-possible.patch
>
>
> Multiple Puts to the same row should be combined into a single mutation
> object. This can be done with a Combiner. Import.Importer#writeResult appears
> to do this manually.