[ https://issues.apache.org/jira/browse/HBASE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587512#comment-13587512 ]

Nick Dimiduk commented on HBASE-7747:
-------------------------------------

The grammar of that comment is terrible! Someone might expect better of me. 
What I mean to say is:

bq. TODO: Nothing guarantees that Puts (the values) are keyed on their rowkey, 
so the map of put.getRow() to combined Put is necessary. We could use HeapSize 
to put an upper bound on the memory size of the puts map and flush a portion of 
its contents when that bound is reached. This is acceptable because the Combiner 
runs an unspecified number of times and is for optimization only.
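
A minimal, standalone Java sketch of the combine step described in the TODO. 
It assumes a hypothetical SimplePut class (not HBase's Put) with String row 
keys, and a crude size estimate standing in for HeapSize. Puts are grouped by 
row in a map, and once the estimated size exceeds a hypothetical maxBytes bound 
the whole map is flushed early; flushing only part of it would also be valid.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HBase's Put: a row key plus "family:qualifier=value"
// cell strings. Real HBase row keys are byte[]; Strings keep the sketch readable.
final class SimplePut {
    final String row;
    final List<String> cells = new ArrayList<>();

    SimplePut(String row) { this.row = row; }

    SimplePut addCell(String cell) { cells.add(cell); return this; }

    // Crude size estimate standing in for HeapSize#heapSize(); an assumption,
    // not HBase's accounting.
    long approxSize() {
        long size = row.length();
        for (String c : cells) size += c.length();
        return size;
    }
}

// Puts arriving under one key may carry different rows, so they are merged per
// row in a map. When the buffered estimate exceeds maxBytes, everything merged
// so far is emitted early. That is safe because a combiner may run any number
// of times (including zero) and must only be an optimization.
final class PutCombinerSketch {
    private final long maxBytes; // hypothetical memory bound

    PutCombinerSketch(long maxBytes) { this.maxBytes = maxBytes; }

    List<SimplePut> combine(Iterable<SimplePut> values) {
        List<SimplePut> out = new ArrayList<>();
        // LinkedHashMap keeps rows in first-seen order for predictable output.
        Map<String, SimplePut> byRow = new LinkedHashMap<>();
        long buffered = 0;
        for (SimplePut p : values) {
            SimplePut merged = byRow.computeIfAbsent(p.row, SimplePut::new);
            merged.cells.addAll(p.cells);
            buffered += p.approxSize();
            if (buffered > maxBytes) {
                out.addAll(byRow.values()); // flush all buffered rows
                byRow.clear();
                buffered = 0;
            }
        }
        out.addAll(byRow.values()); // emit whatever is left
        return out;
    }
}
```

An early flush can leave two partial Puts for the same row in the output. That 
is still correct, since combining is only an optimization; the row just 
receives more than one mutation downstream.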

We could constrain this implementation further by requiring that keys be the 
rowkey used in the Puts. In that case, the puts map is unnecessary.
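
Under that stricter contract every value for a key shares the same row, so the 
combine step reduces to a single merge with no map. A sketch, again using a 
hypothetical RowPut class rather than HBase's Put; the row check is an added 
safeguard, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified Put: row key plus cell strings (not the HBase class).
final class RowPut {
    final String row;
    final List<String> cells = new ArrayList<>();

    RowPut(String row, String... cells) {
        this.row = row;
        Collections.addAll(this.cells, cells);
    }
}

final class RowKeyedCombine {
    // Assumes key == row of every value. A mismatch fails loudly instead of
    // silently writing cells to the wrong row.
    static RowPut combine(String key, Iterable<RowPut> values) {
        RowPut merged = new RowPut(key);
        for (RowPut p : values) {
            if (!p.row.equals(key)) {
                throw new IllegalArgumentException(
                    "Put row " + p.row + " does not match key " + key);
            }
            merged.cells.addAll(p.cells);
        }
        return merged;
    }
}
```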

The broader objective is for an MR job to create a single Put per row. This 
avoids the row-level write contention you see when writing to tables with a 
wide/sparse schema.
                
> Import tools should use a combiner to merge Puts
> ------------------------------------------------
>
>                 Key: HBASE-7747
>                 URL: https://issues.apache.org/jira/browse/HBASE-7747
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Minor
>             Fix For: 0.95.0
>
>         Attachments: 
> 0001-HBASE-7747-Import-use-a-Put-combiner-where-possible.patch
>
>
> Multiple Puts to the same row should be combined into a single mutation 
> object. This can be done with a Combiner. Import.Importer#writeResult appears 
> to do this manually.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
