[ 
https://issues.apache.org/jira/browse/PHOENIX-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129983#comment-15129983
 ] 

Gabriel Reid commented on PHOENIX-1973:
---------------------------------------

Thanks for those numbers [~enis]. Doing this makes a lot of sense.

I would think that the biggest reason for such a difference between the map 
output and the final HFile output is that block encoding (and probably 
compression) is used on the final output HFiles, whereas block encoding is 
certainly not used in the intermediate output. Any idea if map output 
compression was enabled for the MR job used in your internal test?
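
To make the size argument concrete, here is a rough sketch (not Phoenix code) of how much larger a row becomes once it is expanded into unencoded KeyValues. The fixed per-cell overhead follows the standard HBase KeyValue layout; the row key, column family, qualifiers, and values are made up for illustration, and Phoenix's real serialization of keys and values differs in detail.

```python
# Fixed per-cell overhead in the HBase KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1) = 20 bytes.
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Size in bytes of one unencoded, uncompressed KeyValue."""
    return (KEYVALUE_FIXED_OVERHEAD + len(row_key) + len(family)
            + len(qualifier) + len(value))


def row_as_keyvalues_size(row_key: bytes, family: bytes, cells: dict) -> int:
    """Total size of a row when every cell repeats the row key and family."""
    return sum(keyvalue_size(row_key, family, q, v) for q, v in cells.items())


# Hypothetical example row (names and values are assumptions).
ROW_KEY = b"customer-000042"
FAMILY = b"0"
CELLS = {b"NAME": b"Alice", b"CITY": b"Oslo", b"AGE": b"31"}
CSV_LINE = b"customer-000042,Alice,Oslo,31"

kv_bytes = row_as_keyvalues_size(ROW_KEY, FAMILY, CELLS)
line_bytes = len(CSV_LINE)
print(kv_bytes, line_bytes)
```

In this toy model the row key and the fixed overhead are repeated for every cell, which is the kind of redundancy that data block encodings such as PREFIX or FAST_DIFF are designed to remove in the final HFiles.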



> Improve CsvBulkLoadTool performance by moving keyvalue construction from map 
> phase to reduce phase
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1973
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1973
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Sergey Soldatov
>             Fix For: 4.4.1
>
>         Attachments: PHOENIX-1973-1.patch
>
>
> It's similar to HBASE-8768. The only difference is that we need to write a 
> custom mapper and reducer in Phoenix. In the map phase we just need to get the 
> row key from the primary key columns and write the full text of the line as 
> usual (to ensure sorting). In the reducer we need to get the actual key values 
> by running an upsert query.
> This basically reduces the amount of map output that has to be written to disk 
> and the data that needs to be transferred over the network.
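
The split described in the issue can be sketched roughly as follows. This is a plain-Python simulation, not the Phoenix mapper or reducer: the column list, the primary key, and the way cells are produced are all assumptions, and the real reducer would obtain its KeyValues by running an upsert rather than by zipping fields.

```python
import csv
from typing import Iterable, Iterator, List, Tuple

COLUMNS = ["ID", "NAME", "CITY"]  # hypothetical table columns
PK_COLUMNS = ["ID"]               # hypothetical primary key columns


def parse_line(line: str) -> List[str]:
    """Parse a single CSV line into its fields."""
    return next(csv.reader([line]))


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Emit (row key, raw line): one small record per input row."""
    for line in lines:
        fields = dict(zip(COLUMNS, parse_line(line)))
        row_key = "\x00".join(fields[c] for c in PK_COLUMNS)
        yield row_key, line


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Expand each raw line into (row key, column, value) cells.

    Stand-in for the step where the real job builds KeyValues.
    """
    for row_key, line in pairs:
        fields = dict(zip(COLUMNS, parse_line(line)))
        for column, value in fields.items():
            if column not in PK_COLUMNS:
                yield row_key, column, value


def run(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    # The sort stands in for the MapReduce shuffle, which orders by row key.
    shuffled = sorted(map_phase(lines), key=lambda pair: pair[0])
    return list(reduce_phase(shuffled))


cells = run(["2,Bob,Paris", "1,Alice,Oslo"])
print(cells)
```

Only the (row key, line) pairs cross the shuffle here, so the per-cell expansion happens after the data has been written to disk and moved over the network.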



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)