[jira] [Updated] (PHOENIX-1973) Improve CsvBulkLoadTool performance by moving keyvalue construction from map phase to reduce phase

Sergey Soldatov (JIRA) Tue, 02 Feb 2016 04:59:25 -0800

     [ 
https://issues.apache.org/jira/browse/PHOENIX-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sergey Soldatov updated PHOENIX-1973:
-------------------------------------
    Attachment: PHOENIX-1973-1.patch

At the moment the mapper generates pairs for both the table and the index 
table, and the rowkey for the index table is based on the values generated for 
the parent table. As an alternative, we can pack only the values from the KVs 
with the same rowkey into a single array during mapping and reconstruct the KVs 
in the Reducer. The patch is a basic implementation of this approach.
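
The packing idea can be illustrated with a minimal, dependency-free sketch. It 
is not the code from the attached patch: the class name PackedRowValues, the 
ColumnValue holder, and the length-prefixed wire layout are all assumptions 
made for illustration. The map side writes only the column index and value 
bytes for each cell of a row, and the reduce side reads them back so the full 
KVs can be rebuilt there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix or of the attached patch.
public class PackedRowValues {

    /** One cell value tagged with the index of the column it belongs to. */
    public static final class ColumnValue {
        public final int columnIndex;
        public final byte[] value;

        public ColumnValue(int columnIndex, byte[] value) {
            this.columnIndex = columnIndex;
            this.value = value;
        }
    }

    /**
     * Map side: pack the values of all cells that share a rowkey into one
     * array. Assumed layout: count, then (columnIndex, length, bytes) per cell.
     */
    public static byte[] pack(List<ColumnValue> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.size());
            for (ColumnValue cv : values) {
                out.writeInt(cv.columnIndex);
                out.writeInt(cv.value.length);
                out.write(cv.value);
            }
        }
        return bytes.toByteArray();
    }

    /**
     * Reduce side: recover the (columnIndex, value) entries from the packed
     * array; a real reducer would combine them with the rowkey to build KVs.
     */
    public static List<ColumnValue> unpack(byte[] packed) throws IOException {
        List<ColumnValue> values = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(packed))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int columnIndex = in.readInt();
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                values.add(new ColumnValue(columnIndex, value));
            }
        }
        return values;
    }
}
```

In this sketch the rowkey, column family, and qualifier are not repeated per 
cell in the map output; only the value bytes and a small integer per column 
are shuffled.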

> Improve CsvBulkLoadTool performance by moving keyvalue construction from map 
> phase to reduce phase
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1973
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1973
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>             Fix For: 4.4.1
>
>         Attachments: PHOENIX-1973-1.patch
>
>
> It's similar to HBASE-8768. The only difference is that we need to write a 
> custom mapper and reducer in Phoenix. In the map phase we just need to get 
> the row key from the primary key columns and write the full text of the line 
> as usual (to ensure sorting). In the reducer we need to get the actual key 
> values by running the upsert query.
> This basically reduces the amount of map output that has to be written to 
> disk and the data that needs to be transferred over the network.
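
The map step described in the quoted issue can be sketched roughly as follows. 
This is an illustration under stated assumptions, not Phoenix code: the class 
RowKeyOnlyMapper is hypothetical, the CSV split is naive (no quoted fields), 
and joining primary key values with a zero character is only a placeholder for 
Phoenix's real row key encoding.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical sketch of a map step that emits (row key, raw line) only.
public class RowKeyOnlyMapper {

    /**
     * Builds a row key from the primary key columns of one CSV line and
     * returns it together with the untouched line. Key values would be
     * constructed later, on the reduce side.
     */
    public static Map.Entry<String, String> map(String line, int[] pkColumns) {
        // Naive split: assumes fields contain no embedded commas or quotes.
        String[] fields = line.split(",", -1);
        StringBuilder rowKey = new StringBuilder();
        for (int i = 0; i < pkColumns.length; i++) {
            if (i > 0) {
                rowKey.append('\u0000'); // placeholder separator
            }
            rowKey.append(fields[pkColumns[i]]);
        }
        return new SimpleImmutableEntry<>(rowKey.toString(), line);
    }
}
```

The map output here is one row key plus the original line per input record, 
rather than one KV per column.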



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
