[jira] [Commented] (PHOENIX-1973) Improve CsvBulkLoadTool performance by moving keyvalue construction from map phase to reduce phase

Gabriel Reid (JIRA) Wed, 02 Mar 2016 10:33:36 -0800

    [ https://issues.apache.org/jira/browse/PHOENIX-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176190#comment-15176190 ]


Gabriel Reid commented on PHOENIX-1973:
---------------------------------------

My personal preference would be to wait until the feedback is in before putting 
this in an RC, for the following reasons:
* I think that fixes for relatively minor things like this typically 
get forgotten if we postpone them
* I don't see this as really necessary for the release (i.e. it's an 
improvement, not fixing something that is broken)

That being said, I don't feel particularly strongly about waiting with the 
patch -- the review feedback is code cleanup, not functionality/bug cleanup, 
and the improved performance certainly looks useful.

In other words, it's ok either way with me, with a very slight preference for 
waiting.

> Improve CsvBulkLoadTool performance by moving keyvalue construction from map 
> phase to reduce phase
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1973
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1973
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Sergey Soldatov
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-1973-1.patch, PHOENIX-1973-2.patch, 
> PHOENIX-1973-3.patch, PHOENIX-1973-4.patch, PHOENIX-1973-5.patch, 
> PHOENIX-1973-6.patch, PHOENIX-1973-7.patch
>
>
> It's similar to HBASE-8768. The only difference is that we need to write a 
> custom mapper and reducer in Phoenix. In the map phase we just need to get 
> the row key from the primary key columns and write the full text of the line 
> as usual (to ensure sorting). In the reducer we need to get the actual key 
> values by running an upsert query.
> This basically reduces the amount of map output that has to be written to 
> disk and the data that needs to be transferred over the network.
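
The approach in the issue description above can be illustrated with a small, 
self-contained simulation. This is only a sketch under stated assumptions, not 
the code from the attached patches: the table layout (COLUMNS, PK_COLUMNS) and 
the function names are hypothetical, sorted() stands in for the MapReduce 
shuffle, and plain CSV parsing stands in for the upsert that Phoenix would run 
in the reducer to produce the real key values.

```python
import csv
from io import StringIO

# Hypothetical table layout: "id" is the only primary key column.
COLUMNS = ["id", "name", "city"]
PK_COLUMNS = ["id"]


def parse_line(line):
    """Parse one CSV line into a column -> value dict."""
    values = next(csv.reader(StringIO(line)))
    return dict(zip(COLUMNS, values))


def map_phase(lines):
    """Emit (row_key, raw_line) pairs; no per-column cells are built here."""
    for line in lines:
        record = parse_line(line)
        row_key = "\x00".join(record[c] for c in PK_COLUMNS)
        yield row_key, line


def reduce_phase(sorted_pairs):
    """Expand each raw line into (row_key, column, value) cells.

    Stand-in for running the upsert in the reducer to obtain key values.
    """
    for row_key, line in sorted_pairs:
        record = parse_line(line)
        for column in COLUMNS:
            if column not in PK_COLUMNS:
                yield row_key, column, record[column]


def bulk_load(lines):
    # sorted() stands in for the shuffle/sort on the row key.
    shuffled = sorted(map_phase(lines), key=lambda pair: pair[0])
    return list(reduce_phase(shuffled))
```

In this sketch the map output holds one record per input line rather than one 
per column, which is the source of the reduced disk and network traffic the 
description refers to.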



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
