[ https://issues.apache.org/jira/browse/PHOENIX-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176190#comment-15176190 ]
Gabriel Reid commented on PHOENIX-1973:
---------------------------------------

My personal preference would be to wait until the feedback is in before putting this in an RC, for the following reasons:

* Relatively minor fixes like these typically get forgotten if we postpone them
* I don't see this as really necessary for the release (i.e. it's an improvement, not a fix for something that is broken)

That being said, I don't feel particularly strongly about waiting with the patch: the review feedback is code cleanup, not functionality/bug cleanup, and the improved performance certainly looks useful.

In other words, it's ok either way with me, with a very slight preference for waiting.

> Improve CsvBulkLoadTool performance by moving keyvalue construction from map phase to reduce phase
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1973
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1973
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Sergey Soldatov
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-1973-1.patch, PHOENIX-1973-2.patch, PHOENIX-1973-3.patch, PHOENIX-1973-4.patch, PHOENIX-1973-5.patch, PHOENIX-1973-6.patch, PHOENIX-1973-7.patch
>
>
> This is similar to HBASE-8768. The only difference is that we need to write a custom mapper and reducer in Phoenix. In the map phase we just need to get the row key from the primary key columns and write the full text of the line as usual (to ensure sorting). In the reducer we need to get the actual key values by running an upsert query.
> This greatly reduces the amount of map output that is written to disk and the data that needs to be transferred over the network.
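
The approach in the issue description (row key only in the map phase, key values built in the reduce phase) can be illustrated with a minimal plain-Java sketch. This is not the code from the attached patches: the class `BulkLoadSketch`, its methods, the `|` key separator, and the sample columns are all hypothetical, and no Hadoop, HBase, or Phoenix classes are used. In particular, the reduce step here simply splits the line into cells, whereas the description says the real reducer obtains key values by running an upsert query.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop, HBase, or Phoenix classes are used.
class BulkLoadSketch {

    // Map phase: build the row key from the primary-key columns and pass the
    // raw CSV line through unchanged. Naive split; quoted fields are not handled.
    public static Map.Entry<String, String> map(String csvLine, int[] pkIndexes) {
        String[] fields = csvLine.split(",", -1);
        StringBuilder rowKey = new StringBuilder();
        for (int idx : pkIndexes) {
            if (rowKey.length() > 0) {
                rowKey.append('|'); // hypothetical separator for composite keys
            }
            rowKey.append(fields[idx]);
        }
        return new AbstractMap.SimpleEntry<>(rowKey.toString(), csvLine);
    }

    // Reduce phase: expand the line into one {rowKey, column, value} cell per
    // non-key column. Assumption: this stands in for the upsert-based step.
    public static List<String[]> reduce(String rowKey, String csvLine,
                                        String[] columnNames, int[] pkIndexes) {
        String[] fields = csvLine.split(",", -1);
        List<String[]> cells = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (!isPrimaryKey(i, pkIndexes)) {
                cells.add(new String[] {rowKey, columnNames[i], fields[i]});
            }
        }
        return cells;
    }

    private static boolean isPrimaryKey(int index, int[] pkIndexes) {
        for (int pk : pkIndexes) {
            if (pk == index) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] columns = {"ID", "NAME", "AGE"};
        int[] pk = {0};
        String[] input = {"2,bob,41", "1,alice,30"};

        // The TreeMap stands in for the shuffle, which sorts map output by key.
        // Row keys are unique in this sample, so no values are overwritten.
        Map<String, String> shuffled = new TreeMap<>();
        for (String line : input) {
            Map.Entry<String, String> out = map(line, pk);
            shuffled.put(out.getKey(), out.getValue());
        }
        for (Map.Entry<String, String> e : shuffled.entrySet()) {
            for (String[] cell : reduce(e.getKey(), e.getValue(), columns, pk)) {
                System.out.println(String.join(" ", cell));
            }
        }
    }
}
```

In this sketch the map phase emits one record per input line rather than one per cell, so the volume that has to be spilled and shuffled grows with the number of rows and not with the number of columns. That is the effect the description attributes to moving key-value construction into the reducer.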