[ https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352586#comment-14352586 ]

James Taylor commented on PHOENIX-1711:
---------------------------------------

The patch should minimize the work done by csvUpsertExecutor.execute().
The bulk of the time in PhoenixRuntime.getUncommittedDataIterator(conn, true) is
likely spent sorting the KeyValues. If the CSV values are traversed in row key
order, it's possible that the sort step could be avoided entirely.
The CSV parsing is done by the Apache Commons CSV project, so Phoenix doesn't
control that.
The work in while (uncommittedDataIterator.hasNext()) is building the HFile, so
Phoenix doesn't control that either (it's an HBase API).
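The sort-avoidance idea can be sketched in plain Java. This is only an illustration, not Phoenix code: SortedEmitSketch, KEY_ORDER, and maybeSort are hypothetical names, and a bare byte[] row key stands in for an HBase KeyValue. The point is that if the producer already emits keys in sorted order, an O(n) check replaces the O(n log n) sort:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedEmitSketch {
    // Lexicographic unsigned-byte ordering, mirroring how HBase
    // compares row keys.
    static final Comparator<byte[]> KEY_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // If the caller emitted keys in KEY_ORDER already, only the
    // linear scan runs and the sort is skipped; otherwise fall back.
    static List<byte[]> maybeSort(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (KEY_ORDER.compare(keys.get(i - 1), keys.get(i)) > 0) {
                keys.sort(KEY_ORDER);  // not in order: sort once
                return keys;
            }
        }
        return keys;  // already sorted: no sort pass needed
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        keys.add("b".getBytes());
        keys.add("a".getBytes());
        keys.add("c".getBytes());
        maybeSort(keys);
        System.out.println(new String(keys.get(0)));  // prints "a"
    }
}
```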


> Improve performance of CSV loader
> ---------------------------------
>
>                 Key: PHOENIX-1711
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1711
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-1711.patch
>
>
> Here is a break-up of percentage execution time for some of the steps in the 
> mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
