[jira] [Commented] (PHOENIX-1711) Improve performance of CSV loader

James Taylor (JIRA) Mon, 09 Mar 2015 09:01:46 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353145#comment-14353145
 ]


James Taylor commented on PHOENIX-1711:
---------------------------------------

Thanks for the review, [~gabriel.reid]. The patch is a bit on the raw side - 
just wanted to see if it makes a significant difference before cleaning it up. 
I've fixed the swallowing of that exception and gotten rid of the 
PArrayDataType change. I'll separate out the ConstraintViolationException 
change into a different change list. I'm thinking along the same lines as you - 
if it improves perf we can use this to speed up the general case of UPSERT 
VALUES by caching the MutationPlan and continually re-executing it. It's 
possible that the CsvUpsertExecutor wouldn't need to change at all. I'm curious 
about one thing, though - do you think there's any overhead in calling 
CallRunner.run() per row versus once per batch?
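
The caching idea in the comment above can be sketched in isolation. This is an
illustration only, not code from the patch: the Plan interface, compile(), and
the two counters are hypothetical stand-ins and do not reflect Phoenix's actual
MutationPlan or CsvUpsertExecutor APIs. It assumes that compiling a statement
is the expensive step and that the compiled form can safely be re-executed
with new row values.

```java
import java.util.Arrays;
import java.util.List;

class CachedPlanSketch {
    /** Hypothetical stand-in for a compiled statement; NOT Phoenix's MutationPlan API. */
    interface Plan {
        void execute(List<String> row);
    }

    static int compileCount = 0;  // number of times a statement was "compiled"
    static int executedRows = 0;  // number of rows "upserted"

    /** Pretend compilation step; assumed to be the costly part in a real system. */
    static Plan compile(String sql) {
        compileCount++;
        return row -> executedRows++;
    }

    /** Baseline: compile the same statement again for every row. */
    static void upsertRecompiling(String sql, List<List<String>> rows) {
        for (List<String> row : rows) {
            compile(sql).execute(row);
        }
    }

    /** Cached variant: compile once, then re-execute the same plan per row. */
    static void upsertCached(String sql, List<List<String>> rows) {
        Plan plan = compile(sql);
        for (List<String> row : rows) {
            plan.execute(row);
        }
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "a"),
                Arrays.asList("2", "b"),
                Arrays.asList("3", "c"));
        String sql = "UPSERT INTO T VALUES (?, ?)"; // placeholder statement text

        upsertRecompiling(sql, rows);
        System.out.println("recompiling: compiles=" + compileCount);

        compileCount = 0;
        upsertCached(sql, rows);
        System.out.println("cached: compiles=" + compileCount);
    }
}
```

The sketch only counts compilations; whether the saving is significant in the
real loader depends on measurements like the breakdown quoted below.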

> Improve performance of CSV loader
> ---------------------------------
>
>                 Key: PHOENIX-1711
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1711
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-1711.patch
>
>
> Here is a breakdown of percentage execution time for some of the steps in the 
> mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
