[ https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353185#comment-14353185 ]

Gabriel Reid commented on PHOENIX-1711:
---------------------------------------

{quote}
I'm curious about one thing, though - do you think there's any overhead in 
calling CallRunner.run() per row versus once per batch?
{quote}

I would assume that it's lightweight enough that it won't make a measurable 
difference, but that's just my guess.
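
If it ever did need checking, a quick way to get a feel for that cost would be a 
micro-benchmark comparing a per-row save/set/restore of the context classloader 
against a single one around the whole batch. The wrapper below is a hypothetical 
stand-in written just for this sketch (it is not the actual CallRunner code), 
and the per-row "work" is a placeholder:

{code:java}
import java.util.concurrent.Callable;

public class WrapperOverheadCheck {

    // Hypothetical stand-in for a CallRunner-style wrapper: save the current
    // context classloader, set a target one, run the callable, then restore.
    static <T> T runWithContextClassLoader(ClassLoader target, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(target);
        try {
            return work.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader cl = WrapperOverheadCheck.class.getClassLoader();
        int rows = 1_000_000;

        // Per-row wrapping: one save/set/restore cycle per record.
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            final long row = i;
            sum += runWithContextClassLoader(cl, () -> row * 2L);
        }
        long perRowNanos = System.nanoTime() - start;

        // Per-batch wrapping: one cycle around the whole loop.
        start = System.nanoTime();
        long sum2 = runWithContextClassLoader(cl, () -> {
            long s = 0;
            for (int i = 0; i < rows; i++) {
                s += i * 2L;
            }
            return s;
        });
        long perBatchNanos = System.nanoTime() - start;

        System.out.printf("per-row: %d ms, per-batch: %d ms (sums %d/%d)%n",
                perRowNanos / 1_000_000, perBatchNanos / 1_000_000, sum, sum2);
    }
}
{code}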

Actually, I was curious about why the call to CallRunner.run was needed at all 
in the map method of CSVUpsertExecutor -- at least when it's being used within 
MapReduce, it should be fine to just set the context classloader once (which is 
what CallRunner does, from what I see). From what I recall, the reason for 
having the CallRunner system is to "fix" the context classloader when running 
inside JDBC tooling that dynamically loads Phoenix, but I wouldn't think there 
would ever be a problem when running via the bulk loader.
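
To illustrate the "set it once" idea in a MapReduce setting, here is a minimal, 
hypothetical mapper sketch that fixes the context classloader a single time in 
setup() rather than wrapping each record; the class name and per-record work 
are placeholders, not Phoenix's actual mapper code:

{code:java}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch: fix the context classloader once in setup()
// instead of wrapping every record in a CallRunner-style call.
public class CsvImportMapperSketch extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One-time fix-up; every map() call on this task runs on the same thread,
        // so the classloader stays in place for the task's lifetime.
        Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
    }

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        // Per-record work (CSV parsing, upsert, etc.) runs without any extra
        // classloader bookkeeping here.
        context.write(new Text("line"), line);
    }
}
{code}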

> Improve performance of CSV loader
> ---------------------------------
>
>                 Key: PHOENIX-1711
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1711
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-1711.patch
>
>
> Here is a breakdown of percentage execution time for some of the steps in the 
> mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl
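
For reference only, a rough, self-contained sketch of how a per-step percentage 
breakdown like the one above could be collected inside a mapper-style loop; the 
step names and the simulated work are placeholders, and this is not how the 
numbers in the issue were actually measured:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: accumulate wall-clock time per named step, then print percentages.
public class StepTimingSketch {

    private static final Map<String, Long> TOTALS = new LinkedHashMap<>();

    static <T> T timed(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            TOTALS.merge(step, System.nanoTime() - start, Long::sum);
        }
    }

    public static void main(String[] args) {
        for (int row = 0; row < 100_000; row++) {
            final int r = row;
            // Placeholder "steps" standing in for parsing and upsert execution.
            String record = timed("csvParser", () -> "col1,col2," + r);
            timed("upsertExecutor", () -> record.split(","));
        }
        long grandTotal = TOTALS.values().stream().mapToLong(Long::longValue).sum();
        TOTALS.forEach((step, nanos) ->
                System.out.printf("%s: %d%%%n", step, 100 * nanos / grandTotal));
    }
}
{code}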



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
