[ https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364596#comment-14364596 ]

James Taylor commented on PHOENIX-1711:
---------------------------------------

Some additional ideas I'll try to implement in the next patch:
- change MutationState to use a List instead of a Map at the top level. It's OK
to have duplicate rows here, as they'll be folded together when we generate
the List<Mutation>.
- change each mutation in the list to be a simple List<byte[]>. We can keep a
pointer to the PTable and an int[] of positions into the PTable columns
instead of maintaining a Map for each row. Again, these will be folded together
when we generate the List<Mutation>.
- we don't need to create Mutations for 
PhoenixRuntime.getUncommittedDataIterator() and it appears we don't need to 
sort (though we should verify that). Instead, we'll just generate a 
List<KeyValue> for each row in MutationState, allowing duplicate and 
out-of-order row keys.
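The three ideas above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not Phoenix code: RowBufferSketch, PendingRow, Cell, upsert(), fold() and uncommittedCells() are illustrative names, a String table name stands in for the PTable pointer, an int[] holds the column positions, and a Cell (row, column, value) triple stands in for an HBase KeyValue. Upserts are appended to a plain List with duplicates allowed, duplicates are folded into one entry per row key only in fold(), and uncommittedCells() emits cells in buffer order without sorting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: none of these names are Phoenix APIs. A String table
// name stands in for the PTable pointer and Cell stands in for an HBase KeyValue.
public class RowBufferSketch {

    /** One buffered upsert: raw values plus their column positions in the table. */
    static final class PendingRow {
        final String table;          // stand-in for a pointer to the PTable
        final byte[] rowKey;
        final int[] columnPositions; // positions into the table's column list
        final List<byte[]> values;   // values.get(i) belongs to columnPositions[i]

        PendingRow(String table, byte[] rowKey, int[] columnPositions, List<byte[]> values) {
            if (columnPositions.length != values.size()) {
                throw new IllegalArgumentException("positions and values differ in length");
            }
            this.table = table;
            this.rowKey = rowKey;
            this.columnPositions = columnPositions;
            this.values = values;
        }
    }

    /** A flattened (row, column, value) triple, roughly what a KeyValue carries. */
    static final class Cell {
        final byte[] rowKey;
        final int columnPosition;
        final byte[] value;

        Cell(byte[] rowKey, int columnPosition, byte[] value) {
            this.rowKey = rowKey;
            this.columnPosition = columnPosition;
            this.value = value;
        }
    }

    // Top level is a plain List: a duplicate row key is just another append.
    private final List<PendingRow> rows = new ArrayList<>();

    void upsert(PendingRow row) {
        rows.add(row);
    }

    /**
     * Folds duplicates only when mutations are generated: grouped by table, then
     * by row key; a later upsert overwrites an earlier value for the same column.
     */
    Map<String, Map<ByteBuffer, TreeMap<Integer, byte[]>>> fold() {
        Map<String, Map<ByteBuffer, TreeMap<Integer, byte[]>>> byTable = new LinkedHashMap<>();
        for (PendingRow row : rows) {
            // ByteBuffer.wrap gives content-based equals/hashCode for byte[] keys.
            TreeMap<Integer, byte[]> cols = byTable
                    .computeIfAbsent(row.table, t -> new LinkedHashMap<>())
                    .computeIfAbsent(ByteBuffer.wrap(row.rowKey), k -> new TreeMap<>());
            for (int i = 0; i < row.columnPositions.length; i++) {
                cols.put(row.columnPositions[i], row.values.get(i));
            }
        }
        return byTable;
    }

    /** Emits cells in buffer order: no folding, no sorting, duplicates kept. */
    List<Cell> uncommittedCells() {
        List<Cell> cells = new ArrayList<>();
        for (PendingRow row : rows) {
            for (int i = 0; i < row.columnPositions.length; i++) {
                cells.add(new Cell(row.rowKey, row.columnPositions[i], row.values.get(i)));
            }
        }
        return cells;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RowBufferSketch buf = new RowBufferSketch();
        buf.upsert(new PendingRow("T", b("k2"), new int[] {1, 2}, List.of(b("a"), b("b"))));
        buf.upsert(new PendingRow("T", b("k1"), new int[] {1}, List.of(b("x"))));
        buf.upsert(new PendingRow("T", b("k2"), new int[] {2}, List.of(b("c"))));
        System.out.println(buf.uncommittedCells().size());  // 4: k2 appears twice
        System.out.println(buf.fold().get("T").size());     // 2 distinct row keys
    }
}
```

In this sketch the per-upsert cost is a single list append; merging duplicate rows is deferred to fold(), which assumes a later upsert of the same column should win. Whether skipping the sort is safe for the real getUncommittedDataIterator() callers is the open question the comment itself flags.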

Together with the original changes, this should be much closer to what 
[~tulasip] is doing in his standalone code.

> Improve performance of CSV loader
> ---------------------------------
>
>                 Key: PHOENIX-1711
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1711
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-1711.patch, PHOENIX-1711_4.0.patch
>
>
> Here is a breakdown of the percentage of execution time spent in some of the
> steps in the mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
