[ https://issues.apache.org/jira/browse/PHOENIX-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364596#comment-14364596 ]
James Taylor commented on PHOENIX-1711:
---------------------------------------

Some additional ideas I'll try to implement in the next patch:
- change MutationState to use a List instead of a Map at the top level. It's ok to have duplicate rows here, as they'll get folded together when we generate the List<Mutation>.
- change each mutation in the list to be a simple List<byte[]>. We can keep a pointer to the PTable and a List<int> of positions into the PTable columns instead of maintaining a Map for each row. Again, this will get folded together when we generate the List<Mutation>.
- we don't need to create Mutations for PhoenixRuntime.getUncommittedDataIterator() and it appears we don't need to sort (though we should verify that). Instead, we'll just generate a List<KeyValue> for each row in MutationState, allowing duplicate and out-of-order row keys.

Together with the original changes, this should be much closer to what [~tulasip] is doing in his standalone code.

> Improve performance of CSV loader
> ---------------------------------
>
>                 Key: PHOENIX-1711
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1711
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-1711.patch, PHOENIX-1711_4.0.patch
>
>
> Here is a breakdown of the percentage of execution time for some of the steps in the mapper:
> csvParser: 18%
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord)): 39%
> PhoenixRuntime.getUncommittedDataIterator(conn, true): 9%
> while (uncommittedDataIterator.hasNext()): 15%
> Read IO & custom processing: 19%
> See details here: http://s.apache.org/6rl

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
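
As a rough illustration of the first two ideas in the comment above (this is not Phoenix code), the sketch below buffers upserts in a plain list, allows duplicate row keys, and folds them together only when the final per-row view is generated. ListMutationBuffer, RowMutation, add and fold are hypothetical names. A String row key and a list of column names stand in for the byte[] row key and the PTable pointer, and "last write wins" is an assumed folding rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a list-based MutationState; not the Phoenix class. */
class ListMutationBuffer {

    /** One buffered upsert: a row key, column positions, and the matching values. */
    static final class RowMutation {
        final String rowKey;          // assumption: String instead of byte[] for simple map keys
        final int[] columnPositions;  // positions into the shared table column list
        final List<byte[]> values;    // values.get(i) belongs to columnPositions[i]

        RowMutation(String rowKey, int[] columnPositions, List<byte[]> values) {
            if (columnPositions.length != values.size()) {
                throw new IllegalArgumentException("positions and values must have the same length");
            }
            this.rowKey = rowKey;
            this.columnPositions = columnPositions;
            this.values = values;
        }
    }

    private final List<String> tableColumns;  // stands in for the pointer to the PTable
    private final List<RowMutation> mutations = new ArrayList<>();

    ListMutationBuffer(List<String> tableColumns) {
        this.tableColumns = tableColumns;
    }

    /** Appends without any per-row lookup; duplicate row keys are allowed here. */
    void add(String rowKey, int[] columnPositions, List<byte[]> values) {
        mutations.add(new RowMutation(rowKey, columnPositions, values));
    }

    int size() {
        return mutations.size();
    }

    /** Folds duplicate rows together; a later write to the same row and column wins. */
    Map<String, Map<String, byte[]>> fold() {
        Map<String, Map<String, byte[]>> folded = new LinkedHashMap<>();
        for (RowMutation m : mutations) {
            Map<String, byte[]> row = folded.computeIfAbsent(m.rowKey, k -> new LinkedHashMap<>());
            for (int i = 0; i < m.columnPositions.length; i++) {
                row.put(tableColumns.get(m.columnPositions[i]), m.values.get(i));
            }
        }
        return folded;
    }

    public static void main(String[] args) {
        ListMutationBuffer buf = new ListMutationBuffer(Arrays.asList("NAME", "CITY"));
        buf.add("row1", new int[] {0}, Arrays.asList(bytes("a")));
        buf.add("row1", new int[] {0, 1}, Arrays.asList(bytes("b"), bytes("c")));
        buf.add("row0", new int[] {1}, Arrays.asList(bytes("d")));

        Map<String, Map<String, byte[]>> folded = buf.fold();
        System.out.println(buf.size() + " buffered entries folded into " + folded.size() + " rows");
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The append path does no hashing or map allocation per row, which is the cost the comment aims to avoid. The map work is deferred to a single fold pass.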