[
https://issues.apache.org/jira/browse/ORC-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383350#comment-17383350
]
David Mollitor edited comment on ORC-830 at 7/19/21, 1:51 PM:
--------------------------------------------------------------
For the {{main}} branch, you can see that {{StringTreeWriter#writeBatch()}}
consumes 2.5% of the cycles, much of which is spent in {{getText()}}:
!Capture_StringHashTableAdd_Main.PNG!
In the {{ORC-830}} branch, you can see that {{StringTreeWriter#writeBatch()}}
consumes 1.6% of the cycles, and the call to {{getText()}} no longer even
registers:
!Capture_StringHashTableAdd_ORC830.PNG!
was (Author: belugabehr):
For the {{main}} branch, you can see that {{StringTreeWriter#writeBatch()}}
consumes 2.5% of the cycles, much of which is spent in {{getText()}}:
!Capture_StringHashTableAdd_Main.PNG!
In {{ORC-830}} branch you can see that
{{StringTreeWriter#writeBatch()}} consumes 1.6% of the cycles and the
call to {{getText()}} does not even register anymore:
!Capture_StringHashTableAdd_ORC830.PNG!
> Do Not Copy String When Adding to StringHashTableDictionary
> -----------------------------------------------------------
>
> Key: ORC-830
> URL: https://issues.apache.org/jira/browse/ORC-830
> Project: ORC
> Issue Type: Improvement
> Components: Java
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: Capture_StringHashTableAdd_Main.PNG,
> Capture_StringHashTableAdd_ORC830.PNG
>
>
> {code:java|title=StringHashTableDictionary.java}
> Text tmpText = new Text();
> for (int i = 0; i < candidateArray.size(); i++) {
>   getText(tmpText, candidateArray.get(i));
>   if (tmpText.equals(newKey)) {
>     return candidateArray.get(i);
>   }
> }
> {code}
> When there is a collision while adding a value to a
> {{StringHashTableDictionary}}, a temp {{Text}} object is created, and each
> candidate value in the byte array is copied into that temp {{Text}} until a
> match is found (or, in the worst case, no match is found and every
> candidate is copied).
> Instead of loading (copying) the values, compare directly against the
> byte array without copying the data into an intermediate (temp) buffer.
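As a rough illustration of the idea in the description (not the actual ORC-830 patch), the comparison can be done in place with the ranged {{Arrays.equals}} overload available since Java 9. The class name, the {{findMatch}} helper, and the flat {{store}}/{{offsets}} layout below are hypothetical stand-ins for the dictionary's internal byte array, chosen only to keep the sketch self-contained:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: names and data layout are assumptions, not ORC's real API.
class DirectCompareSketch {

    /**
     * Scans the candidate positions and returns the first one whose stored
     * bytes equal the key, or -1 if none match. Entry p occupies
     * store[offsets[p] .. offsets[p + 1]). No temporary buffer is allocated.
     */
    static int findMatch(byte[] store, int[] offsets, int[] candidates,
                         byte[] key, int keyLength) {
        for (int position : candidates) {
            int start = offsets[position];
            int end = offsets[position + 1];
            // Cheap length check first, then a ranged byte comparison in place.
            if (end - start == keyLength
                    && Arrays.equals(store, start, end, key, 0, keyLength)) {
                return position;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three entries packed back to back: "apple", "banana", "cherry".
        byte[] store = "applebananacherry".getBytes(StandardCharsets.US_ASCII);
        int[] offsets = {0, 5, 11, 17};
        int[] candidates = {0, 2, 1}; // positions that landed in the same bucket
        byte[] key = "banana".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findMatch(store, offsets, candidates, key, key.length)); // prints 1
    }
}
```

The length check rejects most non-matching candidates before any bytes are compared, and a mismatch stops the comparison early, so the worst case no longer copies every candidate into a temp object.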