[ 
https://issues.apache.org/jira/browse/ORC-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427972#comment-17427972
 ] 

qingbo jiao commented on ORC-1026:
----------------------------------

When we build the dictionary, we already have three streams of information. 
When the flushDictionary method is called, we traverse the dictionary again, 
which reduces the efficiency of ORC file writing. Is there any way to improve 
this and remove the traversal?
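
To make the cost concrete, here is a minimal sketch of what the traversal 
computes. It is not the ORC code: the class DictionaryFlushSketch and its 
methods are hypothetical, a TreeMap stands in for the dictionary, and it 
assumes the keys are visited in sorted order. The point is that dumpOrder maps 
each key's original insertion position to its position in the visit order, and 
the buffered row ids are then rewritten through that mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DictionaryFlushSketch {
  // Hypothetical stand-in for the dictionary: key -> original insertion position.
  private final TreeMap<String, Integer> entries = new TreeMap<>();

  /** Returns the insertion position of the value, adding it if it is new. */
  public int add(String value) {
    Integer existing = entries.get(value);
    if (existing != null) {
      return existing;
    }
    int position = entries.size();
    entries.put(value, position);
    return position;
  }

  /**
   * One full pass over the keys in sorted order (the traversal in question).
   * Appends each key to sortedOut and returns dumpOrder, where
   * dumpOrder[originalPosition] is the id assigned during the pass.
   */
  public int[] flush(List<String> sortedOut) {
    int[] dumpOrder = new int[entries.size()];
    int currentId = 0;
    for (Map.Entry<String, Integer> e : entries.entrySet()) {
      sortedOut.add(e.getKey());
      dumpOrder[e.getValue()] = currentId++;
    }
    return dumpOrder;
  }

  public static void main(String[] args) {
    DictionaryFlushSketch dict = new DictionaryFlushSketch();
    // Row ids are recorded as insertion positions while rows are buffered.
    int[] rows = {dict.add("pear"), dict.add("apple"), dict.add("pear"), dict.add("fig")};

    List<String> sorted = new ArrayList<>();
    int[] dumpOrder = dict.flush(sorted);

    // Rewrite each buffered row id into the id assigned by the traversal.
    for (int i = 0; i < rows.length; i++) {
      rows[i] = dumpOrder[rows[i]];
    }
    System.out.println(sorted + " " + Arrays.toString(rows));
    // prints: [apple, fig, pear] [2, 0, 2, 1]
  }
}
```

In this sketch the mapping is only known once every key has been seen, so 
removing the traversal would presumably need either ids that are already in 
their final order or some other way to derive dumpOrder.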

> When writing a string type column, the dictionary must be traversed when the 
> flushDictionary method is called. Is there any way to remove this traversal?
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ORC-1026
>                 URL: https://issues.apache.org/jira/browse/ORC-1026
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 1.8.0
>            Reporter: qingbo jiao
>            Priority: Major
>
> In the StringBaseTreeWriter class, the flushDictionary() method traverses the 
> dictionary as shown below:
> {code:java}
> dictionary.visit(new Dictionary.Visitor() {
>   private int currentId = 0;
>   @Override
>   public void visit(Dictionary.VisitorContext context
>   ) throws IOException {
>     context.writeBytes(stringOutput);
>     lengthOutput.write(context.getLength());
>     dumpOrder[context.getOriginalPosition()] = currentId++;
>   }
> });
> {code}
> In the Impl class, we already have some arrays that hold the information needed here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
