[ 
https://issues.apache.org/jira/browse/ORC-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513250#comment-16513250
 ] 

Sergey Shelukhin edited comment on ORC-378 at 6/15/18 2:41 AM:
---------------------------------------------------------------

Changed one method for now.
May need to change other methods too, although the int one seems to be unused
and already differed from the long one.
I wonder if this is too branched, and whether we should give up more readily on
more paths and just fall back to the old loop path.

This basically stops using next, and instead alternates between calling
read...Values (into literals) and consumeLiterals. One shortcut: if the values
come from an ORC repeating encoding (short-repeat, or delta with delta=0), we
don't read them into literals and don't compare them to each other. We just
keep reading and check only the single value from the ORC encoding
(firstVal/delta) until we either have enough values or find out they are no
longer repeating.
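A minimal Java sketch of the shortcut described above, under stated assumptions: decoded runs are modeled by a hypothetical Run class (a constant run stands in for short-repeat or delta with delta=0; otherwise the run carries literals), and the output is a simplified, null-free stand-in for LongColumnVector. Run, Batch, and fill are illustrative names, not ORC's reader API, and leftover values are dropped rather than carried into the next batch.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the "repeating shortcut": decoded runs are modeled
 * as simple objects. This is NOT ORC's actual integer reader API.
 */
public class RepeatingRunSketch {

  /** One decoded run: constant (short-repeat, or delta with delta == 0) or literals. */
  public static final class Run {
    public final boolean constant; // true if every value in the run is the same
    public final long value;       // the run's single value, meaningful when constant
    public final int length;       // number of values in the run
    public final long[] literals;  // decoded values, meaningful when !constant

    private Run(boolean constant, long value, int length, long[] literals) {
      this.constant = constant;
      this.value = value;
      this.length = length;
      this.literals = literals;
    }

    public static Run ofRepeat(long value, int length) {
      return new Run(true, value, length, null);
    }

    public static Run ofLiterals(long... values) {
      return new Run(false, 0L, values.length, values);
    }
  }

  /** Simplified stand-in for LongColumnVector: values plus isRepeating, no nulls. */
  public static final class Batch {
    public final long[] vector;
    public boolean isRepeating;

    Batch(int size) {
      vector = new long[size];
    }
  }

  /**
   * Fills a batch of batchSize values from the runs. Values left over in the
   * last run are dropped here; a real reader would keep them for the next batch.
   */
  public static Batch fill(List<Run> runs, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    Batch out = new Batch(batchSize);
    int filled = 0;           // logical values produced so far
    boolean repeating = true; // still on the shortcut path?
    long repeatValue = 0L;    // the single value seen so far on the shortcut path

    for (Run run : runs) {
      if (filled == batchSize) {
        break;
      }
      int take = Math.min(run.length, batchSize - filled);
      if (repeating) {
        if (run.constant && (filled == 0 || run.value == repeatValue)) {
          // Shortcut: check only the run's single value; materialize nothing.
          repeatValue = run.value;
          filled += take;
          continue;
        }
        // Repetition broken: write out the values skipped so far, then fall back.
        Arrays.fill(out.vector, 0, filled, repeatValue);
        repeating = false;
      }
      if (run.constant) {
        Arrays.fill(out.vector, filled, filled + take, run.value);
      } else {
        System.arraycopy(run.literals, 0, out.vector, filled, take);
      }
      filled += take;
    }

    if (filled < batchSize) {
      throw new IllegalArgumentException("runs hold fewer than batchSize values");
    }
    if (repeating) {
      out.isRepeating = true;
      out.vector[0] = repeatValue; // only entry 0 is meaningful when repeating
    }
    return out;
  }

  public static void main(String[] args) {
    Batch b = fill(Arrays.asList(Run.ofRepeat(7, 2), Run.ofLiterals(1, 2, 3)), 4);
    System.out.println(b.isRepeating + " " + Arrays.toString(b.vector));
  }
}
```

While runs stay constant with the same value, the cost per run is a single comparison and nothing is written to the vector; only when repetition breaks does the sketch fill in the skipped prefix and fall back to copying values normally.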

Also, some of the logic there doesn't make sense to me.
isRepeating+isNull[0] is handled at the beginning. Doesn't that mean that
noNulls == false automatically implies isRepeating should be false? If noNulls
== false is set correctly (there are in fact nulls), and isRepeating is set
correctly (so not all values are null, per the initial check), then the vector
cannot possibly be repeating. [~prasanth_j] should we remove the related paths?
That would simplify things a bit. Or are they there to handle a poorly set-up
incoming vector?
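The argument above can be checked mechanically with a small Java sketch. It encodes the comment's own assumptions: noNulls == false means at least one entry really is null, and when isRepeating is set, entry 0 (value and null flag) stands for every entry. RepeatingNullsSketch and consistent are hypothetical names, not ORC or Hive code. If an incoming vector has noNulls == false without any entry actually being null, the flagged combination can still arise; that is the poorly set-up case the question is about.

```java
/**
 * Hypothetical check of the flag reasoning; not ORC or Hive code.
 * Assumes noNulls == false means at least one entry is null, and that when
 * isRepeating is set, entry 0 stands for every entry.
 */
public class RepeatingNullsSketch {

  /** True if the flags can describe a correctly set-up vector under those assumptions. */
  public static boolean consistent(boolean noNulls, boolean isRepeating, boolean[] isNull) {
    boolean anyNull = false;
    if (isRepeating) {
      anyNull = isNull[0]; // entry 0 decides null-ness for the whole vector
    } else {
      for (boolean n : isNull) {
        anyNull |= n;
      }
    }
    return noNulls != anyNull; // noNulls must mean exactly "no entry is null"
  }

  public static void main(String[] args) {
    // The all-null repeating case (isRepeating && isNull[0]) is handled up front,
    // so any remaining repeating vector has isNull[0] == false.
    System.out.println(consistent(false, true, new boolean[] {false, false})); // prints false
  }
}
```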

cc [~gopalv] [~prasanth_j]



> translate ShortRepeat/Delta integer encoding into isRepeating on LongCV more 
> directly
> -------------------------------------------------------------------------------------
>
>                 Key: ORC-378
>                 URL: https://issues.apache.org/jira/browse/ORC-378
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
