[ 
https://issues.apache.org/jira/browse/ORC-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548198#comment-16548198
 ] 

Sergey Shelukhin commented on ORC-378:
--------------------------------------

The change is to assume the vector is repeating and to set isRepeating directly 
from the ORC encodings that already represent repeated values, without going 
through the step of populating the literals, comparing them all to each other, 
and only then setting isRepeating.
This is intended for columns with a very low number of distinct values, e.g. 
ACID structure columns, but also some typical data columns that have repeated 
values.
So far the only testing I've done is to confirm there is no obvious regression 
on the standard benchmarks in orc-bench.
I still need to test more.
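
A minimal sketch of the idea, not the actual patch. Run and LongVector are 
hypothetical stand-ins (not ORC or Hive classes) for a decoded ShortRepeat/Delta 
run and for LongColumnVector. The batch is assumed to be repeating until a run 
disproves it, and literals are only written out from that point on.

```java
import java.util.Arrays;
import java.util.List;

final class RepeatingSketch {
    /** Hypothetical decoded run: `length` values base, base+delta, base+2*delta, ... */
    static final class Run {
        final long base;
        final long delta;
        final int length;

        Run(long base, long delta, int length) {
            this.base = base;
            this.delta = delta;
            this.length = length;
        }
    }

    /** Hypothetical, simplified stand-in for LongColumnVector. */
    static final class LongVector {
        final long[] vector;
        boolean isRepeating;

        LongVector(int size) {
            this.vector = new long[size];
        }
    }

    /** Fills up to batchSize values from the runs; returns the number of values produced. */
    static int fill(List<Run> runs, LongVector out, int batchSize) {
        boolean repeating = true; // optimistic assumption
        long first = 0;
        int filled = 0;
        for (Run run : runs) {
            if (filled >= batchSize) {
                break;
            }
            int n = Math.min(run.length, batchSize - filled);
            if (repeating) {
                if (filled == 0) {
                    first = run.base;
                }
                boolean constant = run.delta == 0 || n == 1;
                if (constant && run.base == first) {
                    filled += n; // still repeating: nothing is written
                    continue;
                }
                // Assumption disproved: back-fill the values skipped so far.
                repeating = false;
                Arrays.fill(out.vector, 0, filled, first);
            }
            for (int i = 0; i < n; i++) {
                out.vector[filled + i] = run.base + run.delta * i;
            }
            filled += n;
        }
        out.isRepeating = repeating && filled > 0;
        if (out.isRepeating) {
            out.vector[0] = first; // a repeating vector only needs slot 0
        }
        return filled;
    }
}
```

In the all-repeating case the loop never touches the literals array except for 
slot 0, which is where the saving would come from. The back-fill keeps the 
result correct when the assumption turns out to be wrong partway through a 
batch.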

I was thinking that, to avoid potentially affecting the main path, we may 
separate this into a reader that is used only in some cases, based on 
statistics.
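
One possible shape for such a check, as a sketch only. The comment above does 
not say which statistics would be used. The version below assumes integer 
column statistics expose a minimum, a maximum, and a value count, and it treats 
a narrow min/max range as a proxy for few distinct values. The class name, 
method name, and threshold parameter are all hypothetical.

```java
final class ReaderChoiceSketch {
    /**
     * Returns true when the column can hold at most maxDistinct distinct
     * integer values, judged only from its min/max range (an assumption;
     * a narrow range does not guarantee long runs of repeated values).
     */
    static boolean useRepeatingReader(long min, long max, long valueCount, long maxDistinct) {
        if (valueCount == 0 || max < min) {
            return false; // no data or unusable statistics: stay on the main path
        }
        long span = max - min; // wraps to a negative number if the range exceeds Long.MAX_VALUE
        return span >= 0 && span < maxDistinct;
    }
}
```

With this heuristic a column whose minimum equals its maximum always qualifies, 
and any column without usable statistics falls back to the regular reader.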

> translate ShortRepeat/Delta integer encoding into isRepeating on LongCV more 
> directly
> -------------------------------------------------------------------------------------
>
>                 Key: ORC-378
>                 URL: https://issues.apache.org/jira/browse/ORC-378
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
