[ https://issues.apache.org/jira/browse/ORC-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548198#comment-16548198 ]
Sergey Shelukhin commented on ORC-378:
--------------------------------------
The change is to assume the vector is repeating and set isRepeating directly
from the ORC encodings that encode repeated values, without going through the
step of populating the literals, comparing them all to each other, and only
then setting isRepeating.
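
A minimal sketch of the idea, not the actual patch. LongVec is a hypothetical
stand-in for LongColumnVector, a run is modeled as (base, delta, n), and the
sketch assumes a single run covers the whole batch. A zero delta stands for a
ShortRepeat run or a Delta run with a fixed delta of 0.

```java
final class RepeatSketch {
    // Hypothetical stand-in for a long column vector; not the real Hive/ORC class.
    static final class LongVec {
        final long[] vector;
        boolean isRepeating;
        LongVec(int size) { vector = new long[size]; }
    }

    // Baseline path: materialize every literal of the run, then compare
    // them all to the first one to find out whether the batch repeats.
    static void populateThenCompare(LongVec out, long base, long delta, int n) {
        for (int i = 0; i < n; i++) {
            out.vector[i] = base + i * delta;
        }
        boolean same = n > 0;
        for (int i = 1; i < n && same; i++) {
            same = out.vector[i] == out.vector[0];
        }
        out.isRepeating = same;
    }

    // Direct path: a zero-delta run is known to repeat from the encoding
    // alone, so only slot 0 is written and no comparison is needed.
    // Assumption: the run covers the whole batch of n values.
    static void fromRun(LongVec out, long base, long delta, int n) {
        if (delta == 0 && n > 0) {
            out.vector[0] = base;
            out.isRepeating = true;
            return;
        }
        populateThenCompare(out, base, delta, n);
    }
}
```

Both paths leave the same isRepeating flag and the same value in slot 0 for a
repeated run. The direct path just skips the fill and the compare loop.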
This is intended for columns with a very low number of distinct values, e.g.
ACID structure columns, but also some typical data columns that have repeated
values.
So far the only testing I've done is checking that there is no obvious
regression on the standard benchmarks in orc-bench.
I still need to test more.
I was thinking that, to avoid potentially affecting the main path, we could
separate this into a reader that is used only in some cases, based on
statistics.
> translate ShortRepeat/Delta integer encoding into isRepeating on LongCV more
> directly
> -------------------------------------------------------------------------------------
>
> Key: ORC-378
> URL: https://issues.apache.org/jira/browse/ORC-378
> Project: ORC
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
>