[
https://issues.apache.org/jira/browse/ORC-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024711#comment-17024711
]
Panagiotis Garefalakis edited comment on ORC-593 at 1/27/20 10:20 PM:
----------------------------------------------------------------------
Hey [~ashutoshc], please check the latest PR. The previous one was based on
v1.5.8, hence the large number of commits.
The idea is to use the provided skipRows vector to avoid deserialising
expensive types. Depending on the type, we need to update/move the byteBuffers
accordingly, e.g.:
[https://github.com/apache/orc/pull/474/files#diff-dcf15a871eb200f0fceaa924e14a01d4R1527]
was (Author: pgaref):
Hey [~ashutoshc] please check the latest PR – the previous one was based on
v1.5.8 thus the heavy number of commits.
The idea is to use the provided skipRows vector to avoid deserialising
expensive Types – depending on the type we need to update/move the byteBuffers
accordingly: e.g.,
[https://github.com/apache/orc/pull/474/files#diff-dcf15a871eb200f0fceaa924e14a01d4R1527]
> Allow row-level Skipping
> ------------------------
>
> Key: ORC-593
> URL: https://issues.apache.org/jira/browse/ORC-593
> Project: ORC
> Issue Type: Improvement
> Reporter: Panagiotis Garefalakis
> Priority: Major
> Fix For: 1.5.8, master
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently, ORC supports filtering at the file, stripe, and row-group level.
> There is an ongoing effort to add more detailed row-level filters using
> filter predicates passed through Reader.Options, as part of ORC-577.
> However, there are still cases where the framework implementing the
> TreeReader interface wants to skip particular rows without using predicates
> (e.g., simply using the indexes of the rows to be skipped), to avoid decoding
> expensive types, i.e., DecimalColumnVector or Decimal64ColumnVector.
> In this ticket I propose to extend the TreeReader abstract class with
> an extra nextVector method.
> {code:java}
> abstract void nextVector(ColumnVector previous,
>                          boolean[] isNull, boolean[] skipRows,
>                          final int batchSize);
> {code}
> Subclasses implementing this method will be able to use the (existing)
> skipRows method, driven by the skipRows array argument, to avoid expensive
> decoding where needed.
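
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the PR. The classes SketchReader and SketchDecimalReader are hypothetical stand-ins for TreeReader and a decimal reader, a plain BigDecimal[] takes the place of ColumnVector, and the length-prefixed byte layout is a toy format, not ORC's actual on-disk encoding. It only shows how a skipRows flag lets the reader advance the buffer past a value without building the expensive object.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for TreeReader; NOT the actual ORC class. */
abstract class SketchReader {
    /** Fills out[0..batchSize); rows flagged in skipRows are left undecoded. */
    abstract void nextVector(BigDecimal[] out, boolean[] isNull,
                             boolean[] skipRows, int batchSize);
}

/** Toy decimal reader: each value is one length byte plus big-endian bytes. */
class SketchDecimalReader extends SketchReader {
    private final ByteBuffer data;
    private final int scale;
    int decoded = 0; // number of expensive decodes performed (illustration only)

    SketchDecimalReader(ByteBuffer data, int scale) {
        this.data = data;
        this.scale = scale;
    }

    @Override
    void nextVector(BigDecimal[] out, boolean[] isNull,
                    boolean[] skipRows, int batchSize) {
        for (int r = 0; r < batchSize; r++) {
            out[r] = null;
            if (isNull != null && isNull[r]) {
                continue; // in this toy format, nulls occupy no bytes
            }
            int len = data.get() & 0xFF; // length prefix of the payload
            if (skipRows != null && skipRows[r]) {
                // Skipped row: move the buffer past the payload, decode nothing.
                data.position(data.position() + len);
                continue;
            }
            byte[] unscaled = new byte[len];
            data.get(unscaled);
            out[r] = new BigDecimal(new BigInteger(unscaled), scale);
            decoded++;
        }
    }

    /** Helper for the sketch: encodes unscaled values in the toy format. */
    static ByteBuffer encode(long... unscaledValues) {
        ByteBuffer buf = ByteBuffer.allocate(unscaledValues.length * 9);
        for (long v : unscaledValues) {
            byte[] b = BigInteger.valueOf(v).toByteArray(); // at most 8 bytes
            buf.put((byte) b.length);
            buf.put(b);
        }
        buf.flip();
        return buf;
    }
}
```

For instance, reading a batch of three values with skipRows set to {false, true, false} decodes only the first and third rows, while the buffer still ends up fully consumed, so a subsequent batch would start at the right position.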