[ https://issues.apache.org/jira/browse/ORC-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Panagiotis Garefalakis updated ORC-593:
---------------------------------------
Description:
Currently, ORC supports filtering at the file, stripe, and row-group levels.
There is an ongoing effort in ORC-577 to add finer-grained row-level filters
using filter predicates as part of Reader.Options.
However, there are still cases where a framework implementing the TreeReader
interface wants to skip particular rows without using predicates (e.g., by
simply supplying the indexes of the rows to skip), in order to avoid expensive
decoding of types such as DecimalColumnVector or Decimal64ColumnVector.
In this ticket I propose extending the TreeReader abstract class with an
additional nextVector method.
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull, boolean[] skipRows,
                         final int batchSize);
{code}
Subclasses implementing this method will be able to use the existing skipRows
method to avoid expensive decoding where needed, guided by the skipRows array
argument.
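To illustrate the idea, here is a minimal, self-contained sketch. It is not ORC code: the class SkipRowsSketch, the decodeCount counter, and the use of a BigDecimal[] in place of a ColumnVector are all hypothetical. It also assumes that skipRows[r] == true marks a row the caller does not need, and that null rows have no entry in the value stream.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for a decimal TreeReader subclass; not ORC code.
class SkipRowsSketch {
    private final long[] unscaledStream; // pretend encoded stream of unscaled values
    private final int scale;
    private int position = 0;            // index of the next unread stream entry
    int decodeCount = 0;                 // number of rows that were fully decoded

    SkipRowsSketch(long[] unscaledStream, int scale) {
        this.unscaledStream = unscaledStream;
        this.scale = scale;
    }

    // Advance past stream entries without materializing any value.
    void skipRows(long rows) {
        position += (int) rows;
    }

    // Assumed convention: skipRows[r] == true means the caller does not need row r.
    // Assumed convention: null rows have no entry in the value stream.
    void nextVector(BigDecimal[] result, boolean[] isNull, boolean[] skipRows,
                    final int batchSize) {
        for (int r = 0; r < batchSize; r++) {
            if (isNull != null && isNull[r]) {
                result[r] = null;
            } else if (skipRows != null && skipRows[r]) {
                skipRows(1);          // move past the entry, no BigDecimal is built
                result[r] = null;
            } else {
                result[r] = BigDecimal.valueOf(unscaledStream[position++], scale);
                decodeCount++;
            }
        }
    }
}
```

With a four-row stream and a skipRows array of {false, true, false, true}, only rows 0 and 2 are materialized, so decodeCount ends at 2 while the stream position still advances past all four entries.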
was:
Currently, ORC supports filtering at: File, Stripe, and row group level.
There is an on-going effort to add more detailed row-level filters using filter
Predicates as part of the Reader.Options as part of ORC-577.
However, there are still cases where the framework implementing the TreeReader
interface wants to skip particular rows without using Predicates, to avoid
expensive type Decode i.e DecimalColumnVector or Decimal64ColumnVector type.
In this ticket I propose to support extend the TreeReader abstract class with
an extra method next Vector method.
{code:java}
abstract void nextVector(ColumnVector previous,
boolean[] isNull, boolean[] skipRows,
final int batchSize){code}
The subclasses implementing this method will be able to use the (existing)
skipRows method to avoid expensive decoding when needed given the skipRows
array argument.
> Allow row-level Skipping
> ------------------------
>
> Key: ORC-593
> URL: https://issues.apache.org/jira/browse/ORC-593
> Project: ORC
> Issue Type: Improvement
> Reporter: Panagiotis Garefalakis
> Priority: Major
> Fix For: 1.5.8, master
>
>
> Currently, ORC supports filtering at the file, stripe, and row-group levels.
> There is an ongoing effort in ORC-577 to add finer-grained row-level filters
> using filter predicates as part of Reader.Options.
> However, there are still cases where a framework implementing the TreeReader
> interface wants to skip particular rows without using predicates (e.g., by
> simply supplying the indexes of the rows to skip), in order to avoid expensive
> decoding of types such as DecimalColumnVector or Decimal64ColumnVector.
> In this ticket I propose extending the TreeReader abstract class with an
> additional nextVector method.
> {code:java}
> abstract void nextVector(ColumnVector previous,
>                          boolean[] isNull, boolean[] skipRows,
>                          final int batchSize);
> {code}
> Subclasses implementing this method will be able to use the existing skipRows
> method to avoid expensive decoding where needed, guided by the skipRows array
> argument.