[ 
https://issues.apache.org/jira/browse/ORC-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated ORC-593:
---------------------------------------
    Description: 
Currently, ORC supports filtering at the file, stripe, and row-group levels.

There is an ongoing effort in ORC-577 to add finer-grained row-level filters 
by passing filter predicates through Reader.Options.

However, there are still cases where the framework implementing the TreeReader 
interface wants to skip particular rows without using predicates, to avoid 
expensive type decoding, e.g. for DecimalColumnVector or Decimal64ColumnVector.

In this ticket I propose extending the TreeReader abstract class with an 
additional nextVector method:
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull, boolean[] skipRows,
                         final int batchSize);
{code}
The subclasses implementing this method will be able to use the (existing) 
skipRows method to avoid expensive decoding when needed.
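
As a rough illustration of the idea (not the ORC implementation), the sketch 
below uses a hypothetical SkippingDecimalReader as a stand-in for a TreeReader 
subclass. The class name, the BigInteger-backed "stream", the decodeCount 
counter, and the assumptions that skipRows[i] == true marks row i as not needed 
and that null rows carry no encoded value are all illustrative and are not 
taken from the ORC code base.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical stand-in for a decimal TreeReader subclass; not ORC code. */
class SkippingDecimalReader {
    private final BigInteger[] encoded; // stand-in for the encoded value stream
    private final int scale;            // decimal scale shared by all values
    private int position = 0;           // index of the next encoded value
    int decodeCount = 0;                // number of values actually decoded

    SkippingDecimalReader(BigInteger[] encoded, int scale) {
        this.encoded = encoded;
        this.scale = scale;
    }

    /** Stand-in for the existing skipRows method: advance without decoding. */
    void skipRows(long items) {
        position += (int) items;
    }

    /**
     * Sketch of the proposed overload. Assumption: skipRows[i] == true means
     * row i is not needed, so its value is skipped rather than decoded.
     */
    void nextVector(BigDecimal[] previous, boolean[] isNull,
                    boolean[] skipRows, final int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            if (isNull[i]) {
                previous[i] = null;   // assumed: null rows have no encoded value
            } else if (skipRows[i]) {
                previous[i] = null;   // row not needed: leave the slot empty
                skipRows(1);          // calls the method, not the array parameter
            } else {
                // The expensive step: materialize the decimal value.
                previous[i] = new BigDecimal(encoded[position++], scale);
                decodeCount++;
            }
        }
    }
}
```

With three encoded values and a mask of {false, true, false}, only the first 
and third values are decoded in this sketch; the second is passed over by the 
skipRows method.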

  was:
Currently, ORC supports filtering at the file, stripe, and row-group levels.

There is an ongoing effort in ORC-577 to add finer-grained row-level filters 
by passing filter predicates through Reader.Options.

However, there are still cases where the framework implementing the TreeReader 
interface wants to skip particular rows to avoid expensive type decoding, e.g. 
for DecimalColumnVector or Decimal64ColumnVector.

In this ticket I propose extending the TreeReader abstract class with an 
additional nextVector method:
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull, boolean[] skipRows,
                         final int batchSize);
{code}
The subclasses implementing this method will be able to use the (existing) 
skipRows method to avoid expensive decoding when needed.


> Allow row-level Skipping
> ------------------------
>
>                 Key: ORC-593
>                 URL: https://issues.apache.org/jira/browse/ORC-593
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>             Fix For: 1.5.8, master
>
>
> Currently, ORC supports filtering at the file, stripe, and row-group levels.
> There is an ongoing effort in ORC-577 to add finer-grained row-level filters 
> by passing filter predicates through Reader.Options.
> However, there are still cases where the framework implementing the 
> TreeReader interface wants to skip particular rows without using predicates, 
> to avoid expensive type decoding, e.g. for DecimalColumnVector or 
> Decimal64ColumnVector.
> In this ticket I propose extending the TreeReader abstract class with an 
> additional nextVector method:
> {code:java}
> abstract void nextVector(ColumnVector previous,
>                          boolean[] isNull, boolean[] skipRows,
>                          final int batchSize);
> {code}
> The subclasses implementing this method will be able to use the (existing) 
> skipRows method to avoid expensive decoding when needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
