[jira] [Updated] (ORC-593) Allow row level Skipping

Panagiotis Garefalakis (Jira) Mon, 27 Jan 2020 08:58:25 -0800


     [ 
https://issues.apache.org/jira/browse/ORC-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Panagiotis Garefalakis updated ORC-593:
---------------------------------------
    Description: 
Currently, ORC supports filtering at the file, stripe, and row-group levels.

There is an ongoing effort to add finer-grained row-level filters, using filter 
predicates passed through Reader.Options, as part of 
[ORC-577|https://issues.apache.org/jira/browse/ORC-577].

However, there are still cases where a framework implementing the TreeReader 
interface wants to skip particular rows to avoid expensive decoding, e.g. for 
the DecimalColumnVector or Decimal64ColumnVector types.

In this ticket I propose extending the TreeReader abstract class with an extra 
nextVector method:
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull, boolean[] skipRows,
                         final int batchSize);
{code}
Subclasses implementing this method will be able to use the skipRows parameter 
to avoid expensive decoding when needed.
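
To make the intent concrete, here is a minimal, self-contained sketch of how a 
subclass might honor skipRows. It is not ORC code: SkippingDecimalReader, 
rawValues, and decodeCount are hypothetical names, java.math.BigDecimal stands 
in for the decimal column vectors, and the sketch assumes every row (including 
null rows) occupies one slot in the encoded stream, which is a simplification.

```java
import java.math.BigDecimal;

/** Hypothetical stand-in for a decimal TreeReader subclass; not the ORC class. */
final class SkippingDecimalReader {
    private final long[] rawValues; // pretend-encoded unscaled values
    private final int scale;
    private int position = 0;       // read position in the encoded stream
    int decodeCount = 0;            // how many rows were actually decoded

    SkippingDecimalReader(long[] rawValues, int scale) {
        this.rawValues = rawValues;
        this.scale = scale;
    }

    /** Mirrors the proposed signature, with BigDecimal[] in place of ColumnVector. */
    void nextVector(BigDecimal[] previous, boolean[] isNull,
                    boolean[] skipRows, final int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            // Always advance the stream so later rows stay aligned.
            long raw = rawValues[position++];
            boolean skip = (isNull != null && isNull[i])
                    || (skipRows != null && skipRows[i]);
            if (skip) {
                previous[i] = null; // no decode work for null or skipped rows
                continue;
            }
            previous[i] = BigDecimal.valueOf(raw, scale); // the "expensive" decode
            decodeCount++;
        }
    }
}

final class SkipRowsDemo {
    public static void main(String[] args) {
        SkippingDecimalReader reader =
                new SkippingDecimalReader(new long[] {12345, 67890, 11111, 22222}, 2);
        BigDecimal[] batch = new BigDecimal[4];
        boolean[] isNull = {false, false, true, false};
        boolean[] skipRows = {false, true, false, false};
        reader.nextVector(batch, isNull, skipRows, 4);
        System.out.println("decoded " + reader.decodeCount + " of 4 rows");
    }
}
```

The point of the sketch is that a skipped row still consumes its position in 
the stream, so subsequent rows remain aligned, but the costly conversion to a 
decimal value is never performed for it.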


> Allow row level Skipping
> ------------------------
>
>                 Key: ORC-593
>                 URL: https://issues.apache.org/jira/browse/ORC-593
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>             Fix For: 1.5.8



--
This message was sent by Atlassian Jira
(v8.3.4#803005)