[jira] [Updated] (ORC-593) Allow row-level Skipping

Panagiotis Garefalakis (Jira) Mon, 27 Jan 2020 10:29:26 -0800


     [ 
https://issues.apache.org/jira/browse/ORC-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Panagiotis Garefalakis updated ORC-593:
---------------------------------------
    Description: 
Currently, ORC supports filtering at the file, stripe, and row-group levels.

There is an ongoing effort (ORC-577) to add finer-grained row-level filters 
by supplying filter predicates through Reader.Options.

However, there are still cases where a framework that implements TreeReader 
wants to skip particular rows without using predicates (e.g., simply by 
passing the indexes of the rows to be skipped), in order to avoid expensive 
decoding of types such as DecimalColumnVector or Decimal64ColumnVector.

In this ticket I propose extending the TreeReader abstract class with an 
additional nextVector method:
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull,
                         boolean[] skipRows,
                         final int batchSize);
{code}
Subclasses implementing this method can then use the existing skipRows 
method, guided by the skipRows array argument, to avoid expensive decoding 
for rows that will be discarded anyway.
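
To illustrate the idea, here is a minimal, self-contained sketch. It does not 
use the real ORC classes: SkipRowsSketch, ToyDecimalReader, its string-based 
"stream", and the decodeCount counter are hypothetical stand-ins, and the 
vector is a plain BigDecimal[] rather than a ColumnVector. It also assumes one 
stored slot per row (nulls included), which is a simplification. Parsing a 
string into a BigDecimal plays the role of the expensive decode.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical, simplified stand-ins for illustration only; these are NOT the
// real ORC TreeReader or ColumnVector classes.
final class SkipRowsSketch {

    // Toy column reader: each row is stored as a decimal string, and parsing
    // it into a BigDecimal plays the role of the "expensive decode".
    static final class ToyDecimalReader {
        private final String[] encoded; // assumption: one slot per row, nulls included
        private int position = 0;       // index of the next row in the toy stream
        int decodeCount = 0;            // number of values actually decoded

        ToyDecimalReader(String[] encoded) {
            this.encoded = encoded;
        }

        // Stand-in for the existing skipRows method: advance without decoding.
        void skipRows(long items) {
            position += (int) items;
        }

        // Sketch of the proposed overload: rows flagged in skipRows are never decoded.
        void nextVector(BigDecimal[] vector, boolean[] isNull,
                        boolean[] skipRows, final int batchSize) {
            int row = 0;
            while (row < batchSize) {
                if (skipRows != null && skipRows[row]) {
                    // Coalesce a run of skipped rows into a single skipRows call.
                    int start = row;
                    while (row < batchSize && skipRows[row]) {
                        vector[row++] = null;
                    }
                    skipRows(row - start);
                } else if (isNull != null && isNull[row]) {
                    // Null row: nothing to decode, just move past its slot.
                    vector[row++] = null;
                    skipRows(1);
                } else {
                    vector[row++] = new BigDecimal(encoded[position++]);
                    decodeCount++;
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] data = {"1.10", "2.20", "3.30", "4.40", "5.50"};
        ToyDecimalReader reader = new ToyDecimalReader(data);
        BigDecimal[] out = new BigDecimal[data.length];
        boolean[] skip = {false, true, true, false, true};

        reader.nextVector(out, null, skip, data.length);
        System.out.println(Arrays.toString(out) + " decoded=" + reader.decodeCount);
    }
}
```

In this toy run only rows 0 and 3 are decoded; the run of rows 1-2 and the 
trailing row 4 are each passed over with a single skipRows call. How a real 
TreeReader subclass would map a skip mask onto its underlying streams is left 
open here.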

  was:
Currently, ORC supports filtering at the file, stripe, and row-group levels.

There is an ongoing effort (ORC-577) to add finer-grained row-level filters 
by supplying filter predicates through Reader.Options.

However, there are still cases where a framework that implements TreeReader 
wants to skip particular rows without using predicates, in order to avoid 
expensive decoding of types such as DecimalColumnVector or 
Decimal64ColumnVector.

In this ticket I propose extending the TreeReader abstract class with an 
additional nextVector method:
{code:java}
abstract void nextVector(ColumnVector previous,
                         boolean[] isNull,
                         boolean[] skipRows,
                         final int batchSize);
{code}
Subclasses implementing this method can then use the existing skipRows 
method, guided by the skipRows array argument, to avoid expensive decoding 
for rows that will be discarded anyway.


> Allow row-level Skipping
> ------------------------
>
>                 Key: ORC-593
>                 URL: https://issues.apache.org/jira/browse/ORC-593
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>             Fix For: 1.5.8, master
>



