[
https://issues.apache.org/jira/browse/PARQUET-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Chen updated PARQUET-333:
------------------------------
Summary: [Vectorized Reader] Add attributes in ColumnVector and RowBatch
(was: [Vectorizaed Reader] Add attributes in ColumnVector and RowBatch)
> [Vectorized Reader] Add attributes in ColumnVector and RowBatch
> ---------------------------------------------------------------
>
> Key: PARQUET-333
> URL: https://issues.apache.org/jira/browse/PARQUET-333
> Project: Parquet
> Issue Type: Sub-task
> Components: parquet-mr
> Reporter: Dong Chen
>
> As discussed in HIVE-8128, we want to add some attributes to the vector classes.
> * In {{ColumnVector}}, add two attributes. One is {{boolean noNulls}}, which
> indicates that the whole column vector contains no null values. The other is
> {{boolean isRepeating}}, which indicates that the same value repeats for the
> whole column vector. Both can be calculated while the vector is being read.
> SQL engines (like Hive) can check these attributes to skip processing some
> values.
> * In {{RowBatch}}, add one attribute, {{int size}}, which holds the number
> of rows in the batch. This is only for convenience; its value should be the
> same as {{RowBatch.columns\[0\].numValues}}.
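To make the proposal concrete, here is a minimal Java sketch of how the three attributes could fit together. It is not the parquet-mr implementation: the {{long}}-typed {{values}} array, the {{isNull}} array, the {{read}} factory, and the {{VectorSum}} consumer are all assumptions made for illustration. Only {{noNulls}}, {{isRepeating}}, {{size}}, {{columns}}, and {{numValues}} come from the description above.

```java
// Hypothetical sketch only; these are not the actual parquet-mr classes.
class ColumnVector {
    long[] values;               // assumed: a long-typed column, for illustration
    boolean[] isNull;            // assumed: per-value null flags
    int numValues;               // number of values held by this vector
    boolean noNulls = true;      // true if the vector contains no null values
    boolean isRepeating = false; // true if every entry equals the first one

    // Hypothetical reader: fills the vector and computes both flags in the same pass.
    static ColumnVector read(Long[] source) {
        ColumnVector v = new ColumnVector();
        v.numValues = source.length;
        v.values = new long[source.length];
        v.isNull = new boolean[source.length];
        v.isRepeating = source.length > 0;
        for (int i = 0; i < source.length; i++) {
            v.isNull[i] = source[i] == null;
            v.values[i] = v.isNull[i] ? 0L : source[i];
            if (v.isNull[i]) {
                v.noNulls = false;
            }
            if (i > 0 && (v.isNull[i] != v.isNull[0] || v.values[i] != v.values[0])) {
                v.isRepeating = false;
            }
        }
        return v;
    }
}

class RowBatch {
    ColumnVector[] columns;
    int size; // number of rows; mirrors columns[0].numValues

    RowBatch(ColumnVector... columns) {
        this.columns = columns;
        this.size = columns.length == 0 ? 0 : columns[0].numValues;
    }
}

// Hypothetical consumer showing how an engine might use the flags to skip work.
class VectorSum {
    static long sum(RowBatch batch, int col) {
        ColumnVector v = batch.columns[col];
        if (batch.size == 0) {
            return 0L;
        }
        if (v.isRepeating) {
            // Only the first entry needs to be inspected.
            return v.isNull[0] ? 0L : v.values[0] * batch.size;
        }
        long total = 0L;
        for (int i = 0; i < batch.size; i++) {
            // When noNulls is set, the per-value null check is short-circuited.
            if (v.noNulls || !v.isNull[i]) {
                total += v.values[i];
            }
        }
        return total;
    }
}
```

In this sketch, a consumer reads {{batch.size}} instead of reaching into {{columns\[0\].numValues}}, and the {{isRepeating}} branch replaces the whole loop with a single multiplication.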
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)