[
https://issues.apache.org/jira/browse/ARROW-18413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Wu updated ARROW-18413:
----------------------------
Description:
The Parquet ColumnChunk Thrift struct already records the location of the page index:
[parquet-format/parquet.thrift at master · apache/parquet-format (github.com)|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L799]
We just need to add a public API to ColumnChunkMetaData so that readers can access it.
was:
The Parquet ColumnChunk Thrift struct already records the location of the page index:
{quote}
struct ColumnChunk {
  /** File offset of ColumnChunk's OffsetIndex **/
  4: optional i64 offset_index_offset
  /** Size of ColumnChunk's OffsetIndex, in bytes **/
  5: optional i32 offset_index_length
  /** File offset of ColumnChunk's ColumnIndex **/
  6: optional i64 column_index_offset
  /** Size of ColumnChunk's ColumnIndex, in bytes **/
  7: optional i32 column_index_length
}
{quote}
We just need to add a public API to ColumnChunkMetaData so that readers can access it.
> [C++][Parquet] FileMetaData exposes page index metadata
> -------------------------------------------------------
>
> Key: ARROW-18413
> URL: https://issues.apache.org/jira/browse/ARROW-18413
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++, Parquet
> Reporter: Gang Wu
> Assignee: Gang Wu
> Priority: Major
>
> The Parquet ColumnChunk Thrift struct already records the location of the page index:
> [parquet-format/parquet.thrift at master · apache/parquet-format (github.com)|https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L799]
> We just need to add a public API to ColumnChunkMetaData so that readers can access it.
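
For illustration only, here is a minimal sketch of what such an accessor could look like. It assumes the four optional Thrift fields are mirrored in a plain struct; RawColumnChunk, IndexLocation, ColumnChunkMetaDataSketch, and both getter names are hypothetical stand-ins, not the actual Arrow C++ API. A location is reported only when both the offset and the length are present, since either one alone is not enough to read the index.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical mirror of the four optional page index fields
// in the Thrift ColumnChunk struct.
struct RawColumnChunk {
  std::optional<int64_t> offset_index_offset;
  std::optional<int32_t> offset_index_length;
  std::optional<int64_t> column_index_offset;
  std::optional<int32_t> column_index_length;
};

// Hypothetical value type: where an index lives in the file.
struct IndexLocation {
  int64_t offset;  // file offset of the serialized index
  int32_t length;  // size of the serialized index, in bytes
};

// Hypothetical read-only wrapper standing in for ColumnChunkMetaData.
class ColumnChunkMetaDataSketch {
 public:
  explicit ColumnChunkMetaDataSketch(RawColumnChunk raw) : raw_(raw) {}

  // Location of the ColumnIndex, or std::nullopt if it was not written.
  std::optional<IndexLocation> GetColumnIndexLocation() const {
    return MakeLocation(raw_.column_index_offset, raw_.column_index_length);
  }

  // Location of the OffsetIndex, or std::nullopt if it was not written.
  std::optional<IndexLocation> GetOffsetIndexLocation() const {
    return MakeLocation(raw_.offset_index_offset, raw_.offset_index_length);
  }

 private:
  // Both fields must be set for the location to be usable.
  static std::optional<IndexLocation> MakeLocation(
      std::optional<int64_t> offset, std::optional<int32_t> length) {
    if (!offset.has_value() || !length.has_value()) {
      return std::nullopt;
    }
    return IndexLocation{*offset, *length};
  }

  RawColumnChunk raw_;
};
```

Returning std::optional keeps "no page index written" distinct from a zero offset, so callers can skip page index reads for column chunks that lack one.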