mapleFU commented on issue #40981: URL: https://github.com/apache/arrow/issues/40981#issuecomment-2038130354
Aha, no. Whether or not the page index is enabled, "a", "b", "c", and "d" are stored in different pages.

There are four levels:

* File: a whole Parquet file with a single schema, containing zero or more row groups.
* RowGroup: a set of rows that follow the file's schema.
* Column Chunk: one "leaf" column within one row group. In your JSON there are 4 column chunks.
* Page: some values within a Column Chunk. A page always belongs to one specific column.

When the Page Index is not enabled and the record is nested, some legacy files might contain a "cross-page row", e.g. "list: [[1, 1, 1], [1]]" stores "[1, 1" and "1]" in different pages. When the page index is enabled, this does not happen.
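
To make the "cross-page row" case concrete, here is a minimal sketch in plain Python. It is not Arrow or Parquet API code: the function name and the page layouts are hypothetical, and each page is modelled only as a list of repetition levels. It assumes the column is a `list<int>` with two rows, `[1, 1, 1]` and `[1]`, and relies on the Parquet rule that repetition level 0 marks the start of a new row.

```python
from typing import List


def pages_start_on_row_boundaries(pages: List[List[int]]) -> bool:
    """Return True if every non-empty page begins with repetition level 0.

    Hypothetical helper: each page is represented only by the repetition
    levels of the values it holds.
    """
    return all(page[0] == 0 for page in pages if page)


# Rows [1, 1, 1] and [1]: level 0 starts a new row, level 1 continues the list.
rep_levels = [0, 1, 1, 0]

# Legacy-style layout: the first row is cut after two values ("[1, 1" | "1]").
legacy_pages = [rep_levels[:2], rep_levels[2:]]

# Row-aligned layout: each page starts at the beginning of a row.
aligned_pages = [rep_levels[:3], rep_levels[3:]]

legacy_ok = pages_start_on_row_boundaries(legacy_pages)
aligned_ok = pages_start_on_row_boundaries(aligned_pages)
```

In this sketch `legacy_ok` is `False` because the second page begins with repetition level 1, i.e. in the middle of the first row, while `aligned_ok` is `True`.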
