XinyuZeng commented on issue #34053:
URL: https://github.com/apache/arrow/issues/34053#issuecomment-1422460551

> > Just curious, would page index optimization be added to the Arrow interface in the long term after the low level reader/writer are finished? I'd expect that also requires changing the I/O unit from row group to page.
>
> Sounds ok, but it seems to require high-performance IO merging, plus some benchmarks/testing.

There is already IO coalescing, but its range unit is the ColumnChunk:

https://github.com/apache/arrow/blob/39bad5442c6447bf07594b09e4b29118b3211460/cpp/src/arrow/io/caching.cc#L175

Perhaps it is not necessary to break the IO down to pages, since parquet-format states that the ColumnChunk is the IO unit.
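
To make the coalescing point concrete, here is a rough sketch of how byte ranges can be merged before issuing reads. It is only an illustration and not the code in caching.cc: the function name `coalesce_ranges` and the parameters `hole_size_limit` and `range_size_limit` are hypothetical names chosen for this example. The same logic would apply whether the input ranges are column chunks or pages; only the granularity of the input changes.

```python
from typing import List, Tuple

# A read range as (offset, length) in bytes.
Range = Tuple[int, int]


def coalesce_ranges(
    ranges: List[Range],
    hole_size_limit: int,
    range_size_limit: int,
) -> List[Range]:
    """Merge nearby byte ranges into fewer, larger reads.

    Two neighbouring ranges are merged when the gap between them is at most
    hole_size_limit and the merged range stays within range_size_limit.
    Both parameter names are hypothetical and used for illustration only.
    """
    # Drop empty ranges and process the rest in file order.
    ordered = sorted(r for r in ranges if r[1] > 0)
    if not ordered:
        return []

    merged: List[Range] = []
    cur_start, cur_len = ordered[0]
    cur_end = cur_start + cur_len

    for offset, length in ordered[1:]:
        end = offset + length
        gap = offset - cur_end  # negative when the ranges overlap
        new_end = max(cur_end, end)
        if gap <= hole_size_limit and new_end - cur_start <= range_size_limit:
            # Close enough and still small enough: extend the current read.
            cur_end = new_end
        else:
            # Too far apart or too large: emit the current read, start a new one.
            merged.append((cur_start, cur_end - cur_start))
            cur_start, cur_end = offset, end

    merged.append((cur_start, cur_end - cur_start))
    return merged


# Three hypothetical column chunks: the first two are 10 bytes apart,
# the third is far away.
chunks = [(0, 100), (110, 50), (1000, 200)]
print(coalesce_ranges(chunks, hole_size_limit=16, range_size_limit=1000))
# prints [(0, 160), (1000, 200)]
```

With page-level input the list of ranges would simply be longer and the gaps smaller, so the cost of this step and the quality of the merged reads is what the benchmarks mentioned above would need to measure.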
