[GitHub] [arrow] wgtmac commented on a diff in pull request #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

via GitHub Tue, 18 Jul 2023 09:08:14 -0700


wgtmac commented on code in PR #36510:
URL: https://github.com/apache/arrow/pull/36510#discussion_r1267003756



##########
cpp/src/parquet/file_reader.h:
##########
@@ -44,7 +44,8 @@ class PARQUET_EXPORT RowGroupReader {
   // An implementation of the Contents class is defined in the .cc file
   struct Contents {
     virtual ~Contents() {}
-    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i) = 0;
+    virtual std::unique_ptr<PageReader> GetColumnPageReader(

Review Comment:
   Prebuffer() only needs to read the dictionary and data pages. It does not 
need to read ColumnMetaData and cannot read any customized index pages. Under 
that assumption, the current implementation satisfies its needs. However, if 
we expose this logic to external users, it is hard to make clear what the 
total range of a column chunk is (even the spec does not provide a good way 
to get it right). I understand that you need this information as a hint for 
estimating the read boundary of a column chunk. You could simply copy the code 
into your own business logic; it would be much easier to maintain on your own. 
WDYT?
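
   To illustrate what such user-side logic might look like, here is a minimal 
sketch. It is not the code from file_reader.cc: `ChunkMetaSketch`, `ByteRange` 
and `EstimateChunkRange` are hypothetical names, and the struct only mirrors 
the offset and size fields that the Parquet format stores per column chunk. 
It assumes the chunk starts at the dictionary page when one precedes the first 
data page, and that the total compressed size covers exactly the dictionary 
and data pages, so index pages and ColumnMetaData are not included.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the per-column-chunk metadata fields used below;
// this is NOT a type from the Parquet C++ library.
struct ChunkMetaSketch {
  int64_t data_page_offset;        // file offset of the first data page
  int64_t dictionary_page_offset;  // <= 0 is treated as "no dictionary page"
  int64_t total_compressed_size;   // assumed to cover dictionary + data pages
};

// Half-open byte range [offset, offset + length) within the file.
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Estimate the byte range holding the dictionary and data pages of a chunk.
// The result is only a hint: it is clamped to the file size and says nothing
// about index pages or any metadata stored outside the chunk.
inline ByteRange EstimateChunkRange(const ChunkMetaSketch& meta,
                                    int64_t file_size) {
  int64_t start = meta.data_page_offset;
  // A dictionary page, when present, precedes the first data page.
  if (meta.dictionary_page_offset > 0 && meta.dictionary_page_offset < start) {
    start = meta.dictionary_page_offset;
  }
  // Never report a range that runs past the end of the file.
  const int64_t remaining = std::max<int64_t>(0, file_size - start);
  const int64_t length =
      std::max<int64_t>(0, std::min(meta.total_compressed_size, remaining));
  return ByteRange{start, length};
}
```

   A caller could pass the resulting length as a buffer-size hint when opening 
its own input stream. Keeping this estimate in application code means its 
assumptions do not have to become part of the public reader API.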



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

