wgtmac commented on code in PR #36510:
URL: https://github.com/apache/arrow/pull/36510#discussion_r1257275770


##########
cpp/src/parquet/properties.h:
##########
@@ -64,7 +64,8 @@ class PARQUET_EXPORT ReaderProperties {
   MemoryPool* memory_pool() const { return pool_; }
 
   std::shared_ptr<ArrowInputStream> GetStream(std::shared_ptr<ArrowInputFile> source,
-                                              int64_t start, int64_t num_bytes);
+                                              int64_t start, int64_t num_bytes,
+                                              int64_t buffer_size = -1);

Review Comment:
   What about `std::optional<int64_t> buffer_size`?
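
   A rough sketch of the difference, not the actual Parquet code: the helper names and the `kDefaultBufferSize` value below are hypothetical and only stand in for whatever default the reader properties already hold. With the sentinel, `-1` silently means "use the default"; with `std::optional`, the absence of a value carries that meaning and no magic number is needed.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical default, standing in for the value the reader properties hold.
constexpr int64_t kDefaultBufferSize = 16384;

// Sentinel style, as in the diff: any negative value means "use the default".
int64_t ResolveWithSentinel(int64_t buffer_size = -1) {
  return buffer_size < 0 ? kDefaultBufferSize : buffer_size;
}

// Optional style: an empty optional means "use the default".
int64_t ResolveWithOptional(std::optional<int64_t> buffer_size = std::nullopt) {
  return buffer_size.value_or(kDefaultBufferSize);
}
```

   Call sites look the same in both cases (`ResolveWithOptional(4096)` or `ResolveWithOptional()`), but the optional version makes the "not specified" case explicit in the signature.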



##########
cpp/src/parquet/file_reader.cc:
##########
@@ -66,7 +66,7 @@ static constexpr int64_t kMaxDictHeaderSize = 100;
 RowGroupReader::RowGroupReader(std::unique_ptr<Contents> contents)
     : contents_(std::move(contents)) {}
 
-std::shared_ptr<ColumnReader> RowGroupReader::Column(int i) {
+std::shared_ptr<ColumnReader> RowGroupReader::Column(int i, int64_t buffer_size) {

Review Comment:
   TBH, this additional parameter looks a little bit weird here.



##########
cpp/src/parquet/file_reader.h:
##########
@@ -189,6 +190,9 @@ class PARQUET_EXPORT ParquetFileReader {
   ::arrow::Future<> WhenBuffered(const std::vector<int>& row_groups,
                                  const std::vector<int>& column_indices) const;
 
+  /// Return the range of the specified column chunk.
+  ::arrow::io::ReadRange GetColumnChunkRange(int row_group_index, int column_index);

Review Comment:
   This doesn't seem to be used anywhere?



##########
cpp/src/parquet/file_reader.h:
##########
@@ -44,7 +44,8 @@ class PARQUET_EXPORT RowGroupReader {
   // An implementation of the Contents class is defined in the .cc file
   struct Contents {
     virtual ~Contents() {}
-    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i) = 0;
+    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i,

Review Comment:
   I don't see how the file reader deals with this new parameter. Is the caller expected to pass a suitable `buffer_size`?


