jp0317 commented on PR #36510:
URL: https://github.com/apache/arrow/pull/36510#issuecomment-1627768067

   Thanks for the comments. Would allowing optional column-specific read 
properties, instead of an optional buffer size, be better? These column-specific 
read properties could be the `ColumnReaderProperties` that mapleFU suggested, 
which covers only the buffer size, or just the `ReadProperties`, which covers [more 
settings](https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L120C1-L126).
  Given that we already let users decide the global buffer size, leaving the 
column-specific option to users might be more feasible and easier than tuning it 
internally inside the file reader. Users can still stick with the old 
behavior (using the global buffer size) if a column-specific setting is 
difficult or unimportant to them. 
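
   To make the fallback concrete, here is a rough sketch of how an optional 
column-specific setting could resolve against the global one. All type and 
function names below are hypothetical and are not the existing parquet API; the 
default value is only an assumption for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the reader-wide settings; not the real parquet class.
struct GlobalReadSettings {
  int64_t buffer_size = 16384;  // assumed default, for illustration only
};

// Hypothetical column-specific settings; an empty optional means "no override".
struct ColumnReadSettings {
  std::optional<int64_t> buffer_size;
};

// Resolve the buffer size for one column: use the override when present,
// otherwise keep the old behavior and fall back to the global value.
inline int64_t EffectiveBufferSize(
    const GlobalReadSettings& global,
    const std::optional<ColumnReadSettings>& column) {
  if (column.has_value() && column->buffer_size.has_value()) {
    return *column->buffer_size;
  }
  return global.buffer_size;
}
```

   Callers that pass `std::nullopt` would see no change in behavior, which is 
the "stick with the old behavior" case.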
   
   Thanks @mapleFU for the suggestion. It does look better. It seems to 
require a new API to set and store the column-specific buffer size before 
creating the column page reader (e.g., before calling `Column(int i)`), and the 
read properties would need to be aware of the target column index when creating 
the stream buffer. Would embedding the column-specific properties in the 
creation API, as mentioned above, be simpler? 
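
   A minimal sketch of the two shapes being compared, assuming a simplified 
reader that only tracks buffer sizes. The class and method names are 
hypothetical and only illustrate the trade-off; they do not exist in parquet.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical reader used only to compare the two API shapes.
class SketchRowGroupReader {
 public:
  explicit SketchRowGroupReader(int64_t global_buffer_size)
      : global_buffer_size_(global_buffer_size) {}

  // Shape A, step 1: set and store the size ahead of time, keyed by column index.
  void SetColumnBufferSize(int i, int64_t size) { stored_[i] = size; }

  // Shape A, step 2: creating the column reader must look up the target
  // column index to find the stored size, else use the global size.
  int64_t StoredBufferSize(int i) const {
    auto it = stored_.find(i);
    return it != stored_.end() ? it->second : global_buffer_size_;
  }

  // Shape B: the caller passes the optional override at creation time,
  // so nothing is stored and no per-index lookup is needed.
  int64_t CreationBufferSize(int i, std::optional<int64_t> override_size) const {
    (void)i;  // the index would still select the column, but not the size
    return override_size.value_or(global_buffer_size_);
  }

 private:
  int64_t global_buffer_size_;
  std::map<int, int64_t> stored_;
};
```

   Shape B keeps the reader stateless with respect to per-column settings, at 
the cost of one extra parameter on the creation call.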

