jp0317 commented on PR #36510: URL: https://github.com/apache/arrow/pull/36510#issuecomment-1627768067
Thanks for the comments.

Would allowing optional column-specific read properties, instead of an optional buffer size, be better? These column-specific properties could be the `ColumnReaderProperties` that mapleFU suggested, which would cover only the buffer size, or just the `ReaderProperties`, which covers [more settings](https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h#L120C1-L126). Given that we already let users decide the global buffer size, leaving the column-specific option to users might be more feasible and easier than tuning it internally inside the file reader. Users can still stick with the old behavior (using the global buffer size) if column-specific tuning is difficult or unimportant to them.

Thanks mapleFU for the suggestion. It does look better. It seems to require a new API to set and store the column-specific buffer size before creating the column page reader (e.g., before calling `Column(int i)`), and the read properties would need to be aware of the target column index when creating the stream buffer. Would embedding the column-specific properties in the creation API, as mentioned above, be simpler?
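To make the two options concrete, here is a rough, self-contained sketch of the fallback behavior being discussed. It does not use the real Parquet classes: `GlobalReadOptions`, `ColumnReadOptions`, `EffectiveBufferSize`, and `ColumnOptionsRegistry` are hypothetical names invented for illustration, and the default buffer size is an arbitrary placeholder. The free function models passing the column-specific properties at creation time; the registry models setting and storing them per column index beforehand.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stand-in for the reader-wide settings (not parquet::ReaderProperties).
struct GlobalReadOptions {
  int64_t buffer_size = 16 * 1024;  // placeholder default, chosen for illustration
};

// Hypothetical column-specific overrides; an unset field means "use the global value".
struct ColumnReadOptions {
  std::optional<int64_t> buffer_size;
};

// Option A: the override is passed when the column reader is created,
// so nothing has to be stored or keyed by column index.
inline int64_t EffectiveBufferSize(const GlobalReadOptions& global,
                                   const std::optional<ColumnReadOptions>& column) {
  if (column.has_value() && column->buffer_size.has_value()) {
    return *column->buffer_size;
  }
  return global.buffer_size;  // old behavior: global buffer size
}

// Option B: overrides are set and stored up front, then looked up by column
// index when the stream buffer for that column is created.
class ColumnOptionsRegistry {
 public:
  explicit ColumnOptionsRegistry(GlobalReadOptions global) : global_(global) {}

  void Set(int column_index, ColumnReadOptions options) {
    per_column_[column_index] = options;
  }

  int64_t BufferSizeFor(int column_index) const {
    auto it = per_column_.find(column_index);
    if (it == per_column_.end()) {
      return global_.buffer_size;  // no override registered for this column
    }
    return EffectiveBufferSize(global_, it->second);
  }

 private:
  GlobalReadOptions global_;
  std::unordered_map<int, ColumnReadOptions> per_column_;
};
```

In both variants a column without an override resolves to the global buffer size, so existing callers would see no change. Option A keeps the properties object unaware of column indices, which is the simplification asked about above.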
