Weijun-H commented on PR #9129: URL: https://github.com/apache/arrow-datafusion/pull/9129#issuecomment-1938067413
> Thank you @Weijun-H -- this looks like a great start. I really appreciate you working on this issue
>
> I poked around and I also found the following code that does something similar (converts parquet statistics into Arrays) but that is used for Row Group Pruning:
>
> [`6c41090`/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L58-L57](https://github.com/apache/arrow-datafusion/blob/6c4109017edfe10800e0ffee8c1c254aade05849/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L58-L57)
>
> Given I am quite confident in how that code works and it has had multiple contributors, I wonder would you be willing to consider refactoring the parquet statistics extraction code so that it all goes through a single path?
>
> This would look something like making `summarize_min_max` call `get_statistic!`
>
> I think you could avoid a non trivial amount of new code.

Yes, I also considered refactoring the code to avoid duplication. But in `summarize_min_max`, the accumulator has to call `update_batch`, so each arm of the `match` on the statistics type would need a second `match` on the target Arrow type:

```rust
// Pseudocode: arguments are elided.
fn summarize_min_max(/* ... */) {
    match stat {
        ParquetStatistics::Boolean => {
            let value = get_statistic!(/* ... */);
            // need to match target_arrow_type again
        }
    }
}
```
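To make the double-match concern concrete, here is a minimal, self-contained sketch of one way to keep the conversion on a single path: match on the pair (physical statistic, target type) in one helper, so the caller feeding the accumulator does not have to match again. Every name below (`PhysicalMin`, `TargetType`, `Scalar`, `to_scalar`, `MinAccumulator`, `summarize_min`) is a hypothetical stand-in invented for illustration. None of them are DataFusion or parquet types, and this is not what the PR implements.

```rust
// Hypothetical stand-ins for illustration only; NOT DataFusion/parquet APIs.

/// Simplified physical minimum as it might be stored in row-group statistics.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PhysicalMin {
    Boolean(bool),
    Int32(i32),
    Int64(i64),
}

/// Simplified logical type that the accumulator expects.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TargetType {
    Boolean,
    Int32,
    Int64,
    Date32,
}

/// Typed value handed to the accumulator.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub enum Scalar {
    Boolean(bool),
    Int32(i32),
    Int64(i64),
    Date32(i32),
}

/// Single conversion path: one `match` over (physical, target) pairs.
/// Unsupported combinations yield `None`.
pub fn to_scalar(stat: PhysicalMin, target: TargetType) -> Option<Scalar> {
    match (stat, target) {
        (PhysicalMin::Boolean(v), TargetType::Boolean) => Some(Scalar::Boolean(v)),
        (PhysicalMin::Int32(v), TargetType::Int32) => Some(Scalar::Int32(v)),
        (PhysicalMin::Int32(v), TargetType::Date32) => Some(Scalar::Date32(v)),
        (PhysicalMin::Int32(v), TargetType::Int64) => Some(Scalar::Int64(i64::from(v))),
        (PhysicalMin::Int64(v), TargetType::Int64) => Some(Scalar::Int64(v)),
        _ => None,
    }
}

/// Toy minimum accumulator, standing in for one that exposes `update_batch`.
/// It assumes all values come from one column and share one `Scalar` variant.
#[derive(Debug, Default)]
pub struct MinAccumulator {
    current: Option<Scalar>,
}

impl MinAccumulator {
    /// Keep the smaller of the current minimum and `value`.
    pub fn update(&mut self, value: Scalar) {
        self.current = match self.current {
            Some(cur) if cur <= value => Some(cur),
            _ => Some(value),
        };
    }

    /// Current minimum, or `None` if nothing has been seen.
    pub fn value(&self) -> Option<Scalar> {
        self.current
    }
}

/// Fold per-row-group minimums into one value without a nested `match`.
/// Returns `None` for empty input or an unsupported type combination.
pub fn summarize_min(stats: &[PhysicalMin], target: TargetType) -> Option<Scalar> {
    let mut acc = MinAccumulator::default();
    for stat in stats {
        acc.update(to_scalar(*stat, target)?);
    }
    acc.value()
}
```

With this shape, `summarize_min(&[PhysicalMin::Int32(7), PhysicalMin::Int32(3)], TargetType::Date32)` goes through `to_scalar` once per value, and the accumulator never inspects the physical type. Whether the real `get_statistic!` macro can be wrapped this way depends on its actual arguments, which are not shown in this thread.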
