lvheyang commented on a change in pull request #749:
URL: https://github.com/apache/arrow-datafusion/pull/749#discussion_r671872318
##########
File path: datafusion/src/datasource/parquet.rs
##########
@@ -38,11 +38,22 @@ pub struct ParquetTable {
schema: SchemaRef,
statistics: Statistics,
max_concurrency: usize,
+ enable_pruning: bool,
}
impl ParquetTable {
/// Attempt to initialize a new `ParquetTable` from a file path.
pub fn try_new(path: impl Into<String>, max_concurrency: usize) -> Result<Self> {
+ ParquetTable::try_new_with_pruning_config(path, max_concurrency, true)
+ }
+
+ /// Attempt to initialize a new `ParquetTable` from a file path. And enable or
+ /// disable the parquet pruning features.
+ pub fn try_new_with_pruning_config(
Review comment:
I'm not sure that adding this function is a good choice.
My concern is that it is a public function, so many users may come to rely on
it, but the `enable_pruning` flag in its signature is temporary and we don't
want it to last for a long time.
Another option is to replace this function with
`try_new_with_config(path: impl Into<String>, execution_config:
ExecutionConfig)`. I think that is the better option, but it would make this
module depend on the `execution::context` module, which I think is the
top-level module. That seems a little weird.
Is the second approach acceptable?
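
A minimal sketch of the second option, to make the trade-off concrete. It is self-contained and does not use the real DataFusion types: the `ExecutionConfig` and `ParquetTable` below are simplified stand-ins, and the field names `concurrency` and `parquet_pruning`, the default values, the accessor methods, and the `String` error type are all assumptions made for illustration. The point it shows is that the temporary flag lives inside the config value, so it could be dropped later without changing any public constructor signature.

```rust
/// Stand-in for DataFusion's `ExecutionConfig`. The field names and defaults
/// are assumptions for this sketch, not the real struct definition.
#[derive(Debug, Clone)]
pub struct ExecutionConfig {
    pub concurrency: usize,
    pub parquet_pruning: bool,
}

impl Default for ExecutionConfig {
    fn default() -> Self {
        // Hypothetical defaults: pruning is enabled unless the caller opts out.
        Self { concurrency: 4, parquet_pruning: true }
    }
}

/// Simplified stand-in for `ParquetTable`; schema and statistics are omitted.
#[derive(Debug)]
pub struct ParquetTable {
    path: String,
    max_concurrency: usize,
    enable_pruning: bool,
}

impl ParquetTable {
    /// Existing constructor: the signature is unchanged and pruning stays on.
    pub fn try_new(path: impl Into<String>, max_concurrency: usize) -> Result<Self, String> {
        let config = ExecutionConfig {
            concurrency: max_concurrency,
            ..ExecutionConfig::default()
        };
        Self::try_new_with_config(path, config)
    }

    /// Proposed constructor: the pruning flag travels inside the config, so
    /// removing it later does not change this signature.
    pub fn try_new_with_config(
        path: impl Into<String>,
        execution_config: ExecutionConfig,
    ) -> Result<Self, String> {
        let path = path.into();
        // Placeholder validation; the real code would read the parquet metadata.
        if path.is_empty() {
            return Err("path must not be empty".to_string());
        }
        Ok(Self {
            path,
            max_concurrency: execution_config.concurrency,
            enable_pruning: execution_config.parquet_pruning,
        })
    }

    /// Hypothetical accessors, added only so the sketch can be inspected.
    pub fn path(&self) -> &str {
        &self.path
    }

    pub fn max_concurrency(&self) -> usize {
        self.max_concurrency
    }

    pub fn pruning_enabled(&self) -> bool {
        self.enable_pruning
    }
}
```

The sketch does not resolve the dependency concern: in the real crate, `try_new_with_config` would still need to import `ExecutionConfig` from `execution::context`.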