lvheyang commented on a change in pull request #749:
URL: https://github.com/apache/arrow-datafusion/pull/749#discussion_r671872318
##########
File path: datafusion/src/datasource/parquet.rs
##########
@@ -38,11 +38,22 @@ pub struct ParquetTable {
schema: SchemaRef,
statistics: Statistics,
max_concurrency: usize,
+ enable_pruning: bool,
}
impl ParquetTable {
/// Attempt to initialize a new `ParquetTable` from a file path.
pub fn try_new(path: impl Into<String>, max_concurrency: usize) ->
Result<Self> {
+ ParquetTable::try_new_with_pruning_config(path, max_concurrency, true)
+ }
+
+ /// Attempt to initialize a new `ParquetTable` from a file path. And
enable or
+ /// disable the parquet pruning features.
+ pub fn try_new_with_pruning_config(
Review comment:
Here I'm not sure that adding this function is a good choice.
My concern is that it is a public function, so it may stay around for a long
time and many users will come to rely on it. But the `enable_pruning` flag in
the signature is temporary; we don't want it to last that long.
One alternative is to replace this function with
`try_new_with_config(path: impl Into<String>, execution_config:
ExecutionConfig)`. I think that is a better option, but it would introduce a
dependency on the `execution::context` module, which I think is the top-level
module. That seems a little weird.
Or we can add a function like `try_new_with_config(path: impl Into<String>,
conf: ParquetConfig)` with a new `ParquetConfig` struct. I think this should be
more flexible: the interface would not change if we later want to drop this
option.
If there are no other considerations, I will fix it using the third approach
(`ParquetConfig`) tomorrow.
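
To make the `ParquetConfig` option concrete, here is a rough, standalone sketch of what it might look like. Only the names `ParquetConfig`, `try_new_with_config`, `enable_pruning` and `max_concurrency` come from this thread; the builder methods, the default values, the idea of moving `max_concurrency` into the config, and the simplified `ParquetTable` (no schema or statistics, `String` errors) are assumptions for illustration, not the actual DataFusion code.

```rust
/// Hypothetical options struct. Fields can be added or dropped later
/// without changing the constructor signature.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetConfig {
    pub max_concurrency: usize,
    pub enable_pruning: bool,
}

impl Default for ParquetConfig {
    fn default() -> Self {
        // Assumed defaults: single-threaded scan, pruning enabled.
        Self { max_concurrency: 1, enable_pruning: true }
    }
}

impl ParquetConfig {
    /// Builder-style setter (assumed name).
    pub fn with_max_concurrency(mut self, n: usize) -> Self {
        self.max_concurrency = n;
        self
    }

    /// Builder-style setter (assumed name).
    pub fn with_pruning(mut self, enabled: bool) -> Self {
        self.enable_pruning = enabled;
        self
    }
}

/// Simplified stand-in for the real `ParquetTable`.
#[derive(Debug)]
pub struct ParquetTable {
    path: String,
    config: ParquetConfig,
}

impl ParquetTable {
    /// Existing entry point keeps its signature and delegates.
    pub fn try_new(path: impl Into<String>, max_concurrency: usize) -> Result<Self, String> {
        let conf = ParquetConfig::default().with_max_concurrency(max_concurrency);
        Self::try_new_with_config(path, conf)
    }

    /// New entry point: every tunable travels inside `ParquetConfig`.
    pub fn try_new_with_config(
        path: impl Into<String>,
        conf: ParquetConfig,
    ) -> Result<Self, String> {
        let path = path.into();
        if path.is_empty() {
            return Err("path must not be empty".to_string());
        }
        Ok(Self { path, config: conf })
    }

    pub fn path(&self) -> &str {
        &self.path
    }

    pub fn max_concurrency(&self) -> usize {
        self.config.max_concurrency
    }

    pub fn enable_pruning(&self) -> bool {
        self.config.enable_pruning
    }
}
```

With this shape, a caller who wants to turn pruning off would write something like `ParquetTable::try_new_with_config(path, ParquetConfig::default().with_pruning(false))`, and removing `enable_pruning` later would only touch `ParquetConfig`, not the public constructor.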