yjshen commented on a change in pull request #1120:
URL: https://github.com/apache/arrow-datafusion/pull/1120#discussion_r730341944



##########
File path: datafusion/src/physical_plan/file_format/parquet.rs
##########
@@ -59,14 +60,12 @@ use tokio::{
 
 use async_trait::async_trait;
 
-use crate::datasource::{FilePartition, PartitionedFile};
-
 /// Execution plan for scanning one or more Parquet partitions
 #[derive(Debug, Clone)]
 pub struct ParquetExec {
     object_store: Arc<dyn ObjectStore>,
-    /// Parquet partitions to read
-    partitions: Vec<ParquetPartition>,
+    /// List of parquet files, grouped by output partition

Review comment:
"Output partition" is vague here. Each `file_group`, i.e. a
`Vec<PartitionedFile>`, is the unit of parallelism and is processed by a
single executor/thread.
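
To make the point concrete, here is a minimal sketch of "one file group per worker". It is not DataFusion's actual scan code: `PartitionedFile` is a simplified stand-in for the real struct, and `num_rows`, `file`, and `scan_file_groups` are hypothetical names used only for illustration. Plain OS threads are an assumption chosen to keep the example self-contained.

```rust
use std::thread;

/// Simplified, hypothetical stand-in for DataFusion's `PartitionedFile`.
#[derive(Debug, Clone)]
pub struct PartitionedFile {
    pub path: String,
    /// Pretend row count, used here instead of actually reading Parquet.
    pub num_rows: usize,
}

/// One file group: its files are read sequentially by a single worker.
pub type FileGroup = Vec<PartitionedFile>;

/// Hypothetical helper to build a file entry.
pub fn file(path: &str, num_rows: usize) -> PartitionedFile {
    PartitionedFile {
        path: path.to_string(),
        num_rows,
    }
}

/// Scans every file group on its own thread and returns the number of rows
/// produced per group, in the same order as the input groups.
pub fn scan_file_groups(file_groups: Vec<FileGroup>) -> Vec<usize> {
    // Spawn exactly one worker per file group: the group, not the
    // individual file, is the unit of parallelism.
    let handles: Vec<thread::JoinHandle<usize>> = file_groups
        .into_iter()
        .map(|group| {
            thread::spawn(move || {
                // Files inside a group are processed one after another.
                group.iter().map(|f| f.num_rows).sum::<usize>()
            })
        })
        .collect();

    // Joining in spawn order keeps results aligned with the input groups.
    handles
        .into_iter()
        .map(|h| h.join().expect("scan thread panicked"))
        .collect()
}
```

With two groups, `[a.parquet, b.parquet]` and `[c.parquet]`, this sketch starts two workers, so the length of the outer `Vec` determines the degree of parallelism regardless of how many files there are in total.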



