yjshen commented on a change in pull request #1120: URL: https://github.com/apache/arrow-datafusion/pull/1120#discussion_r730341944
##########
File path: datafusion/src/physical_plan/file_format/parquet.rs
##########
@@ -59,14 +60,12 @@ use tokio::{
 use async_trait::async_trait;
-use crate::datasource::{FilePartition, PartitionedFile};
-
 /// Execution plan for scanning one or more Parquet partitions
 #[derive(Debug, Clone)]
 pub struct ParquetExec {
     object_store: Arc<dyn ObjectStore>,
-    /// Parquet partitions to read
-    partitions: Vec<ParquetPartition>,
+    /// List of parquet files, grouped by output partition

Review comment:
"output partition" is vague here. `file_group`, i.e. `Vec<PartitionedFile>`, is the unit of parallelism and will be processed by one single executor/thread.
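
To make the "unit of parallelism" point concrete, here is a minimal sketch, not the actual `ParquetExec` code. It assumes a simplified stand-in for `PartitionedFile` that holds only a path, a hypothetical helper `scan_file_groups`, and plain `std::thread` in place of the async runtime the real plan uses. Each `file_group` gets exactly one thread, and the files inside a group are visited sequentially on that thread.

```rust
use std::thread;

/// Simplified stand-in for DataFusion's `PartitionedFile` (assumption:
/// only the path is modeled here).
#[derive(Debug, Clone)]
struct PartitionedFile {
    path: String,
}

/// One file group: the unit of parallelism, handled by a single thread.
type FileGroup = Vec<PartitionedFile>;

/// Hypothetical helper: spawns one thread per file group and returns,
/// for each group, the file paths in the order they were visited.
fn scan_file_groups(file_groups: Vec<FileGroup>) -> Vec<Vec<String>> {
    let handles: Vec<_> = file_groups
        .into_iter()
        .map(|group| {
            thread::spawn(move || {
                // Files within one group are processed one after another.
                group.into_iter().map(|f| f.path).collect::<Vec<String>>()
            })
        })
        .collect();

    // Joining in spawn order keeps results aligned with the input groups.
    handles
        .into_iter()
        .map(|h| h.join().expect("scan thread panicked"))
        .collect()
}

fn main() {
    let file_groups = vec![
        vec![
            PartitionedFile { path: "a.parquet".to_string() },
            PartitionedFile { path: "b.parquet".to_string() },
        ],
        vec![PartitionedFile { path: "c.parquet".to_string() }],
    ];

    let scanned = scan_file_groups(file_groups);
    println!("{} file groups scanned", scanned.len());
}
```

Under this reading, the number of file groups determines how many scans can run at once, which is why a doc comment naming `file_group` explicitly may be clearer than "output partition".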