[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1120: Simplify file struct abstractions

GitBox Sat, 16 Oct 2021 19:34:52 -0700


yjshen commented on a change in pull request #1120:
URL: https://github.com/apache/arrow-datafusion/pull/1120#discussion_r730341944




##########
File path: datafusion/src/physical_plan/file_format/parquet.rs
##########
@@ -59,14 +60,12 @@ use tokio::{
 
 use async_trait::async_trait;
 
-use crate::datasource::{FilePartition, PartitionedFile};
-
 /// Execution plan for scanning one or more Parquet partitions
 #[derive(Debug, Clone)]
 pub struct ParquetExec {
     object_store: Arc<dyn ObjectStore>,
-    /// Parquet partitions to read
-    partitions: Vec<ParquetPartition>,
+    /// List of parquet files, grouped by output partition

Review comment:
       "Output partition" is vague here.
   A `file_group`, i.e. a `Vec<PartitionedFile>`, is the unit of parallelism:
each group is processed by a single executor/thread.
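
To illustrate the point, here is a minimal sketch of "one file group per worker". It is not the `ParquetExec` implementation: `PartitionedFile` is reduced to a hypothetical struct holding only a path, `FileGroup` and `scan_file_groups` are names made up for this example, and plain OS threads stand in for whatever executor actually runs the plan.

```rust
use std::thread;

/// Hypothetical stand-in for DataFusion's `PartitionedFile`;
/// only the file path is modeled here.
#[derive(Debug, Clone)]
struct PartitionedFile {
    path: String,
}

/// One file group: every file in it is handled by the same worker.
/// (Illustrative alias, not a type taken from the PR.)
type FileGroup = Vec<PartitionedFile>;

/// Spawns one thread per file group. Each thread walks its own files
/// sequentially and returns the paths it visited. The outer `Vec` keeps
/// the order of the input groups, so there is one result per group.
fn scan_file_groups(file_groups: Vec<FileGroup>) -> Vec<Vec<String>> {
    let handles: Vec<thread::JoinHandle<Vec<String>>> = file_groups
        .into_iter()
        .map(|group| {
            thread::spawn(move || {
                // Files inside one group are never split across workers.
                group.into_iter().map(|file| file.path).collect()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Calling `scan_file_groups` with two groups, say `[a.parquet, b.parquet]` and `[c.parquet]`, starts two workers and returns two result lists, which is the sense in which the file group, rather than the individual file, is the unit of parallelism.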




