andygrove commented on a change in pull request #8992:
URL: https://github.com/apache/arrow/pull/8992#discussion_r548049010



##########
File path: rust/datafusion/src/physical_plan/parquet.rs
##########
@@ -67,14 +67,35 @@ impl ParquetExec {
         if filenames.is_empty() {
             Err(DataFusionError::Plan("No files found".to_string()))
         } else {
+            // Calculate statistics for the entire data set. Later, we will probably want to make
+            // statistics available on a per-partition basis.
+            let mut num_rows = 0;
+            let mut total_byte_size = 0;
+            for file in &filenames {
+                let file = File::open(file)?;
+                let file_reader = Arc::new(SerializedFileReader::new(file)?);

Review comment:
       I've gone a little further and introduced a `ParquetPartition` struct to 
make it explicit how partitioning works, and I added references to the related 
issues about changing the partitioning strategy. I also improved an error 
message and added more documentation.
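
For readers following the thread, here is a rough sketch of how per-partition statistics could be tracked and then rolled up into the data-set totals that the diff computes. It is not the code from this pull request: `PartitionStatistics`, `FileSummary`, `from_files`, and `total_statistics` are hypothetical names, the fields on `ParquetPartition` are assumed, and `FileSummary` stands in for the values that would be read from each file's Parquet metadata.

```rust
/// Hypothetical statistics container; the field names mirror the
/// `num_rows` and `total_byte_size` locals in the diff above.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PartitionStatistics {
    pub num_rows: usize,
    pub total_byte_size: usize,
}

/// Stand-in for the values read from one file's Parquet metadata
/// (an assumption for this sketch; no file I/O happens here).
#[derive(Debug, Clone)]
pub struct FileSummary {
    pub filename: String,
    pub num_rows: usize,
    pub total_byte_size: usize,
}

/// Assumed shape of a partition: the files it covers plus their
/// combined statistics. The real struct in the PR may differ.
#[derive(Debug, Clone)]
pub struct ParquetPartition {
    pub filenames: Vec<String>,
    pub statistics: PartitionStatistics,
}

impl ParquetPartition {
    /// Build a partition from a group of files, summing their statistics.
    pub fn from_files(files: &[FileSummary]) -> Self {
        let mut statistics = PartitionStatistics::default();
        for file in files {
            statistics.num_rows += file.num_rows;
            statistics.total_byte_size += file.total_byte_size;
        }
        ParquetPartition {
            filenames: files.iter().map(|f| f.filename.clone()).collect(),
            statistics,
        }
    }
}

/// Roll per-partition statistics up into totals for the entire data set.
pub fn total_statistics(partitions: &[ParquetPartition]) -> PartitionStatistics {
    partitions
        .iter()
        .fold(PartitionStatistics::default(), |mut acc, p| {
            acc.num_rows += p.statistics.num_rows;
            acc.total_byte_size += p.statistics.total_byte_size;
            acc
        })
}
```

Keeping the statistics on each partition means the data-set totals can be derived at any time, while a later change to the partitioning strategy only has to change how files are grouped.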




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

