yjshen commented on a change in pull request #811:
URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r688304741



##########
File path: datafusion/src/datasource/mod.rs
##########
@@ -36,3 +47,231 @@ pub(crate) enum Source<R = Box<dyn std::io::Read + Send + Sync + 'static>> {
     /// Read data from a reader
     Reader(std::sync::Mutex<Option<R>>),
 }
+
+#[derive(Debug, Clone)]
+/// A single file that should be read, along with its schema, statistics

Review comment:
       Yes, that's the intention here.
   - `PartitionedFile` -> A single file (for the moment) or part of a file 
(later: a subset of its row groups or rows). We may also extend it to carry 
partition values and a partition schema (see below) to support partitioned tables:
   `/path/to/table/root/p_date=20210813/p_hour=1200/xxxxx.parquet`
   - `FilePartition` -> The basic unit of parallel processing: each task is 
responsible for processing one `FilePartition`, which is composed of several 
`PartitionedFile`s.
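
   To make the relationship between the two types concrete, here is a minimal sketch. It is not the code in this PR: the field names, `parse_partition_values`, and `split_files` are hypothetical, and round-robin grouping is just one possible assignment strategy.

```rust
/// Hypothetical sketch of a single file to read.
/// Field names are assumptions for illustration, not the PR's definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionedFile {
    /// Path to the file.
    pub path: String,
    /// Partition values taken from `key=value` directory segments,
    /// e.g. [("p_date", "20210813"), ("p_hour", "1200")].
    pub partition_values: Vec<(String, String)>,
}

impl PartitionedFile {
    pub fn new(path: &str) -> Self {
        Self {
            path: path.to_string(),
            partition_values: parse_partition_values(path),
        }
    }
}

/// Hypothetical sketch of the unit of parallel work:
/// one task processes one `FilePartition`.
#[derive(Debug, Clone, PartialEq)]
pub struct FilePartition {
    pub index: usize,
    pub files: Vec<PartitionedFile>,
}

/// Extract `key=value` pairs from the directory part of a path.
/// The file name itself (the last segment) is ignored.
pub fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    let dir = path.rsplit_once('/').map(|(d, _)| d).unwrap_or("");
    dir.split('/')
        .filter_map(|segment| segment.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Distribute files over `n` partitions round-robin (an assumed strategy).
/// `n` is clamped to at least 1 so the modulo below is always valid.
pub fn split_files(files: Vec<PartitionedFile>, n: usize) -> Vec<FilePartition> {
    let n = n.max(1);
    let mut partitions: Vec<FilePartition> = (0..n)
        .map(|index| FilePartition { index, files: Vec::new() })
        .collect();
    for (i, file) in files.into_iter().enumerate() {
        partitions[i % n].files.push(file);
    }
    partitions
}
```

   With this shape, supporting "part of a file" later would only mean adding an optional row-group or row range to `PartitionedFile`; `FilePartition` would not need to change.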



