heymind commented on a change in pull request #10066:
URL: https://github.com/apache/arrow/pull/10066#discussion_r615542436
##########
File path: rust/datafusion/src/physical_plan/csv.rs
##########
@@ -106,13 +107,71 @@ impl<'a> CsvReadOptions<'a> {
}
}
+/// SourceReader represents where the data comes from.
+enum SourceReader {
+ /// The data comes from partitioned files
+ PartitionedFiles {
+ /// Path to directory containing partitioned files with the same schema
+ path: String,
+ /// The individual files under path
+ filenames: Vec<String>,
+ },
+
+ /// The data comes from anything impl Read trait
+ Reader(Mutex<Option<Box<dyn Read + Send + Sync + 'static>>>),
+}
+
+impl std::fmt::Debug for SourceReader {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ SourceReader::PartitionedFiles { path, filenames } => f
+ .debug_struct("PartitionedFiles")
+ .field("path", path)
+ .field("filenames", filenames)
+ .finish()?,
+ SourceReader::Reader(_) => f.write_str("Reader")?,
+ };
+ Ok(())
+ }
+}
+
+impl Clone for SourceReader {
+ fn clone(&self) -> Self {
+ match self {
+ SourceReader::PartitionedFiles { path, filenames } => {
+ Self::PartitionedFiles {
+ path: path.clone(),
+ filenames: filenames.clone(),
+ }
+ }
+ SourceReader::Reader(_) => Self::Reader(Mutex::new(None)),
Review comment:
I agree it's unnecessary for `CsvExec` to be `Clone`. But if we remove
the `Clone` derivation, will that introduce a breaking change?
If the `CsvExec` is built from source files (not from a reader),
`Clone` will behave as expected.
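
To make the difference between the two variants concrete, here is a minimal sketch of the behaviour. It uses a simplified copy of the `SourceReader` from the diff; `has_data` and `from_bytes` are hypothetical helpers written only for this illustration and are not part of the PR.

```rust
use std::io::{Cursor, Read};
use std::sync::Mutex;

/// Simplified stand-in for the `SourceReader` enum in the diff above.
enum SourceReader {
    PartitionedFiles { path: String, filenames: Vec<String> },
    Reader(Mutex<Option<Box<dyn Read + Send + Sync + 'static>>>),
}

impl Clone for SourceReader {
    fn clone(&self) -> Self {
        match self {
            SourceReader::PartitionedFiles { path, filenames } => Self::PartitionedFiles {
                path: path.clone(),
                filenames: filenames.clone(),
            },
            // A boxed `dyn Read` cannot be duplicated, so the clone starts empty.
            SourceReader::Reader(_) => Self::Reader(Mutex::new(None)),
        }
    }
}

/// Hypothetical helper: reports whether this source can still produce data.
fn has_data(source: &SourceReader) -> bool {
    match source {
        SourceReader::PartitionedFiles { filenames, .. } => !filenames.is_empty(),
        SourceReader::Reader(inner) => inner.lock().unwrap().is_some(),
    }
}

/// Hypothetical helper: wraps an in-memory CSV string as a reader-backed source.
fn from_bytes(data: &'static str) -> SourceReader {
    let boxed: Box<dyn Read + Send + Sync + 'static> = Box::new(Cursor::new(data));
    SourceReader::Reader(Mutex::new(Some(boxed)))
}
```

With these definitions, a clone of a `PartitionedFiles` source still reports data, while a clone of a source built with `from_bytes` does not, because the underlying reader stays with the original.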