alamb commented on code in PR #4908:
URL: https://github.com/apache/arrow-datafusion/pull/4908#discussion_r1089734021


##########
datafusion/core/src/dataframe.rs:
##########
@@ -62,6 +62,7 @@ use crate::prelude::SessionContext;
 /// ```
 /// # use datafusion::prelude::*;
 /// # use datafusion::error::Result;
+/// # use datafusion::execution::context::Reader;

Review Comment:
   I am somewhat worried about `SessionContext` getting a new trait like this, 
as everyone who uses `SessionContext` must now add this new `use` statement
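
   To illustrate the concern, here is a minimal sketch of how Rust resolves trait methods. The `context` module, the `Reader` trait body, and the `read_avro` signature below are simplified stand-ins for illustration only, not DataFusion's actual definitions.

   ```rust
   // Hypothetical, simplified stand-ins; not the real DataFusion types.
   mod context {
       pub struct SessionContext;

       // Once `read_avro` lives on a trait rather than as an inherent
       // method, callers can only use it when the trait is in scope.
       pub trait Reader {
           fn read_avro(&self, path: &str) -> String;
       }

       impl Reader for SessionContext {
           fn read_avro(&self, path: &str) -> String {
               format!("avro:{path}")
           }
       }
   }

   use context::SessionContext;
   // Removing this import makes the `read_avro` call below fail to compile,
   // which is the extra burden on every downstream user.
   use context::Reader;

   fn main() {
       let ctx = SessionContext;
       println!("{}", ctx.read_avro("data.avro"));
   }
   ```

   An inherent method on `SessionContext` would not need the second `use` line.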



##########
datafusion/core/src/execution/context.rs:
##########
@@ -613,50 +615,29 @@ impl SessionContext {
     /// [`read_table`](Self::read_table) with a [`ListingTable`].
     async fn _read_type<'a>(
         &self,
-        table_path: impl AsRef<str>,
+        table_paths: Vec<impl AsRef<str>>,
         options: impl ReadOptions<'a>,
     ) -> Result<DataFrame> {
-        let table_path = ListingTableUrl::parse(table_path)?;
+        let table_paths = table_paths
+            .iter()
+            .map(ListingTableUrl::parse)
+            .collect::<Result<Vec<ListingTableUrl>>>()?;
         let session_config = self.copied_config();
         let listing_options = options.to_listing_options(&session_config);
         let resolved_schema = match options
-            .get_resolved_schema(&session_config, self.state(), table_path.clone())
+            .get_resolved_schema(&session_config, self.state(), table_paths[0].clone())
             .await
         {
             Ok(resolved_schema) => resolved_schema,
             Err(e) => return Err(e),
         };
-        let config = ListingTableConfig::new(table_path)
+        let config = ListingTableConfig::new_with_multi_paths(table_paths)
             .with_listing_options(listing_options)
             .with_schema(resolved_schema);
         let provider = ListingTable::try_new(config)?;
         self.read_table(Arc::new(provider))
     }
 
-    /// Creates a [`DataFrame`] for reading an Avro data source.
-    ///
-    /// For more control such as reading multiple files, you can use
-    /// [`read_table`](Self::read_table) with a [`ListingTable`].
-    pub async fn read_avro(
-        &self,
-        table_path: impl AsRef<str>,
-        options: AvroReadOptions<'_>,
-    ) -> Result<DataFrame> {
-        self._read_type(table_path, options).await
-    }

Review Comment:
   I wonder if you could leave these methods on `SessionContext` and instead 
use a trait for the argument and avoid changes to downstream users. 
   
   Perhaps something like:
   
   
   ```rust
   trait DataFilePaths {
     // Parse to a list of URLs
     fn to_urls(self) -> Result<Vec<ListingTableUrl>>;
   }
   
   // A single path yields a one-element list
   impl DataFilePaths for &str {
     fn to_urls(self) -> Result<Vec<ListingTableUrl>> {
       Ok(vec![ListingTableUrl::parse(self)?])
     }
   }
   
   // Multiple paths are parsed one by one; the first error is returned
   impl DataFilePaths for Vec<&str> {
     fn to_urls(self) -> Result<Vec<ListingTableUrl>> {
       self
               .iter()
               .map(ListingTableUrl::parse)
               .collect::<Result<Vec<ListingTableUrl>>>()
     }
   }
   
   impl SessionContext {
   ...
       pub async fn read_avro<P: DataFilePaths>(
           &self,
           table_path: P,
           options: AvroReadOptions<'_>,
       ) -> Result<DataFrame> {
           self._read_type(table_path.to_urls()?, options).await
       }
   ...
   }
   ```
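
   For reference, a self-contained version of the same idea that compiles without DataFusion. `TableUrl`, `ParseError`, and `read_paths` are hypothetical stand-ins for `ListingTableUrl`, the DataFusion error type, and a `read_*` method; the empty-path check is an assumption made only so the error path can be exercised.

   ```rust
   // Hypothetical stand-in for `ListingTableUrl`.
   #[derive(Debug, PartialEq)]
   struct TableUrl(String);

   // Hypothetical stand-in for the DataFusion error type.
   #[derive(Debug, PartialEq)]
   struct ParseError(String);

   impl TableUrl {
       // Assumed validation rule: reject empty paths.
       fn parse(s: impl AsRef<str>) -> Result<Self, ParseError> {
           let s = s.as_ref();
           if s.is_empty() {
               Err(ParseError("empty path".to_string()))
           } else {
               Ok(TableUrl(s.to_string()))
           }
       }
   }

   // The argument trait: anything that can be turned into a list of URLs.
   trait DataFilePaths {
       fn to_urls(self) -> Result<Vec<TableUrl>, ParseError>;
   }

   impl DataFilePaths for &str {
       fn to_urls(self) -> Result<Vec<TableUrl>, ParseError> {
           Ok(vec![TableUrl::parse(self)?])
       }
   }

   impl DataFilePaths for Vec<&str> {
       fn to_urls(self) -> Result<Vec<TableUrl>, ParseError> {
           // Collecting into `Result` stops at the first parse failure.
           self.into_iter().map(TableUrl::parse).collect()
       }
   }

   // Stand-in for a `read_*` method: returns how many URLs were resolved.
   fn read_paths<P: DataFilePaths>(paths: P) -> Result<usize, ParseError> {
       Ok(paths.to_urls()?.len())
   }

   fn main() {
       // Existing single-path callers keep working unchanged.
       println!("{:?}", read_paths("a.avro"));
       // New callers can pass several paths through the same method.
       println!("{:?}", read_paths(vec!["a.avro", "b.avro"]));
   }
   ```

   With this shape the trait only needs to be implemented for the argument types, so callers never have to import it.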



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
