[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

GitBox Thu, 19 Jan 2023 06:42:03 -0800


alamb commented on PR #4908:
URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1397083616


   > What do you think about having a single method which only takes a list of paths? For a single path, the callee can create a slice/Vec. This would be a lot simpler to do.
   
   I was thinking about this PR, and I have an alternate suggestion.
   
   It seems to me that `read_parquet`, `read_avro`, etc. are wrappers that simplify creating a `ListingTable`. Supporting multiple paths starts to complicate the API. Instead of adding `read_parquet_from_path`s, what do you think about making it easier to see how to read multiple files using the `ListingTable` API directly?
   
   For example, I bet a doc example like the following would go a long way:
   
   ```rust
       /// Creates a [`DataFrame`] for reading a Parquet data source from a single file or directory.
       ///
       /// Note: if you want to read from multiple files, or control other behaviors,
       /// you can use the [`ListingTable`] API directly. For example, to read multiple files:
       ///
       /// ```
       /// Example here (basically copy/paste the implementation of read_parquet and support multiple files)
       /// ```
       pub async fn read_parquet(
           &self,
           table_path: impl AsRef<str>,
           options: ParquetReadOptions<'_>,
       ) -> Result<DataFrame> {
   ...
   ```
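   A minimal sketch of what the placeholder doc example might contain, assuming a recent DataFusion release (signatures such as `ListingTableConfig::infer_schema` and `ParquetFormat::default` have shifted between versions). The helper name `read_parquet_files` is hypothetical, not an existing DataFusion method. It builds one `ListingTable` over several locations with `ListingTableConfig::new_with_multi_paths` and assumes all the files share a compatible schema.

   ```rust
   use std::sync::Arc;

   use datafusion::datasource::file_format::parquet::ParquetFormat;
   use datafusion::datasource::listing::{
       ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
   };
   use datafusion::error::{DataFusionError, Result};
   use datafusion::prelude::{DataFrame, SessionContext};

   /// Hypothetical helper (not part of DataFusion): read several Parquet
   /// files and/or directories into one DataFrame via a `ListingTable`.
   pub async fn read_parquet_files(ctx: &SessionContext, paths: &[&str]) -> Result<DataFrame> {
       // Reject an empty list up front so the caller gets an explicit error.
       if paths.is_empty() {
           return Err(DataFusionError::Plan(
               "at least one path is required".to_string(),
           ));
       }

       // Parse every location (local path or object store URL).
       let table_paths = paths
           .iter()
           .map(|p| ListingTableUrl::parse(p))
           .collect::<Result<Vec<_>>>()?;

       // Describe the file format; only files ending in ".parquet" are listed.
       let options = ListingOptions::new(Arc::new(ParquetFormat::default()))
           .with_file_extension(".parquet");

       // One table config spanning all paths; infer the schema using the
       // session state (assumes the files have compatible schemas).
       let config = ListingTableConfig::new_with_multi_paths(table_paths)
           .with_listing_options(options)
           .infer_schema(&ctx.state())
           .await?;

       // Register nothing globally: wrap the table directly in a DataFrame.
       let table = ListingTable::try_new(config)?;
       ctx.read_table(Arc::new(table))
   }
   ```

   Usage would look like `read_parquet_files(&ctx, &["data/a.parquet", "data/more/"]).await?`, yielding a single `DataFrame` over all listed locations. Presumably the same pattern with `CsvFormat` or `AvroFormat` in place of `ParquetFormat` would serve as the example for `read_csv` and `read_avro`.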
   
   We could give similar treatment to the docstrings for `read_avro` and `read_csv` (perhaps by pointing to the docs for `read_parquet` for an example of creating `ListingTable`s).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.