[ 
https://issues.apache.org/jira/browse/ARROW-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500342#comment-17500342
 ] 

Weston Pace commented on ARROW-15587:
-------------------------------------

I think one approach could be to change the FromProto method so that it 
builds the dataset through a dataset factory instead of constructing the 
dataset directly.

 * The first thing we will need to do is scan the URIs and determine the 
filesystem.  If the URIs refer to more than one filesystem we could just 
return an error for now (in the future we could maybe create multiple 
datasets and union them together).
 * Once we have a filesystem we can extract the path part from the URIs.  I'm 
pretty sure paths can be folders or files.
 * If we want to add glob support we should add that in the 
FileSystemDatasetFactory.  That can be done in a follow-up PR if we want to 
keep things simpler.
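
To make the first two steps concrete, here is a rough standalone sketch. It 
is not Arrow code: SplitUri, ResolvedFiles, SplitSchemeAndPath and 
ResolveSingleFilesystem are hypothetical names, and it assumes the URI 
scheme alone is enough to identify the filesystem (a real implementation 
would also have to compare the host/authority part).

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type for this sketch; not part of Arrow.
struct SplitUri {
  std::string scheme;  // e.g. "file" or "s3"
  std::string path;    // everything after "scheme://"
};

// Split "scheme://rest" into the scheme and the remainder.
inline SplitUri SplitSchemeAndPath(const std::string& uri) {
  const std::string sep = "://";
  const auto pos = uri.find(sep);
  if (pos == std::string::npos || pos == 0) {
    throw std::invalid_argument("not a URI with a scheme: " + uri);
  }
  return {uri.substr(0, pos), uri.substr(pos + sep.size())};
}

// Hypothetical result type: one filesystem (identified by scheme only,
// which is a simplifying assumption) plus the paths to hand to a factory.
struct ResolvedFiles {
  std::string scheme;
  std::vector<std::string> paths;
};

// Scan all URIs, require that they share one scheme, and collect the paths.
// Paths may name files or folders; this function does not distinguish them.
inline ResolvedFiles ResolveSingleFilesystem(
    const std::vector<std::string>& uris) {
  if (uris.empty()) {
    throw std::invalid_argument("no URIs given");
  }
  ResolvedFiles out;
  for (const auto& uri : uris) {
    SplitUri parts = SplitSchemeAndPath(uri);
    if (out.paths.empty()) {
      out.scheme = parts.scheme;  // first URI decides the filesystem
    } else if (parts.scheme != out.scheme) {
      // Multiple filesystems: report an error rather than union datasets.
      throw std::invalid_argument("URIs span multiple filesystems: " +
                                  out.scheme + " and " + parts.scheme);
    }
    out.paths.push_back(std::move(parts.path));
  }
  return out;
}
```

In the actual consumer I would expect arrow::fs::FileSystemFromUri to do the 
per-URI work (it returns the filesystem and writes the path part to an out 
parameter), with the collected paths then passed to the 
FileSystemDatasetFactory.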


> [C++] Add support for all options specified by 
> substrait::ReadRel::LocalFiles::FileOrFiles
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-15587
>                 URL: https://issues.apache.org/jira/browse/ARROW-15587
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Ariana Villegas
>            Priority: Major
>              Labels: substrait
>
> The Substrait read operator defines files with LocalFiles::FileOrFiles.  
> These elements can take one of several forms:
>  * uri_path (can be a file or a folder)
>  * uri_path_glob (a glob expression)
>  * uri_file (file only)
>  * uri_folder (folder only)
> The C++ Substrait consumer currently only supports uri_file.  We should add 
> support for the other options.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
