[ 
https://issues.apache.org/jira/browse/ARROW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422971#comment-17422971
 ] 

Weston Pace commented on ARROW-1231:
------------------------------------

For directory handling, the closest equivalent we have to `stat(2)` is FileInfo, 
and all that we require to be set there is the name of the file.  I imagine a 
common-prefix implementation would run into trouble with something like 
CreateDir (which would be a no-op) followed by GetFileInfo on the dir name 
(which would fail, since no object with that prefix exists yet).  For S3, in 
this case we create an empty object.  If GCS supports empty objects, then a 
common-prefix + empty-objects pattern might be pretty much the same as what we 
have for S3.
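To make the common-prefix + empty-object idea concrete, here is a minimal toy 
sketch. It assumes a flat key/value store held in a std::map; the class 
FlatObjectStore, its methods, and the trailing-'/' marker convention are all 
hypothetical illustrations, not Arrow's FileSystem API or the GCS client API.

```cpp
#include <map>
#include <string>

// Hypothetical result type standing in for the FileInfo "type" field.
enum class EntryType { kNotFound, kFile, kDirectory };

// Hypothetical toy model of a flat object store (NOT Arrow's or GCS's API).
// Keys are full object names; "directories" exist only by convention.
class FlatObjectStore {
 public:
  // Write a regular object.
  void PutObject(const std::string& key, const std::string& data) {
    objects_[key] = data;
  }

  // Emulate CreateDir by writing an empty "marker" object named "<path>/".
  // Without this marker, CreateDir would be a no-op and a following
  // GetFileInfo on an empty directory would report kNotFound.
  void CreateDir(const std::string& path) { objects_[DirKey(path)] = ""; }

  // Emulate GetFileInfo: an exact key match (for paths without a trailing
  // '/') is a file; otherwise the path is a directory if its marker exists
  // or any object lives under the "<path>/" prefix.
  EntryType GetFileInfo(const std::string& path) const {
    if ((path.empty() || path.back() != '/') && objects_.count(path) > 0) {
      return EntryType::kFile;
    }
    const std::string prefix = DirKey(path);
    // Keys sharing a prefix are contiguous in a sorted map, and the first
    // key >= prefix is the smallest one that could start with it.
    auto it = objects_.lower_bound(prefix);
    if (it != objects_.end() &&
        it->first.compare(0, prefix.size(), prefix) == 0) {
      return EntryType::kDirectory;
    }
    return EntryType::kNotFound;
  }

 private:
  static std::string DirKey(const std::string& path) {
    return (!path.empty() && path.back() == '/') ? path : path + "/";
  }

  std::map<std::string, std::string> objects_;
};
```

With the marker, `CreateDir("foo")` followed by `GetFileInfo("foo")` reports a 
directory even though nothing has been written under `foo/` yet, which is the 
case a pure common-prefix scheme would get wrong.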

Non-recursive listing is, unfortunately, a possibility.  This would be a 
FileSelector with recursive set to false.  I don't think we ever do it 
ourselves, but we do allow users to use any arbitrary FileSelector when defining 
a dataset.  So a user could ask us to read all of the files from the /foo 
directory non-recursively as a dataset.  That said, I think non-recursive 
directory listing is probably a rarity for dataset implementations.  An 
inefficient implementation would be a fine starting point (and likely would be 
fine for quite a while).
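As a rough illustration of what such an "inefficient" approach could look like, 
the sketch below filters a full flat listing of object names down to the 
immediate children of one directory. The function name ListNonRecursive and the 
convention of reporting sub-directories with a trailing '/' are assumptions for 
this example only.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given every object key in the bucket (a flat,
// recursive listing), return the immediate children of `dir`, roughly what
// a FileSelector with recursive=false asks for. Sub-directories are
// reported once, with a trailing '/'. It scans every key, hence
// "inefficient", but it is simple.
std::vector<std::string> ListNonRecursive(
    const std::vector<std::string>& all_keys, const std::string& dir) {
  const std::string prefix =
      (dir.empty() || dir.back() == '/') ? dir : dir + "/";
  std::set<std::string> children;  // de-duplicates and sorts
  for (const auto& key : all_keys) {
    // Skip keys outside the directory and the directory's own marker.
    if (key.size() <= prefix.size() ||
        key.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    const std::string rest = key.substr(prefix.size());
    const auto slash = rest.find('/');
    if (slash == std::string::npos) {
      children.insert(prefix + rest);  // direct child object
    } else {
      children.insert(prefix + rest.substr(0, slash + 1));  // child "dir"
    }
  }
  return std::vector<std::string>(children.begin(), children.end());
}
```

For keys `foo/a.parquet`, `foo/sub/c.parquet`, and `foo/sub/d.parquet`, listing 
`foo` would yield `foo/a.parquet` and a single `foo/sub/` entry.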

The most common "read dataset" directory operation is "read a directory 
recursively" (using the FileSelector supplied by the user).

For "write dataset" we have to "read a directory recursively" (via FileSelector 
with recursive true), "delete a directory recursively" (via DeleteDirContents), 
and "create a directory" (but a no-op here is fine as far as datasets are 
concerned).
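On a flat store, both the recursive read and the recursive delete reduce to a 
scan over a key prefix. The sketch below assumes the same toy std::map model 
with trailing-'/' marker objects; ObjectMap, ListRecursive, and 
EmulatedDeleteDirContents are hypothetical names. Like Arrow's 
DeleteDirContents, the delete removes everything under the directory but keeps 
the directory itself (here, its marker).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical flat store: object name -> contents.
using ObjectMap = std::map<std::string, std::string>;

// "<dir>/" for a non-empty dir; "" (the whole bucket) for the root.
std::string DirPrefix(const std::string& dir) {
  return (dir.empty() || dir.back() == '/') ? dir : dir + "/";
}

// Recursive listing: every key under the prefix, excluding the marker.
std::vector<std::string> ListRecursive(const ObjectMap& store,
                                       const std::string& dir) {
  const std::string prefix = DirPrefix(dir);
  std::vector<std::string> keys;
  // Keys sharing a prefix are contiguous in a sorted map.
  for (auto it = store.lower_bound(prefix);
       it != store.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    if (it->first != prefix) keys.push_back(it->first);
  }
  return keys;
}

// Delete everything under `dir` but keep `dir` itself (its marker object).
void EmulatedDeleteDirContents(ObjectMap& store, const std::string& dir) {
  for (const auto& key : ListRecursive(store, dir)) store.erase(key);
}
```

A real store would page through listing results and might batch deletes, but 
the prefix-scan structure would be the same.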



> [C++] Add filesystem / IO implementation for Google Cloud Storage
> -----------------------------------------------------------------
>
>                 Key: ARROW-1231
>                 URL: https://issues.apache.org/jira/browse/ARROW-1231
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Carlos O'Ryan
>            Priority: Major
>              Labels: filesystem
>
> See example jumping off point
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/cloud



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
