[
https://issues.apache.org/jira/browse/ARROW-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422971#comment-17422971
]
Weston Pace commented on ARROW-1231:
------------------------------------
For directory handling the closest equivalent we have to `stat(2)` is FileInfo
and all that we require to be set there is the name of the file. I imagine a
common-prefix implementation would run into trouble with something like
CreateDir (which would be a no-op) followed by GetFileInfo on the dir name
(which would fail). For S3 in this case we create an empty object. If GCS
supports empty objects then a common prefix + empty objects pattern might be
pretty much the same as what we have for S3.
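To make the CreateDir / GetFileInfo concern concrete, here is a minimal sketch of the common prefix + empty object pattern. It uses a toy in-memory store. `FlatStore` and its methods are hypothetical names for illustration, not Arrow or GCS APIs, and the trailing-slash marker key is an assumption of the sketch.

```python
class FlatStore:
    """Toy stand-in for a flat object store (hypothetical, not a real API)."""

    def __init__(self):
        self.objects = {}  # object key -> bytes

    def put(self, key, data):
        self.objects[key] = data

    def create_dir(self, path):
        # Assumption: a directory is recorded as a zero-byte marker object
        # whose key ends in "/". Without this, create_dir would be a no-op.
        self.objects[path.rstrip("/") + "/"] = b""

    def get_file_info(self, path):
        key = path.rstrip("/")
        if key in self.objects:
            return "file"
        prefix = key + "/"
        # A directory exists if its marker or any object under it exists.
        if any(k.startswith(prefix) for k in self.objects):
            return "directory"
        return "not found"


store = FlatStore()
before = store.get_file_info("foo")   # nothing under "foo/" yet
store.create_dir("foo")
after = store.get_file_info("foo")    # the marker object makes it visible
```

With a pure common-prefix scheme (no marker), the second lookup would still report "not found", which is the failure described above.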
Non-recursive listing is, unfortunately, a possibility. This would be a
FileSelector with recursive set to false. I don't think we ever do it
ourselves but we do allow users to use any arbitrary FileSelector when defining
a dataset. So a user could ask us to read all of the files from the /foo
directory non-recursively as a dataset. I think though that non-recursive
directory listing is probably a rarity for dataset implementations. An
inefficient implementation would be a fine starting point (and likely would be
fine for quite a while).
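As a rough idea of what an "inefficient but fine" non-recursive listing could look like over a flat key space, the sketch below scans every key under the prefix and collapses each one to its first path segment. `list_dir` is a hypothetical helper, not part of Arrow, and a real implementation would more likely ask the store for a delimiter-based listing.

```python
def list_dir(keys, base_dir, recursive=False):
    """List entries under base_dir from a flat iterable of object keys.

    Hypothetical helper: the non-recursive case scans every key under the
    prefix, which is O(total objects) rather than O(direct children).
    """
    prefix = base_dir.rstrip("/") + "/"
    results = set()
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue  # outside the directory, or the directory marker itself
        rest = key[len(prefix):]
        if recursive:
            results.add(prefix + rest.rstrip("/"))
        else:
            # Keep only the first path segment: a direct child file or subdir.
            first, _, _ = rest.partition("/")
            results.add(prefix + first)
    return sorted(results)


keys = ["foo/a.parquet", "foo/b/c.parquet", "foo/b/d.parquet", "bar/e.parquet"]
shallow = list_dir(keys, "foo")
deep = list_dir(keys, "foo", recursive=True)
```

Here `shallow` contains only the direct children of `foo`, while `deep` contains every object under it.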
The most common "read dataset" directory operation is "read a directory
recursively" (using a FileSelector from the user).
For "write dataset" we have to "read a directory recursively" (via FileSelector
with recursive true), "delete a directory recursively" (via DeleteDirContents),
and "create a directory" (but a no-op here is fine as far as datasets are
concerned).
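The three "write dataset" operations can be sketched against a plain dict of object keys. All three function names below are hypothetical and only mimic the shape of the operations named above; they are not the Arrow filesystem API.

```python
def create_dir(objects, path):
    # A no-op is sufficient for datasets on a flat key space.
    return None


def list_recursive(objects, base_dir):
    # "Read a directory recursively": every key under the prefix.
    prefix = base_dir.rstrip("/") + "/"
    return sorted(k for k in objects if k.startswith(prefix))


def delete_dir_contents(objects, base_dir):
    # "Delete a directory recursively": drop every key under the prefix.
    prefix = base_dir.rstrip("/") + "/"
    for key in [k for k in objects if k.startswith(prefix)]:
        del objects[key]


objects = {
    "out/part-0.parquet": b"x",
    "out/sub/part-1.parquet": b"y",
    "other/z.parquet": b"z",
}
create_dir(objects, "out")
existing = list_recursive(objects, "out")
delete_dir_contents(objects, "out")
remaining = list_recursive(objects, "out")
```

Note that all three reduce to prefix scans, so none of them depends on non-recursive listing.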
> [C++] Add filesystem / IO implementation for Google Cloud Storage
> -----------------------------------------------------------------
>
> Key: ARROW-1231
> URL: https://issues.apache.org/jira/browse/ARROW-1231
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Carlos O'Ryan
> Priority: Major
> Labels: filesystem
>
> See example jumping off point
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/cloud