[
https://issues.apache.org/jira/browse/ARROW-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459509#comment-17459509
]
Carlos O'Ryan commented on ARROW-14924:
---------------------------------------
I am having problems getting these tests to pass. For example, consider this
section of the `TestCreateDir()` test:
https://github.com/apache/arrow/blob/3e666f6691495a8bed86e519d71c2cc22cf0c03d/cpp/src/arrow/filesystem/test_util.cc#L223-L226
Recall that GCS does not have directories; it has a flat namespace. We have
been using objects with a trailing {{/}} character as sentinels to indicate that
a directory exists. Because the namespace is flat, an object called {{foo}}
is distinct from an object called {{foo/}} (or {{foo///}} for that matter).
Therefore, it is possible to have a "file" called {{AB/def}} and a "directory"
called {{AB/def/}} at the same time. The test fails because creating the
directory sentinel {{AB/def/}} succeeds, even though the file {{AB/def}}
already exists.
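
To make the failure concrete, here is a minimal sketch that models a bucket as
a flat set of object names. {{FlatBucket}}, {{CreateIfAbsent()}} and
{{SentinelCoexistsWithFile()}} are hypothetical names made up for this
illustration; none of this is the Arrow filesystem or the GCS client API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a GCS bucket: a flat set of object names.
// There is no hierarchy; "/" is just another character in the name.
class FlatBucket {
 public:
  // Creates an object only if that exact name is absent.
  // Returns false if the name is already taken.
  bool CreateIfAbsent(const std::string& name) {
    assert(!name.empty());
    return objects_.insert(name).second;
  }

  bool Exists(const std::string& name) const {
    return objects_.count(name) != 0;
  }

 private:
  std::set<std::string> objects_;
};

// Mimics the scenario described above: create the "file" first, then the
// directory sentinel. Returns true when the sentinel creation succeeds even
// though the file already exists.
bool SentinelCoexistsWithFile() {
  FlatBucket bucket;
  bucket.CreateIfAbsent("AB/def");          // the "file"
  return bucket.CreateIfAbsent("AB/def/");  // the "directory" sentinel
}
```

In this model {{SentinelCoexistsWithFile()}} returns true, because the two
names differ by one character and nothing ties them together. That is the
behavior the generic test rejects.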
I think there are several options, but I am not sure which one is more
desirable:
1. We could try to use {{AB/def}} as the directory sentinel. I think that
would make this test pass, but it has some downsides: other places in the code
would need to determine whether an object without a trailing {{/}} is really a
directory. We could do that using metadata attributes. However, metadata
attributes would not be present in directory hierarchies created by other tools
(say the GCS UI at https://console.cloud.google.com). In my opinion, this is a
minor downside because any set of files created externally may be missing the
directory sentinels altogether.
2. Alternatively, when recursively creating a directory we could verify that
no component has a "file" with the same name as the directory marker. This
seems expensive: it requires checking two different paths for each component,
not to mention the race conditions.
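
A rough sketch of the second option, again against an in-memory flat
namespace rather than the real GCS client. {{FlatBucket}}, {{Prefixes()}} and
{{CreateDirRecursive()}} are hypothetical names, and the sketch assumes
normalized paths (no leading, trailing, or repeated {{/}}).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory flat namespace; not the Arrow or GCS client API.
using FlatBucket = std::set<std::string>;

// Returns every prefix of `path` that ends at a component boundary,
// e.g. "AB/def" yields {"AB", "AB/def"}.
std::vector<std::string> Prefixes(const std::string& path) {
  std::vector<std::string> out;
  for (std::size_t pos = path.find('/'); pos != std::string::npos;
       pos = path.find('/', pos + 1)) {
    out.push_back(path.substr(0, pos));
  }
  out.push_back(path);
  return out;
}

// Creates a sentinel ("AB/", "AB/def/", ...) for each component of `path`,
// but only if no component is already taken by a "file" of the same name.
// Returns false, and creates nothing, when such a file exists.
bool CreateDirRecursive(FlatBucket& bucket, const std::string& path) {
  assert(!path.empty() && path.front() != '/' && path.back() != '/');
  const std::vector<std::string> prefixes = Prefixes(path);
  // First lookup per component: is there a "file" with this name?
  for (const std::string& prefix : prefixes) {
    if (bucket.count(prefix) != 0) return false;
  }
  // Second path per component: the sentinel itself. Against a real service
  // another writer could create the file between the check and this step.
  for (const std::string& prefix : prefixes) {
    bucket.insert(prefix + "/");
  }
  return true;
}
```

With a file named {{AB/def}} in the bucket, {{CreateDirRecursive(bucket,
"AB/def")}} returns false in this model. The two loops are the two lookups per
component, and the gap between them is where the race lives.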
There may be other alternatives that I do not see right now. For the time
being, I am inclined to use the first alternative.
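
For reference, the first alternative could look roughly like the sketch
below, where the directory marker is an object without a trailing {{/}} that
carries a metadata attribute. The attribute key and value ({{arrow/gcsfs}},
{{directory}}) and the names {{Object}}, {{FlatBucket}}, {{IsDirectory()}} and
{{CreateDir()}} are assumptions made for illustration, not what the
implementation uses.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical object record: only the metadata attributes matter here.
struct Object {
  std::map<std::string, std::string> metadata;
};

// Hypothetical in-memory flat namespace: object name -> object.
using FlatBucket = std::map<std::string, Object>;

// Assumed marker attribute; the key and value are illustrative only.
const char kDirKey[] = "arrow/gcsfs";
const char kDirValue[] = "directory";

// An object counts as a directory only if it carries the marker attribute.
bool IsDirectory(const Object& object) {
  auto it = object.metadata.find(kDirKey);
  return it != object.metadata.end() && it->second == kDirValue;
}

enum class CreateResult { kCreated, kAlreadyDirectory, kFileInTheWay };

// Uses the name without a trailing '/' as the sentinel, so a "file" and a
// "directory" compete for the same name and the conflict becomes visible.
CreateResult CreateDir(FlatBucket& bucket, const std::string& path) {
  assert(!path.empty() && path.back() != '/');
  auto it = bucket.find(path);
  if (it == bucket.end()) {
    Object marker;
    marker.metadata[kDirKey] = kDirValue;
    bucket.emplace(path, marker);
    return CreateResult::kCreated;
  }
  return IsDirectory(it->second) ? CreateResult::kAlreadyDirectory
                                 : CreateResult::kFileInTheWay;
}
```

In this model, creating the directory {{AB/def}} when a plain object
{{AB/def}} exists yields {{kFileInTheWay}}. Objects created by other tools
would lack the attribute and would therefore look like files.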
> [C++] Implement generic filesystem tests for GCS
> ------------------------------------------------
>
> Key: ARROW-14924
> URL: https://issues.apache.org/jira/browse/ARROW-14924
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Carlos O'Ryan
> Priority: Major
>
> Once the GCS filesystem implementation is functionally complete, generic
> tests should be enabled for it to catch corner cases. See for example the S3
> filesystem tests:
> https://github.com/apache/arrow/blob/3e666f6691495a8bed86e519d71c2cc22cf0c03d/cpp/src/arrow/filesystem/s3fs_test.cc#L1039-L1081