[ 
https://issues.apache.org/jira/browse/ARROW-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459509#comment-17459509
 ] 

Carlos O'Ryan commented on ARROW-14924:
---------------------------------------

I am having problems getting these tests to pass.  For example, consider this 
section of the `TestCreateDir()` test:

https://github.com/apache/arrow/blob/3e666f6691495a8bed86e519d71c2cc22cf0c03d/cpp/src/arrow/filesystem/test_util.cc#L223-L226

Recall that GCS does not have directories; it has a flat namespace.  We have 
been using files with a trailing {{/}} character as sentinels to indicate that 
a directory exists.  Because the namespace is flat, an object called {{foo}} 
is distinct from an object called {{foo/}} (or {{foo///}} for that matter).  
Therefore, it is possible to have a "file" called {{AB/def}} and a "directory" 
called {{AB/def/}} at the same time.  The test fails because creating the 
directory sentinel {{AB/def/}} succeeds, even though the file {{AB/def}} 
already exists.
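
To make the failure concrete, here is a minimal sketch that models a bucket as a flat map keyed by the full object name.  The {{FlatBucket}} class and the {{SentinelCreationSucceedsDespiteFile()}} helper are hypothetical names made up for illustration; this is neither the GCS client API nor the Arrow implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a bucket: a flat map from object name
// to contents. There is no hierarchy; '/' is just another character.
class FlatBucket {
 public:
  // Creates the object only if no object with exactly this name exists.
  // Returns true if the object was created.
  bool CreateIfAbsent(const std::string& name, const std::string& contents) {
    assert(!name.empty());
    return objects_.emplace(name, contents).second;
  }

  // True if an object with exactly this name exists.
  bool Exists(const std::string& name) const {
    return objects_.count(name) != 0;
  }

 private:
  std::map<std::string, std::string> objects_;
};

// Mirrors the failing scenario: create the "file" AB/def, then try to create
// the directory sentinel AB/def/. Returns whether the sentinel was created.
bool SentinelCreationSucceedsDespiteFile() {
  FlatBucket bucket;
  bucket.CreateIfAbsent("AB/def", "some data");  // the "file"
  return bucket.CreateIfAbsent("AB/def/", "");   // the "directory" sentinel
}
```

In this model the second call succeeds because {{AB/def}} and {{AB/def/}} are different keys, which is the behavior the generic test rejects.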

I see at least two options, but I am not sure which one is 
preferable:

1. We could try to use {{AB/def}} as the directory sentinel.  I think that 
would make this test pass, but it has a downside: other places in the code 
would need to determine whether an object without a trailing {{/}} is really a 
directory.  We could do that using metadata attributes.  However, metadata 
attributes would be missing on directory hierarchies created by other tools 
(say the GCS UI at https://console.cloud.google.com).  In my opinion, this is a 
minor downside because any set of files created externally may be missing the 
directory sentinels altogether.
2. Alternatively, when recursively creating a directory we could verify that 
no component has a "file" with the same name as the directory marker.  This 
seems expensive: it requires checking two different paths, not to mention the 
race conditions.
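
As a rough illustration of the second option, the sketch below again models the bucket as a flat set of object names.  {{ObjectNames}} and {{CreateDirRecursive()}} are hypothetical names for this example only, and the in-memory set stands in for what would be separate requests against the real service.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical flat namespace: just the set of object names in a bucket.
using ObjectNames = std::set<std::string>;

// Creates the sentinels "a/", "a/b/", ... for `path`, which must have no
// leading or trailing '/'. Stops and returns false at the first component
// that already exists as a "file"; sentinels already created for earlier
// components are left in place.
bool CreateDirRecursive(ObjectNames& bucket, const std::string& path) {
  assert(!path.empty() && path.front() != '/' && path.back() != '/');
  std::string::size_type pos = 0;
  while (pos != std::string::npos) {
    pos = path.find('/', pos + 1);
    // Prefix up to the next '/', e.g. "AB", then "AB/def".
    const std::string component = path.substr(0, pos);
    // First path: is there a "file" with the component's name?
    if (bucket.count(component) != 0) return false;
    // Second path: the sentinel itself. Against a real object store the
    // check and this write are separate operations, hence the race.
    bucket.insert(component + "/");
  }
  return true;
}
```

With a bucket that already contains {{AB/def}}, calling {{CreateDirRecursive(bucket, "AB/def")}} returns false in this model, at the cost of touching two names per path component.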

There may be other alternatives that I do not see yet.  For now, I am 
inclined to use the first alternative.



> [C++] Implement generic filesystem tests for GCS
> ------------------------------------------------
>
>                 Key: ARROW-14924
>                 URL: https://issues.apache.org/jira/browse/ARROW-14924
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Antoine Pitrou
>            Assignee: Carlos O'Ryan
>            Priority: Major
>
> Once the GCS filesystem implementation is functionally complete, generic 
> tests should be enabled for it to catch corner cases. See for example the S3 
> filesystem tests:
> https://github.com/apache/arrow/blob/3e666f6691495a8bed86e519d71c2cc22cf0c03d/cpp/src/arrow/filesystem/s3fs_test.cc#L1039-L1081



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
