[ https://issues.apache.org/jira/browse/BEAM-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999640#comment-15999640 ]
Devon Meunier commented on BEAM-2150:
-------------------------------------
[[email protected]] noticed that gsutil's globbing semantics don't quite
match my PR.
He noted:
{quote}
[11:12:18 dhalperi@dhalperi:beam a3cbf5905* ] gsutil ls 'gs://clouddfe-dhalperi/gcs-recursive/**/*.txt' [1]
gs://clouddfe-dhalperi/gcs-recursive/file1.txt
gs://clouddfe-dhalperi/gcs-recursive/somedir/file2.txt
[2:13]
However, that same glob passed to TextIO only gets the second file.
{quote}
However, running the same patterns in a shell gives results that differ from gsutil's:
{code}
[I] » tree glob/
glob/
├── dir
│   └── file2.txt
└── file1.txt

1 directory, 2 files
[I] » ls glob/**/*.txt
glob/dir/file2.txt
[I] » ls glob/**.txt
glob/dir/file2.txt glob/file1.txt
{code}
My PR matches the shell's behaviour, so gsutil seems to be the odd one out. I
think we can commit to the shell semantics and add tests that make this
behaviour explicit. What do you think?
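To make the two readings of `**` concrete, here is a minimal Java sketch. `GlobSemantics`, `globToRegex` and `match` are hypothetical names for illustration only; this is not Beam's `GcsPath` or `TextIO` code. It assumes that `*` and `?` never cross `/`, that shell-style `**` matches any run of characters including `/`, and that gsutil-style `**/` may additionally match zero directories. The gsutil rule is inferred from the listing quoted above, not taken from gsutil's documentation. Since GCS object names are flat strings, the sketch simply matches full names against the translated regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GlobSemantics {

  // Hypothetical helper: translates a glob into a regex.
  // '*' matches any run of non-'/' characters, '?' matches one non-'/' character.
  // With gsutilStyle == false, '**' matches anything, including '/' (shell reading above).
  // With gsutilStyle == true, '**/' may also match zero directories (assumed gsutil reading).
  static Pattern globToRegex(String glob, boolean gsutilStyle) {
    StringBuilder regex = new StringBuilder();
    int i = 0;
    while (i < glob.length()) {
      char c = glob.charAt(i);
      boolean doubleStar = c == '*' && i + 1 < glob.length() && glob.charAt(i + 1) == '*';
      if (doubleStar && gsutilStyle && i + 2 < glob.length() && glob.charAt(i + 2) == '/') {
        // "**/" matches zero or more whole directory levels.
        regex.append("(?:.*/)?");
        i += 3;
      } else if (doubleStar) {
        // "**" matches any characters, including '/'.
        regex.append(".*");
        i += 2;
      } else if (c == '*') {
        regex.append("[^/]*");
        i++;
      } else if (c == '?') {
        regex.append("[^/]");
        i++;
      } else {
        // Every other character is matched literally.
        regex.append(Pattern.quote(String.valueOf(c)));
        i++;
      }
    }
    return Pattern.compile(regex.toString());
  }

  // Returns the object names that fully match the glob, in input order.
  static List<String> match(String glob, List<String> objects, boolean gsutilStyle) {
    Pattern pattern = globToRegex(glob, gsutilStyle);
    List<String> matched = new ArrayList<>();
    for (String name : objects) {
      if (pattern.matcher(name).matches()) {
        matched.add(name);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> objects = Arrays.asList("glob/file1.txt", "glob/dir/file2.txt");
    System.out.println(match("glob/**/*.txt", objects, false)); // [glob/dir/file2.txt]
    System.out.println(match("glob/**.txt", objects, false));   // [glob/file1.txt, glob/dir/file2.txt]
    System.out.println(match("glob/**/*.txt", objects, true));  // [glob/file1.txt, glob/dir/file2.txt]
  }
}
```

Under the shell-style reading, `**/*.txt` requires a `/` after whatever `**` consumes, so `file1.txt` at the top level drops out, while `**.txt` lets `**` absorb the directory part and matches both files. The gsutil-style reading makes the directory part optional instead. Tests pinning these three cases would document which reading is intended.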
> Support for recursive wildcards in GcsPath
> ------------------------------------------
>
> Key: BEAM-2150
> URL: https://issues.apache.org/jira/browse/BEAM-2150
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core, sdk-java-gcp
> Reporter: Devon Meunier
> Assignee: Devon Meunier
> Priority: Minor
>
> When working with heavily nested folder structures in Google Cloud Storage,
> it's great to make use of recursive wildcards, which the current API
> explicitly does not support.
> This code hasn't been touched in 2 years, so it's likely that no one has
> simply gotten around to it yet.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)