westonpace commented on code in PR #12625:
URL: https://github.com/apache/arrow/pull/12625#discussion_r865397995


##########
cpp/src/arrow/filesystem/filesystem_test.cc:
##########
@@ -265,6 +265,33 @@ TEST(PathUtil, ToSlashes) {
 #endif
 }
 
+TEST(PathUtil, Globber) {
+  Globber empty("");
+  ASSERT_FALSE(empty.Matches("/1.txt"));
+
+  Globber star("/*");
+  ASSERT_TRUE(star.Matches("/a.txt"));
+  ASSERT_TRUE(star.Matches("/b.csv"));
+  ASSERT_FALSE(star.Matches("/foo/c.parquet"));
+
+  Globber localfs_linux("/f?o/bar/a?/1*.txt");
+  ASSERT_TRUE(localfs_linux.Matches("/foo/bar/a1/1.txt"));
+  ASSERT_TRUE(localfs_linux.Matches("/f#o/bar/ab/1000.txt"));
+  ASSERT_FALSE(localfs_linux.Matches("/f#o/bar/ab/1/23.txt"));
+
+  Globber localfs_windows("C:/f?o/bar/a?/1*.txt");
+  ASSERT_TRUE(localfs_windows.Matches("C:/f_o/bar/ac/1000.txt"));
+
+  Globber remotefs("remote://my|bucket(#?)/foo{*}/[?]bar~/b&z/a: *-c.txt");

Review Comment:
   I hope we never have a path quite like this :laughing: 



##########
cpp/src/arrow/filesystem/util_internal.h:
##########
@@ -49,6 +49,10 @@ Status NotAFile(const std::string& path);
 ARROW_EXPORT
 Status InvalidDeleteDirContents(const std::string& path);
 
+ARROW_EXPORT
+Result<FileInfoVector> GetGlobFiles(const std::shared_ptr<FileSystem>& 
filesystem,
+                                    const std::string& glob);
+

Review Comment:
   We should add some comments here.  We should point out that all `glob` 
patterns should be absolute.  For example, I don't think we can do 
`GetGlobFiles(fs, "*.parquet")`.  This is ok for our purposes (globs are given 
via `file://` URIs and `file://` URIs must contain absolute paths) but we 
should make it clear in case someone wants to use this function for another 
purpose.



##########
cpp/src/arrow/filesystem/filesystem_test.cc:
##########
@@ -265,6 +265,33 @@ TEST(PathUtil, ToSlashes) {
 #endif
 }
 
+TEST(PathUtil, Globber) {
+  Globber empty("");
+  ASSERT_FALSE(empty.Matches("/1.txt"));
+
+  Globber star("/*");
+  ASSERT_TRUE(star.Matches("/a.txt"));
+  ASSERT_TRUE(star.Matches("/b.csv"));
+  ASSERT_FALSE(star.Matches("/foo/c.parquet"));

Review Comment:
   Can we also add a quick test that `/a?b` does not match `/a/b`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to