Abacn commented on code in PR #35582:
URL: https://github.com/apache/beam/pull/35582#discussion_r2205633815


##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java:
##########
@@ -122,4 +127,27 @@ public IsBounded isBounded() {
   public ProjectSupport supportsProjects() {
     return ProjectSupport.WITH_FIELD_REORDERING;
   }
+
+  private String resolveFilePattern(String location) {
+    try {
+      MatchResult match = FileSystems.match(location);
+      if (match.status() == MatchResult.Status.OK && !match.metadata().isEmpty()) {
+        MatchResult.Metadata metadata = match.metadata().get(0);
+        if (metadata.resourceId().isDirectory()) {
+          String dirPath = metadata.resourceId().toString();
+          if (dirPath.endsWith("/")) {
+            return dirPath + "*";
+          } else {
+            return dirPath + "/*";
+          }
+        }
+      }
+    } catch (IOException e) {
+      LOG.warn(

Review Comment:
   buildIOReader runs at pipeline expansion time, and it is a perfectly valid 
use case for the submission VM not to have access to the filesystem location 
(consider a user who submits the pipeline locally and runs it on Dataflow).
   
   How about not relying on a FileSystems.match call in resolveFilePattern? For 
example, if location contains '*', treat it as a glob; if location ends with 
pqt or .parquet, treat it as a single file.
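
A rough sketch of what that string-only check might look like, so nothing touches the filesystem at expansion time. The class name `FilePatternHeuristics` is hypothetical and not part of Beam. The sketch makes three assumptions beyond the suggestion above: the suffix check uses `.pqt` with a leading dot, the comparison ignores case, and any location that is neither a glob nor a Parquet file name is treated as a directory.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the Beam codebase. */
final class FilePatternHeuristics {
  private FilePatternHeuristics() {}

  /**
   * Derives a file pattern from the location string alone, with no FileSystems.match call,
   * so it also works when the submitting machine cannot reach the storage location.
   */
  static String resolveFilePattern(String location) {
    // A '*' means the user already supplied a glob; pass it through unchanged.
    if (location.contains("*")) {
      return location;
    }
    // Assumption: a Parquet-looking suffix (case-insensitive) marks a single file.
    String lower = location.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".parquet") || lower.endsWith(".pqt")) {
      return location;
    }
    // Assumption: anything else is a directory, so match every file directly under it.
    return location.endsWith("/") ? location + "*" : location + "/*";
  }
}
```

With this, `resolveFilePattern("gs://bucket/data")` would yield `gs://bucket/data/*`, while a glob or a `.parquet` path is returned as is. The trade-off is that a single file without a recognized suffix would be misread as a directory.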


