Abacn commented on code in PR #35582:
URL: https://github.com/apache/beam/pull/35582#discussion_r2205633815

##########
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java:
##########
@@ -122,4 +127,27 @@ public IsBounded isBounded() {
   public ProjectSupport supportsProjects() {
     return ProjectSupport.WITH_FIELD_REORDERING;
   }
+
+  private String resolveFilePattern(String location) {
+    try {
+      MatchResult match = FileSystems.match(location);
+      if (match.status() == MatchResult.Status.OK && !match.metadata().isEmpty()) {
+        MatchResult.Metadata metadata = match.metadata().get(0);
+        if (metadata.resourceId().isDirectory()) {
+          String dirPath = metadata.resourceId().toString();
+          if (dirPath.endsWith("/")) {
+            return dirPath + "*";
+          } else {
+            return dirPath + "/*";
+          }
+        }
+      }
+    } catch (IOException e) {
+      LOG.warn(

Review Comment:
   buildIOReader happens at pipeline expansion time, and it is a perfectly valid use case for the submission VM not to have access to the filesystem location (for example, a user submits the pipeline locally and runs it on Dataflow).

   How about not relying on a FileSystems.match call in resolveFilePattern? For example, if location contains '*', treat it as a glob; if location ends with pqt or .parquet, treat it as a single file.
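
A minimal sketch of the string-only approach suggested above, so that nothing touches the filesystem at pipeline expansion time. The class name `FilePatternSketch` is hypothetical and not part of Beam. The final fallback (treating any other location as a directory and appending a wildcard) is an assumption that the review comment does not spell out, as is reading "pqt" as the suffix `.pqt` and matching suffixes case-sensitively.

```java
/** Hypothetical helper illustrating the suggestion; not part of Beam. */
final class FilePatternSketch {
  private FilePatternSketch() {}

  /** Derives a file pattern from the location string alone, with no filesystem calls. */
  static String resolveFilePattern(String location) {
    // A '*' anywhere means the caller already supplied a glob.
    if (location.contains("*")) {
      return location;
    }
    // A Parquet-looking suffix means a single file (assumed suffixes: .pqt, .parquet).
    if (location.endsWith(".pqt") || location.endsWith(".parquet")) {
      return location;
    }
    // Assumption: anything else is a directory, so match every file directly under it.
    return location.endsWith("/") ? location + "*" : location + "/*";
  }
}
```

Under these assumptions, `gs://bucket/data` and `gs://bucket/data/` both resolve to `gs://bucket/data/*`, while a glob or a `.parquet` path is returned unchanged. The trade-off is that a directory whose name happens to end in `.parquet` would be treated as a single file.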