walterddr commented on code in PR #9245:
URL: https://github.com/apache/pinot/pull/9245#discussion_r949679994
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -172,7 +172,7 @@ public void init(SegmentGenerationJobSpec spec) {
public void run()
throws Exception {
// Get list of files to process.
- String[] files = _inputDirFS.listFiles(_inputDirURI, true);
+ String[] files = _inputDirFS.listFilesWithPattern(_inputDirURI,
_spec.getIncludeFileNamePattern());
Review Comment:
let's also refactor
https://github.com/apache/pinot/pull/9245/files#diff-7bec8ff885254fee67d4ad50c4f06a7894976700df3fd2ef42af6532fedfea26L181-L203
into this function.
the name `listFilesWithPattern` suggest it should already finish the glob
searching
##########
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/PinotFSFactory.java:
##########
@@ -37,6 +37,7 @@ private PinotFSFactory() {
}
public static final String LOCAL_PINOT_FS_SCHEME = "file";
+ public static final String RECURSIVE_BLOB_PREFIX = "glob:**";
Review Comment:
recursive hint from glob pattern should do recursive search whenever there's
a `**` pattern included, b/c `**` can occur in the middle such as
`subFolder/foo/**/bar`
##########
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/PinotFS.java:
##########
@@ -144,6 +143,22 @@ boolean exists(URI fileUri)
long length(URI fileUri)
throws IOException;
+ /**
+ * Lists all the files and directories at the location provided.
+ * Lists recursively if {@code inputFilePattern} starts with "glob:**".
+ * Throws IOException if this abstract pathname is not valid, or if an I/O
error occurs.
+ * @param fileUri location of file
+ * @return an array of strings that contains file paths
+ * @throws IOException on IO failure. See specific implementation
+ */
+ default String[] listFilesWithPattern(URI fileUri, String inputFilePattern)
Review Comment:
let's not add the default implementation in `PinotFS` since this is a very
generic interface.
I would suggest let's add this as a static method in
`SegmentGenerationJobSpec`
```
public static String[] listMatchedFiles(PinotFS pinotFs, URI fileUri,
String includePattern, String excludePattern) {
// ...
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]