Jackie-Jiang commented on code in PR #9265:
URL: https://github.com/apache/pinot/pull/9265#discussion_r953119745


##########
pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationJobSpec.java:
##########
@@ -45,6 +45,11 @@ public class SegmentGenerationJobSpec implements Serializable {
    */
   private String _inputDirURI;
 
+  /**
+   * If true, search input files recursively from root directly specified in _inputDirURI.
+   */
+  // TODO: set the default value to false after all clients are aware of this.
+  private boolean _searchRecursively = true;

Review Comment:
   (nit) Add an empty line after this



##########
pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationJobSpec.java:
##########
@@ -161,6 +166,13 @@ public void setInputDirURI(String inputDirURI) {
     _inputDirURI = inputDirURI;
   }
 
+  public boolean isSearchRecursively() {
+    return _searchRecursively;
+  }
+
+  public void setSearchRecursively(boolean searchRecursively) {
+    _searchRecursively = searchRecursively;
+  }

Review Comment:
   (nit) Add an empty line



##########
pinot-common/src/main/java/org/apache/pinot/common/segment/generation/SegmentGenerationUtils.java:
##########
@@ -229,4 +233,63 @@ private static String fetchUrl(URL url, String authToken)
     }
     return IOUtils.toString(connection.getInputStream(), StandardCharsets.UTF_8);
   }
+
+
+  /**
+   * Find matching files from root directory specified in fileUri.
+   * If includePattern and excludePattern are not null, get all the files that match includePattern and exclude files
+   * that match excludePattern.
+   * If
+   *
+   * @param pinotFs root directly fs
+   * @param fileUri root directly uri
+   * @param includePattern optional glob patterns for files to include
+   * @param excludePattern optional glob patterns for files to exclude
+   * @param searchRecrusively if ture, search files recursively from directory specified in fileUri
+   * @return list of matching files.
+   * @throws IOException on IO failure for list files in root directory.
+   * @throws URISyntaxException for matching file URIs
+   * @throws RuntimeException if there is no matching file.
+   */
+  public static List<String> listMatchedFilesWithRecursiveOption(PinotFS pinotFs, URI fileUri, String includePattern,

Review Comment:
   Annotate `includePattern` and `excludePattern` as `Nullable`
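
A minimal sketch of what the annotated signature could look like. This is not the PR's code: `NullablePatternSketch`, `filterFiles`, and the nested `Nullable` stand-in are hypothetical names, and the stand-in exists only so the example compiles without extra dependencies (the real change would presumably use the project's existing nullable annotation). Patterns are assumed to carry the `glob:` prefix that `java.nio.file.FileSystem#getPathMatcher` expects.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NullablePatternSketch {
  // Hypothetical stand-in for the project's nullable annotation, declared locally
  // so that this sketch has no external dependencies.
  @Retention(RetentionPolicy.CLASS)
  @Target({ElementType.PARAMETER})
  @interface Nullable {
  }

  /**
   * Keeps the files that match includePattern (if given) and do not match excludePattern (if given).
   * A null pattern means "no filtering on that side".
   */
  static List<String> filterFiles(List<String> files, @Nullable String includePattern,
      @Nullable String excludePattern) {
    PathMatcher include =
        includePattern == null ? null : FileSystems.getDefault().getPathMatcher(includePattern);
    PathMatcher exclude =
        excludePattern == null ? null : FileSystems.getDefault().getPathMatcher(excludePattern);
    List<String> matched = new ArrayList<>();
    for (String file : files) {
      Path path = Paths.get(file);
      if (include != null && !include.matches(path)) {
        continue;
      }
      if (exclude != null && exclude.matches(path)) {
        continue;
      }
      matched.add(file);
    }
    return matched;
  }
}
```

With the annotation in place, callers and static analysis tools can see that passing `null` for either pattern is a supported case and not an oversight.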



##########
pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/batch/BatchConfig.java:
##########
@@ -32,6 +32,7 @@ public class BatchConfig {
 
   private final FileFormat _inputFormat;
   private final String _inputDirURI;
+  private final boolean _searchRecursively;

Review Comment:
   Is this used? If so, the default for it should also be `true`
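
For illustration only, one way to keep the default consistent with `SegmentGenerationJobSpec` is to treat a missing value as `true`. The class name, method name, and property key below are hypothetical, and the sketch assumes the setting is read from a string-to-string property map; it does not reflect how `BatchConfig` is actually wired.

```java
import java.util.Map;

public class RecursiveDefaultSketch {
  // Hypothetical property key, used only for this example.
  static final String SEARCH_RECURSIVELY_KEY = "input.searchRecursively";

  /**
   * Returns true when the key is absent, so that configs written before the flag
   * existed keep the recursive behavior. Any value other than "true" (case-insensitive)
   * is parsed as false by Boolean.parseBoolean.
   */
  static boolean isSearchRecursively(Map<String, String> props) {
    String value = props.get(SEARCH_RECURSIVELY_KEY);
    return value == null || Boolean.parseBoolean(value);
  }
}
```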



##########
pinot-common/src/main/java/org/apache/pinot/common/segment/generation/SegmentGenerationUtils.java:
##########
@@ -229,4 +233,63 @@ private static String fetchUrl(URL url, String authToken)
     }
     return IOUtils.toString(connection.getInputStream(), StandardCharsets.UTF_8);
   }
+
+
+  /**
+   * Find matching files from root directory specified in fileUri.
+   * If includePattern and excludePattern are not null, get all the files that match includePattern and exclude files
+   * that match excludePattern.
+   * If
+   *
+   * @param pinotFs root directly fs

Review Comment:
   (minor) You mean `directory` here? Same for `fileUri`



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/src/main/java/org/apache/pinot/plugin/ingestion/batch/hadoop/HadoopSegmentGenerationJobRunner.java:
##########
@@ -143,13 +141,6 @@ public void run()
       PinotFSFactory.register(pinotFSSpec.getScheme(), pinotFSSpec.getClassName(), new PinotConfiguration(pinotFSSpec));
     }
 
-    //Get pinotFS for input

Review Comment:
   Let's not move this. We want to throw an exception before creating the output dir when the input is not valid. We may move the file-listing logic here.
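
The ordering requested above can be sketched roughly as follows. This uses plain `java.nio.file` on the local file system instead of `PinotFS`, and `ValidateInputFirstSketch` and `prepare` are hypothetical names; it only illustrates the idea of validating and listing the input before any output directory is created.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ValidateInputFirstSketch {
  /**
   * Lists the input files first and fails fast when the input is invalid or empty.
   * The output directory is created only after the input has been validated.
   */
  static List<Path> prepare(Path inputDir, Path outputDir) throws IOException {
    if (!Files.isDirectory(inputDir)) {
      throw new IllegalArgumentException("Input is not a directory: " + inputDir);
    }
    List<Path> inputFiles;
    try (Stream<Path> stream = Files.list(inputDir)) {
      inputFiles = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    if (inputFiles.isEmpty()) {
      throw new RuntimeException("No matching file found under: " + inputDir);
    }
    // Only now touch the output location.
    Files.createDirectories(outputDir);
    return inputFiles;
  }
}
```

If the input directory is missing or empty, the method throws before `Files.createDirectories` runs, so a failed job leaves no stray output directory behind.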



##########
pinot-common/src/main/java/org/apache/pinot/common/segment/generation/SegmentGenerationUtils.java:
##########
@@ -229,4 +233,63 @@ private static String fetchUrl(URL url, String authToken)
     }
     return IOUtils.toString(connection.getInputStream(), StandardCharsets.UTF_8);
   }
+
+
+  /**
+   * Find matching files from root directory specified in fileUri.
+   * If includePattern and excludePattern are not null, get all the files that match includePattern and exclude files
+   * that match excludePattern.
+   * If

Review Comment:
   (nit) Remove



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java:
##########
@@ -128,14 +126,6 @@ public void run()
     for (PinotFSSpec pinotFSSpec : pinotFSSpecs) {
       PinotFSFactory.register(pinotFSSpec.getScheme(), pinotFSSpec.getClassName(), new PinotConfiguration(pinotFSSpec));
     }
-
-    //Get pinotFS for input

Review Comment:
   Same here. Let's first gather the input files before processing the outputDir



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark3/SparkSegmentGenerationJobRunner.java:
##########
@@ -128,14 +126,6 @@ public void run()
     for (PinotFSSpec pinotFSSpec : pinotFSSpecs) {
       PinotFSFactory.register(pinotFSSpec.getScheme(), pinotFSSpec.getClassName(), new PinotConfiguration(pinotFSSpec));
     }
-
-    //Get pinotFS for input

Review Comment:
   Same here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

