Jackie-Jiang commented on a change in pull request #7193:
URL: https://github.com/apache/pinot/pull/7193#discussion_r675900136



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/SegmentProcessorFrameworkCommand.java
##########
@@ -72,25 +80,42 @@ public String description() {
   @Override
   public boolean execute()
       throws Exception {
+    PluginManager.get().init();
 
     SegmentProcessorFrameworkSpec segmentProcessorFrameworkSpec =
         JsonUtils.fileToObject(new File(_segmentProcessorFrameworkSpec), SegmentProcessorFrameworkSpec.class);
 
     File inputSegmentsDir = new File(segmentProcessorFrameworkSpec.getInputSegmentsDir());
     File outputSegmentsDir = new File(segmentProcessorFrameworkSpec.getOutputSegmentsDir());
-    if (!outputSegmentsDir.exists()) {
-      if (!outputSegmentsDir.mkdirs()) {
-        throw new RuntimeException(
-            "Did not find output directory, and could not create it either: " + segmentProcessorFrameworkSpec
-                .getOutputSegmentsDir());
+    File workingDir = new File(outputSegmentsDir, "tmp-" + UUID.randomUUID());
+    File untarredSegmentsDir = new File(workingDir, "untarred_segments");
+    FileUtils.forceMkdir(untarredSegmentsDir);
+    File[] segmentDirs = inputSegmentsDir.listFiles();
+    Preconditions
+        .checkState(segmentDirs != null && segmentDirs.length > 0, "Failed to find files under input segments dir: %s",
+            inputSegmentsDir.getAbsolutePath());
+    List<RecordReader> recordReaders = new ArrayList<>(segmentDirs.length);
+    for (File segmentDir : segmentDirs) {
+      String fileName = segmentDir.getName();
+
+      // Untar the segments if needed
+      if (!segmentDir.isDirectory()) {
+        if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+          segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
+        } else {
+          throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());

Review comment:
   Good point. One workaround would be to blindly untar the file, assuming it is a tar.gz file.
   Ideally, we should add a config option to indicate the file type, so that this command can process any data files, not just Pinot segments.
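
   As a rough illustration only (not what the PR does): a middle ground between checking the file extension and blindly untarring would be to peek at the first two bytes of the file and look for the gzip magic number (0x1f 0x8b). The class and method names below are hypothetical and not part of Pinot; the sketch uses only the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical helper, not part of Pinot: guesses whether a file is gzip-compressed
// by inspecting its header instead of trusting the file name.
public class SegmentFileSniffer {
  private static final int GZIP_MAGIC_BYTE_1 = 0x1f;
  private static final int GZIP_MAGIC_BYTE_2 = 0x8b;

  // Returns true if the file starts with the two-byte gzip magic number.
  // Files shorter than two bytes return false because read() yields -1.
  public static boolean looksLikeGzip(File file)
      throws IOException {
    try (InputStream in = Files.newInputStream(file.toPath())) {
      int first = in.read();
      int second = in.read();
      return first == GZIP_MAGIC_BYTE_1 && second == GZIP_MAGIC_BYTE_2;
    }
  }
}
```

   With a check like this, the loop could untar any file for which looksLikeGzip returns true, regardless of its extension, and still fail fast on everything else. It only confirms the gzip layer, not that the payload is a tar archive, so an explicit file-type config would still be the more robust option.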




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
