Jackie-Jiang commented on a change in pull request #7193:
URL: https://github.com/apache/pinot/pull/7193#discussion_r675900136
##########
File path:
pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/SegmentProcessorFrameworkCommand.java
##########
@@ -72,25 +80,42 @@ public String description() {
@Override
public boolean execute()
throws Exception {
+ PluginManager.get().init();
SegmentProcessorFrameworkSpec segmentProcessorFrameworkSpec =
JsonUtils.fileToObject(new File(_segmentProcessorFrameworkSpec),
SegmentProcessorFrameworkSpec.class);
File inputSegmentsDir = new File(segmentProcessorFrameworkSpec.getInputSegmentsDir());
File outputSegmentsDir = new File(segmentProcessorFrameworkSpec.getOutputSegmentsDir());
- if (!outputSegmentsDir.exists()) {
- if (!outputSegmentsDir.mkdirs()) {
- throw new RuntimeException(
- "Did not find output directory, and could not create it either: " + segmentProcessorFrameworkSpec
- .getOutputSegmentsDir());
+ File workingDir = new File(outputSegmentsDir, "tmp-" + UUID.randomUUID());
+ File untarredSegmentsDir = new File(workingDir, "untarred_segments");
+ FileUtils.forceMkdir(untarredSegmentsDir);
+ File[] segmentDirs = inputSegmentsDir.listFiles();
+ Preconditions
+ .checkState(segmentDirs != null && segmentDirs.length > 0, "Failed to find files under input segments dir: %s",
+ inputSegmentsDir.getAbsolutePath());
+ List<RecordReader> recordReaders = new ArrayList<>(segmentDirs.length);
+ for (File segmentDir : segmentDirs) {
+ String fileName = segmentDir.getName();
+
+ // Untar the segments if needed
+ if (!segmentDir.isDirectory()) {
+ if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+ segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
+ } else {
+ throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());
Review comment:
Good point. One workaround would be to untar the file blindly, assuming it
is a tar.gz file regardless of its extension.
Ideally we should add a config option that indicates the file type, so that
this command can process any data files, not only Pinot segments.
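
A minimal sketch of how the two options could fit together, for discussion only. `InputFileType` and `resolve` are hypothetical names, not part of the current `SegmentProcessorFrameworkSpec`. The sketch assumes the spec carries an optional file type, and that an unset value falls back to the workaround of treating every non-directory entry as a tar.gz file:

```java
final class InputFileTypeSketch {
  // Hypothetical config values; none of these exist in the spec today.
  public enum InputFileType { SEGMENT_DIR, SEGMENT_TAR_GZ, DATA_FILE }

  /**
   * Decides how one entry under the input dir should be handled.
   * configuredType is the (hypothetical) config value, or null if it is not set.
   */
  public static InputFileType resolve(boolean isDirectory, InputFileType configuredType) {
    if (isDirectory) {
      // A directory is always treated as an already-untarred segment.
      return InputFileType.SEGMENT_DIR;
    }
    if (configuredType == InputFileType.SEGMENT_DIR) {
      // The config promises directories, but this entry is a regular file.
      throw new IllegalStateException("Expected a segment directory, found a file");
    }
    if (configuredType != null) {
      // An explicit config wins over any guess based on the file name.
      return configuredType;
    }
    // Workaround: without a config, assume tar.gz regardless of the extension.
    return InputFileType.SEGMENT_TAR_GZ;
  }

  public static void main(String[] args) {
    System.out.println(resolve(false, null));                    // SEGMENT_TAR_GZ
    System.out.println(resolve(false, InputFileType.DATA_FILE)); // DATA_FILE
    System.out.println(resolve(true, null));                     // SEGMENT_DIR
  }
}
```

With something like this, the extension check in the loop would only be needed as a default, and a failed untar would surface the unsupported format instead of the file name check.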
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]