Jackie-Jiang commented on a change in pull request #4742: Adding bootstrap mode
for Pinot-hadoop job to output segments into relative directories.
URL: https://github.com/apache/incubator-pinot/pull/4742#discussion_r338705734
##########
File path:
pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentCreationMapper.java
##########
@@ -259,6 +271,24 @@ protected void map(LongWritable key, Text value, Context context)
sequenceId);
}
+ /**
+ * Generate an output directory path for bootstrap mode.
+ * This method will compute the relative path based on `inputFile` and `baseInputDir`,
+ * then apply only the directory part of relative path to `outputDir`.
+ * E.g.
+ * baseInputDir = "/path/to/input"
+ * inputFile = "/path/to/input/a/b/c/d.avro"
+ * outputDir = "/path/to/output"
+ * getBootstrapOutputPath(baseInputDir, inputFile, outputDir) = /path/to/output/a/b/c
+ */
+ protected static Path getBootstrapOutputPath(URI baseInputDir, URI inputFile, Path outputDir) {
+ URI relativePath = baseInputDir.relativize(inputFile);
+ if (relativePath.getPath().length() > 0) {
+ return new Path(outputDir, relativePath.getPath()).getParent();
+ }
+ return null;
Review comment:
In what scenario can this return null? Can it happen in the normal case?
IMO it should throw an exception instead.
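
For reference, a minimal sketch of how `java.net.URI.relativize` behaves for the inputs in question, using only the JDK (no Hadoop `Path`). The class `RelativizeDemo` and both helper names are hypothetical and not part of this PR. Per the `URI.relativize` contract, the relative path is empty only when `inputFile` equals `baseInputDir`, and when `baseInputDir` is not a prefix of `inputFile` the given URI is returned unchanged, so its path is non-empty and the null branch is not taken. The second helper shows one possible way to fail fast in both cases; treating a leading "/" as "not relativized" is an assumption of this sketch.

```java
import java.net.URI;

class RelativizeDemo {
    // Hypothetical helper: path component of baseInputDir.relativize(inputFile).
    public static String relativePath(String baseInputDir, String inputFile) {
        return URI.create(baseInputDir).relativize(URI.create(inputFile)).getPath();
    }

    // Hypothetical fail-fast variant: throws instead of signalling with null.
    public static String requireRelativePath(String baseInputDir, String inputFile) {
        String relative = relativePath(baseInputDir, inputFile);
        // Empty: inputFile equals baseInputDir.
        // Leading "/": relativize returned inputFile unchanged (assumed to mean "not under base").
        if (relative.isEmpty() || relative.startsWith("/")) {
            throw new IllegalArgumentException(
                "Input file " + inputFile + " is not located under " + baseInputDir);
        }
        return relative;
    }

    public static void main(String[] args) {
        String base = "/path/to/input";
        // Normal case: file under the base directory.
        System.out.println(relativePath(base, "/path/to/input/a/b/c/d.avro"));
        // Same URI as the base: empty relative path (the null branch in the PR).
        System.out.println(relativePath(base, "/path/to/input").isEmpty());
        // Not under the base: relativize returns the given URI unchanged.
        System.out.println(relativePath(base, "/path/to/other/d.avro"));
    }
}
```

If that reading is right, the null branch is reachable only when the input file URI is identical to the base input directory, which should not happen for a real input file.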