dongjoon-hyun commented on code in PR #2601:
URL: https://github.com/apache/orc/pull/2601#discussion_r3119734536
##########
java/tools/src/java/org/apache/orc/tools/MergeFiles.java:
##########
@@ -100,11 +137,88 @@ public static void main(Configuration conf, String[] args) throws Exception {
}
}
+ /**
+ * Multi-output behavior when --maxSize is set.
+ * Input files are grouped by cumulative raw file size; each group is merged into
+ * a separate part file (part-00000.orc, part-00001.orc, ...) under outputDir.
+ * A single file whose size already exceeds maxSizeBytes is placed in its own part.
+ */
+ private static void mergeIntoMultipleFiles(Configuration conf,
+ OrcFile.WriterOptions writerOptions,
+ List<LocatedFileStatus> inputStatuses,
+ List<Path> inputFiles,
+ Path outputDir,
+ long maxSizeBytes) throws Exception {
+ FileSystem outFs = outputDir.getFileSystem(conf);
+ if (outFs.exists(outputDir)) {
+ if (!outFs.getFileStatus(outputDir).isDirectory()) {
+ throw new IllegalArgumentException(
+ "Output path already exists and is not a directory: " + outputDir);
+ }
+ if (outFs.listStatus(outputDir).length > 0) {
+ throw new IllegalArgumentException(
+ "Output directory must be empty for multi-file merge: " + outputDir);
Review Comment:
In this case, shall we **overwrite** the existing output instead, for a better UX? I guess you can delete the directory recursively.
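For reference, here is a minimal sketch of the suggested overwrite semantics. It uses `java.nio.file` on the local filesystem as a stand-in for Hadoop's `FileSystem`, where the equivalent recursive delete would be `outFs.delete(outputDir, true)`. The class name `OverwriteOutputDir` and the helper `prepareOutputDir` are hypothetical and not part of ORC. The sketch keeps the existing error for a path that is a regular file and replaces only the "must be empty" check with a recursive delete followed by recreating the directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OverwriteOutputDir {

  /**
   * Hypothetical helper: prepares outputDir for a multi-file merge with
   * overwrite semantics. A non-directory path is still rejected; an existing
   * directory (empty or not) is deleted recursively and then recreated.
   */
  static void prepareOutputDir(Path outputDir) throws IOException {
    if (Files.exists(outputDir)) {
      if (!Files.isDirectory(outputDir)) {
        throw new IllegalArgumentException(
            "Output path already exists and is not a directory: " + outputDir);
      }
      // Reverse natural order visits descendants before their ancestors,
      // so each directory is already empty when it is deleted.
      try (Stream<Path> walk = Files.walk(outputDir)) {
        walk.sorted(Comparator.reverseOrder()).forEach(p -> {
          try {
            Files.delete(p);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
      }
    }
    // Recreate an empty output directory for the part files.
    Files.createDirectories(outputDir);
  }

  public static void main(String[] args) throws IOException {
    // Simulate leftovers from a previous run, including a nested directory.
    Path dir = Files.createTempDirectory("orc-merge-out");
    Files.writeString(dir.resolve("part-00000.orc"), "stale");
    Files.createDirectories(dir.resolve("nested"));
    Files.writeString(dir.resolve("nested").resolve("x.orc"), "stale");

    prepareOutputDir(dir);
    try (Stream<Path> s = Files.list(dir)) {
      System.out.println("entries after overwrite: " + s.count());
    }
  }
}
```

Running `main` prints `entries after overwrite: 0`. Still rejecting a regular file at the output path is one possible design choice: silently deleting a user's file is arguably a bigger surprise than clearing a directory the user pointed the merge at.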