rdblue commented on a change in pull request #1388:
URL: https://github.com/apache/iceberg/pull/1388#discussion_r479427662
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -690,4 +692,25 @@ private ParquetReadBuilder(org.apache.parquet.io.InputFile file) {
      return new ParquetReadSupport<>(schema, readSupport, callInit, nameMapping);
    }
  }
+
+  /**
+   * Concatenates the row groups of several Parquet files into a single output file.
+   *
+   * @param inputFiles an {@link Iterable} of Parquet files; the iteration order determines the order in which
+   *                   the contents of the files are read and written to {@code outputFile}
+   * @param outputFile the output Parquet file containing all the data from {@code inputFiles}
+   * @param rowGroupSize the row group size to use when writing {@code outputFile}
+   * @param schema the schema of the data
+   * @param metadata extra metadata to write in the footer of {@code outputFile}
+   */
+  public static void concat(Iterable<File> inputFiles, File outputFile, int rowGroupSize, Schema schema,
+                            Map<String, String> metadata) throws IOException {
+    OutputFile file = Files.localOutput(outputFile);
+    ParquetFileWriter writer = new ParquetFileWriter(
+        ParquetIO.file(file), ParquetSchemaUtil.convert(schema, "table"),
+        ParquetFileWriter.Mode.CREATE, rowGroupSize, 0);
Review comment:
We can use the default row group size from table properties here. It
will be ignored when appending files because row groups are appended directly
and not rewritten.
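For context, the suggested change might look roughly like the sketch below. This is not code from the PR: it assumes the default comes from the `TableProperties.PARQUET_ROW_GROUP_SIZE_BYTES_DEFAULT` string constant in Iceberg, and that the writer appends row groups with `ParquetFileWriter.appendFile`, which copies them without rewriting (so the row group size only matters if new data were ever written).

```java
// Hedged sketch, not the PR's implementation: take the row group size from the
// table-property default instead of a caller-supplied parameter. The constant
// name and parsing below are assumptions about the Iceberg API.
public static void concat(Iterable<File> inputFiles, File outputFile, Schema schema,
                          Map<String, String> metadata) throws IOException {
  OutputFile file = Files.localOutput(outputFile);
  int rowGroupSize = Integer.parseInt(TableProperties.PARQUET_ROW_GROUP_SIZE_BYTES_DEFAULT);
  ParquetFileWriter writer = new ParquetFileWriter(
      ParquetIO.file(file), ParquetSchemaUtil.convert(schema, "table"),
      ParquetFileWriter.Mode.CREATE, rowGroupSize, 0);
  writer.start();
  for (File inputFile : inputFiles) {
    // appendFile copies row groups byte-for-byte, so rowGroupSize is ignored here.
    writer.appendFile(HadoopInputFile.fromPath(new Path(inputFile.toURI()), new Configuration()));
  }
  writer.end(metadata);
}
```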