rdblue commented on a change in pull request #1388:
URL: https://github.com/apache/iceberg/pull/1388#discussion_r479428214
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -690,4 +692,25 @@ private ParquetReadBuilder(org.apache.parquet.io.InputFile file) {
return new ParquetReadSupport<>(schema, readSupport, callInit, nameMapping);
}
}
+
+ /**
+ * @param inputFiles an {@link Iterable} of parquet files. The order of iteration determines the order in which
+ * content of files are read and written to the @param outputFile
+ * @param outputFile the output parquet file containing all the data from @param inputFiles
+ * @param rowGroupSize the row group size to use when writing the @param outputFile
+ * @param schema the schema of the data
+ * @param metadata extraMetadata to write at the footer of the @param outputFile
+ */
+ public static void concat(Iterable<File> inputFiles, File outputFile, int rowGroupSize, Schema schema,
Review comment:
I think the input files and output file should use `InputFile` and
`OutputFile`. That way this isn't limited to just the local FS.
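
To illustrate the suggestion, here is a minimal, self-contained sketch of a `concat` that depends on file abstractions rather than `java.io.File`. The nested `InputFile` and `OutputFile` interfaces are simplified, hypothetical stand-ins for Iceberg's interfaces of the same name (the real ones expose more, such as location and length), `MemoryFile` is an invented in-memory implementation, and the byte copy is only a placeholder for the Parquet row-group merging, so the output is not a valid Parquet file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ConcatSketch {
  // Hypothetical, simplified stand-in for Iceberg's InputFile abstraction.
  interface InputFile {
    InputStream newStream() throws IOException;
  }

  // Hypothetical, simplified stand-in for Iceberg's OutputFile abstraction.
  interface OutputFile {
    OutputStream create() throws IOException;
  }

  // Invented in-memory file: shows that callers are not tied to the local FS.
  static final class MemoryFile implements InputFile, OutputFile {
    private byte[] bytes;

    MemoryFile(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public InputStream newStream() {
      return new ByteArrayInputStream(bytes);
    }

    @Override
    public OutputStream create() {
      // Publish the written bytes once the stream is closed.
      return new ByteArrayOutputStream() {
        @Override
        public void close() throws IOException {
          super.close();
          MemoryFile.this.bytes = toByteArray();
        }
      };
    }

    byte[] contents() {
      return bytes;
    }
  }

  // Placeholder body: copies raw bytes in iteration order.
  // A real implementation would merge Parquet row groups instead.
  static void concat(Iterable<? extends InputFile> inputFiles, OutputFile outputFile)
      throws IOException {
    byte[] buffer = new byte[8192];
    try (OutputStream out = outputFile.create()) {
      for (InputFile inputFile : inputFiles) {
        try (InputStream in = inputFile.newStream()) {
          int read;
          while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
          }
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    MemoryFile first = new MemoryFile("ab".getBytes(StandardCharsets.UTF_8));
    MemoryFile second = new MemoryFile("cd".getBytes(StandardCharsets.UTF_8));
    MemoryFile target = new MemoryFile(new byte[0]);
    concat(Arrays.asList(first, second), target);
    System.out.println(new String(target.contents(), StandardCharsets.UTF_8));
  }
}
```

Because `concat` only sees the two interfaces, the same method body would work unchanged with implementations backed by a local disk, HDFS, or an object store.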