shangxinli commented on a change in pull request #775:
URL: https://github.com/apache/parquet-mr/pull/775#discussion_r415418003
##########
File path:
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -919,6 +895,59 @@ public void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup,
endBlock();
}
+ /**
+ * Merges adjacent row groups in the supplied files while maintaining that the new groups is no more than the specified
+ * maxRowGroupSize
+ * @param inputFiles input files to merge
+ * @param maxRowGroupSize the maximum size in bytes the new created groups can be
+ * @param useV2Writer whether to use a V2 encoding based writer when rewriting dictionary encoded pages
+ * @param compression compression to use when writing
+ * @throws IOException
+ */
+ public void mergeRowGroups(List<InputFile> inputFiles, long maxRowGroupSize, boolean useV2Writer, CompressionCodecName compression) throws IOException {
Review comment:
I prefer not to unless you strongly think we should.