[GitHub] [parquet-mr] shangxinli commented on a change in pull request #775: PARQUET-1381: add parquet block merging feature

GitBox Sun, 26 Apr 2020 15:41:43 -0700


shangxinli commented on a change in pull request #775:
URL: https://github.com/apache/parquet-mr/pull/775#discussion_r415418003




##########
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##########
@@ -919,6 +895,59 @@ public void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup,
     endBlock();
   }
 
+  /**
+   * Merges adjacent row groups in the supplied files while maintaining that the new groups is no more than the specified
+   * maxRowGroupSize
+   * @param inputFiles input files to merge
+   * @param maxRowGroupSize the maximum size in bytes the new created groups can be
+   * @param useV2Writer whether to use a V2 encoding based writer when rewriting dictionary encoded pages
+   * @param compression compression to use when writing
+   * @throws IOException
+   */
+  public void mergeRowGroups(List<InputFile> inputFiles, long maxRowGroupSize, boolean useV2Writer, CompressionCodecName compression) throws IOException {

Review comment:
       I prefer not to unless you strongly think we should. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
