wgtmac commented on code in PR #3556:
URL: https://github.com/apache/parquet-java/pull/3556#discussion_r3360665791


##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -57,16 +70,16 @@ public static <I extends ValuesWriter & RequiresFallback, F 
extends ValuesWriter
    */
   private long rawDataByteSize = 0;
 
-  /**
-   * indicates if this is the first page being processed
-   */
-  private boolean firstPage = true;
-
   public FallbackValuesWriter(I initialWriter, F fallBackWriter) {
+    this(initialWriter, fallBackWriter, 0);

Review Comment:
   ```suggestion
       this(initialWriter, fallBackWriter, /*checkAfterBytes=*/0);
   ```



##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -44,6 +49,14 @@ public static <I extends ValuesWriter & RequiresFallback, F 
extends ValuesWriter
 
   private boolean fellBackAlready = false;
 
+  private boolean compressionChecked = false;
+
+  private final long checkAfterBytes;
+
+  /* Accumulates raw bytes across pages (only reset in resetDictionary) so the
+   * threshold check works even when individual pages are smaller than 
checkAfterBytes. */
+  private long cumulativeRawBytes = 0;

Review Comment:
   ```suggestion
     private boolean compressionChecked = false;
     private final long checkAfterBytes;
     /* Accumulates raw bytes across pages (only reset in resetDictionary) so 
the
      * threshold check works even when individual pages are smaller than 
checkAfterBytes. */
     private long cumulativeRawBytes = 0;
   ```
   
   How about removing these blank lines?



##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -30,7 +30,12 @@ public class FallbackValuesWriter<I extends ValuesWriter & 
RequiresFallback, F e
 
   public static <I extends ValuesWriter & RequiresFallback, F extends 
ValuesWriter> FallbackValuesWriter<I, F> of(
       I initialWriter, F fallBackWriter) {
-    return new FallbackValuesWriter<>(initialWriter, fallBackWriter);
+    return new FallbackValuesWriter<>(initialWriter, fallBackWriter, 0);

Review Comment:
   ```suggestion
       return new FallbackValuesWriter<>(initialWriter, fallBackWriter, 
/*checkAfterBytes=*/0);
   ```



##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -167,6 +180,7 @@ private void fallBack() {
   @Override
   public void writeByte(int value) {
     rawDataByteSize += 1;
+    cumulativeRawBytes += 1;

Review Comment:
   Should we use `cumulativeRawBytes + rawDataByteSize` for checking and only 
increase `cumulativeRawBytes` once a page to save some cycles?



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java:
##########
@@ -161,6 +161,7 @@ public static enum JobSummaryLevel {
   public static final String BLOCK_ROW_COUNT_LIMIT = 
"parquet.block.row.count.limit";
   public static final String PAGE_ROW_COUNT_LIMIT = 
"parquet.page.row.count.limit";
   public static final String PAGE_WRITE_CHECKSUM_ENABLED = 
"parquet.page.write-checksum.enabled";
+  public static final String DICTIONARY_CHECK_AFTER_BYTES = 
"parquet.dictionary.check.after.raw.bytes";

Review Comment:
   TBH, the flag name still looks a little bit unclear to me. How about 
renaming it to `parquet.dictionary.check.threshold.raw.size.bytes` and also 
change associated variable and function names?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to