Copilot commented on code in PR #3556:
URL: https://github.com/apache/parquet-java/pull/3556#discussion_r3378132635


##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -79,8 +91,11 @@ public long getBufferedSize() {
 
   @Override
   public BytesInput getBytes() {
-    if (!fellBackAlready && firstPage) {
-      // we use the first page to decide if we're going to use this encoding
+    if (cumulativeRawBytes + rawDataByteSize >= 0) {
+      cumulativeRawBytes += rawDataByteSize;
+    }

Review Comment:
   `cumulativeRawBytes + rawDataByteSize >= 0` is effectively always true for 
this writer (rawDataByteSize is only incremented). The conditional adds an 
unnecessary branch and contradicts the nearby comment stating overflow is not a 
concern. If overflow handling is desired, it should be done explicitly (e.g., 
saturating add); otherwise, just accumulate directly.



##########
parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java:
##########
@@ -322,6 +325,17 @@ public boolean getPageWriteChecksumEnabled() {
     return pageWriteChecksumEnabled;
   }
 
+  /**
+   * Returns the byte threshold after which the dictionary compression check 
is performed.
+   * A value of 0 means check on the first page. Higher values delay the check 
until that
+   * many raw bytes have been accumulated across pages.
+   *
+   * @return the byte threshold for the dictionary compression check
+   */
+  public long getDictionaryCheckThresholdRawSizeBytes() {
+    return dictionaryCheckThresholdRawSizeBytes;
+  }

Review Comment:
   PR description mentions adding a boolean `isDictionaryEarlyCheckEnabled()` 
to disable the early compression check, but this change introduces a 
byte-threshold (`getDictionaryCheckThresholdRawSizeBytes`) instead. To avoid 
confusion for reviewers and users, please update the PR description (or related 
docs/release notes) to match the threshold-based configuration that was 
actually implemented.



##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -57,16 +69,16 @@ public static <I extends ValuesWriter & RequiresFallback, F 
extends ValuesWriter
    */
   private long rawDataByteSize = 0;
 
-  /**
-   * indicates if this is the first page being processed
-   */
-  private boolean firstPage = true;
-
   public FallbackValuesWriter(I initialWriter, F fallBackWriter) {
+    this(initialWriter, fallBackWriter, 
/*dictionaryCheckThresholdRawSizeBytes=*/ 0);
+  }
+
+  public FallbackValuesWriter(I initialWriter, F fallBackWriter, long 
dictionaryCheckThresholdRawSizeBytes) {
     super();
     this.initialWriter = initialWriter;
     this.fallBackWriter = fallBackWriter;
     this.currentWriter = initialWriter;
+    this.dictionaryCheckThresholdRawSizeBytes = 
dictionaryCheckThresholdRawSizeBytes;
   }

Review Comment:
   The new `dictionaryCheckThresholdRawSizeBytes` parameter is not validated in 
this public constructor/factory path. `ParquetProperties.Builder` enforces `>= 
0`, but callers can still create `FallbackValuesWriter` directly with a 
negative value, which would make the compression check run immediately and hide 
configuration errors. Consider rejecting negative values here as well for 
robustness.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to