Copilot commented on code in PR #3556:
URL: https://github.com/apache/parquet-java/pull/3556#discussion_r3378132635
##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -79,8 +91,11 @@ public long getBufferedSize() {
@Override
public BytesInput getBytes() {
- if (!fellBackAlready && firstPage) {
- // we use the first page to decide if we're going to use this encoding
+ if (cumulativeRawBytes + rawDataByteSize >= 0) {
+ cumulativeRawBytes += rawDataByteSize;
+ }
Review Comment:
`cumulativeRawBytes + rawDataByteSize >= 0` is effectively always true for
this writer (rawDataByteSize is only incremented). The conditional adds an
unnecessary branch and contradicts the nearby comment stating overflow is not a
concern. If overflow handling is desired, it should be done explicitly (e.g.,
saturating add); otherwise, just accumulate directly.
##########
parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java:
##########
@@ -322,6 +325,17 @@ public boolean getPageWriteChecksumEnabled() {
return pageWriteChecksumEnabled;
}
+ /**
+ * Returns the byte threshold after which the dictionary compression check
is performed.
+ * A value of 0 means check on the first page. Higher values delay the check
until that
+ * many raw bytes have been accumulated across pages.
+ *
+ * @return the byte threshold for the dictionary compression check
+ */
+ public long getDictionaryCheckThresholdRawSizeBytes() {
+ return dictionaryCheckThresholdRawSizeBytes;
+ }
Review Comment:
PR description mentions adding a boolean `isDictionaryEarlyCheckEnabled()`
to disable the early compression check, but this change introduces a
byte-threshold (`getDictionaryCheckThresholdRawSizeBytes`) instead. To avoid
confusion for reviewers and users, please update the PR description (or related
docs/release notes) to match the threshold-based configuration that was
actually implemented.
##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -57,16 +69,16 @@ public static <I extends ValuesWriter & RequiresFallback, F
extends ValuesWriter
*/
private long rawDataByteSize = 0;
- /**
- * indicates if this is the first page being processed
- */
- private boolean firstPage = true;
-
public FallbackValuesWriter(I initialWriter, F fallBackWriter) {
+ this(initialWriter, fallBackWriter,
/*dictionaryCheckThresholdRawSizeBytes=*/ 0);
+ }
+
+ public FallbackValuesWriter(I initialWriter, F fallBackWriter, long
dictionaryCheckThresholdRawSizeBytes) {
super();
this.initialWriter = initialWriter;
this.fallBackWriter = fallBackWriter;
this.currentWriter = initialWriter;
+ this.dictionaryCheckThresholdRawSizeBytes =
dictionaryCheckThresholdRawSizeBytes;
}
Review Comment:
The new `dictionaryCheckThresholdRawSizeBytes` parameter is not validated in
this public constructor/factory path. `ParquetProperties.Builder` enforces `>=
0`, but callers can still create `FallbackValuesWriter` directly with a
negative value, which would make the compression check run immediately and hide
configuration errors. Consider rejecting negative values here as well for
robustness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]