wgtmac commented on code in PR #3556:
URL: https://github.com/apache/parquet-java/pull/3556#discussion_r3370723460
##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -79,8 +89,9 @@ public long getBufferedSize() {
@Override
public BytesInput getBytes() {
- if (!fellBackAlready && firstPage) {
- // we use the first page to decide if we're going to use this encoding
+ cumulativeRawBytes += rawDataByteSize;
Review Comment:
Do we need to check overflow just in case?
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java:
##########
@@ -771,6 +771,17 @@ public SELF withPageWriteChecksumEnabled(boolean
enablePageWriteChecksum) {
return self();
}
+ /**
+ * Set the raw data byte threshold after which the dictionary compression
check is performed.
+ *
+ * @param val byte threshold (0 means check on the first page / preserve
previous behavior)
Review Comment:
```suggestion
* @param val byte threshold (0 means checking on the first page for
every column chunk)
```
##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -30,7 +30,12 @@ public class FallbackValuesWriter<I extends ValuesWriter &
RequiresFallback, F e
public static <I extends ValuesWriter & RequiresFallback, F extends
ValuesWriter> FallbackValuesWriter<I, F> of(
I initialWriter, F fallBackWriter) {
- return new FallbackValuesWriter<>(initialWriter, fallBackWriter);
+ return new FallbackValuesWriter<>(initialWriter, fallBackWriter,
/*checkAfterBytes=*/ 0);
+ }
+
+ public static <I extends ValuesWriter & RequiresFallback, F extends
ValuesWriter> FallbackValuesWriter<I, F> of(
+ I initialWriter, F fallBackWriter, long checkAfterBytes) {
Review Comment:
ditto and same for all below
##########
parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java:
##########
@@ -322,6 +325,17 @@ public boolean getPageWriteChecksumEnabled() {
return pageWriteChecksumEnabled;
}
+ /**
+ * Returns the byte threshold after which the dictionary compression check
is performed.
+ * A value of 0 means check on the first page (backward compatible default).
Higher values
Review Comment:
IMO, we don't need to mention `backward compatible default` everywhere.
##########
parquet-column/src/main/java/org/apache/parquet/column/values/fallback/FallbackValuesWriter.java:
##########
@@ -30,7 +30,12 @@ public class FallbackValuesWriter<I extends ValuesWriter &
RequiresFallback, F e
public static <I extends ValuesWriter & RequiresFallback, F extends
ValuesWriter> FallbackValuesWriter<I, F> of(
I initialWriter, F fallBackWriter) {
- return new FallbackValuesWriter<>(initialWriter, fallBackWriter);
+ return new FallbackValuesWriter<>(initialWriter, fallBackWriter,
/*checkAfterBytes=*/ 0);
Review Comment:
`checkAfterBytes` now should be replaced by a more consistent name with
`dictionaryCheckThresholdRawSizeBytes`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]