amogh-jahagirdar commented on code in PR #16501:
URL: https://github.com/apache/iceberg/pull/16501#discussion_r3282538671
##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java:
##########
@@ -388,6 +388,99 @@ public void testTimestampMillisAreReadCorrectly() throws
Exception {
assertThat(totalRowsRead).as("Should read all
rows").isEqualTo(millisValues.size());
}
+ /**
+ * Regression test: a decimal column whose Iceberg field carries an
initialDefault/writeDefault
+ * must be readable by the vectorized reader. Before the fix to {@code
+ * VectorizedArrowReader#getPhysicalType}, allocating the vector copied the
decimal-typed default
+ * onto the underlying physical type (int/long/fixed) and failed with {@code
+ * IllegalArgumentException: Cannot cast default value to ...}.
+ */
+ @Test
+ public void testDecimalWithDefaultIsReadByVectorizedReader() throws
Exception {
Review Comment:
So the short answer is that when the parquet schema is visited by the
vectorized read builder and identifies there's a field not in the parquet file
but in the table schema with a default , a Constant vector reader is created
with the default value.
##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java:
##########
@@ -388,6 +388,99 @@ public void testTimestampMillisAreReadCorrectly() throws
Exception {
assertThat(totalRowsRead).as("Should read all
rows").isEqualTo(millisValues.size());
}
+ /**
+ * Regression test: a decimal column whose Iceberg field carries an
initialDefault/writeDefault
Review Comment:
Is there a way to update the existing TestParquetVectorizedReads. That
already implements a mixin `supportsDefaultValues()` and already runs through
decimal type. I think the issue as you pointed out is specifically when it's
not dictionary encoded (which for decimal I'd actually expect to generally be
the case). Maybe in that class when we produce the writer there's a way to pass
through options that disable dictionary encoding?
##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/TestArrowReader.java:
##########
@@ -388,6 +388,99 @@ public void testTimestampMillisAreReadCorrectly() throws
Exception {
assertThat(totalRowsRead).as("Should read all
rows").isEqualTo(millisValues.size());
}
+ /**
+ * Regression test: a decimal column whose Iceberg field carries an
initialDefault/writeDefault
Review Comment:
I'd also look at goldenFilesAndEncodings in that existing test class
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]