harperjiang opened a new issue, #16502:
URL: https://github.com/apache/iceberg/issues/16502

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   
   ## Issue Summary
   
   When the vectorized Arrow reader is used to read a v3 Iceberg table that has 
a `decimal` column carrying an `initialDefault` or `writeDefault`, vector 
allocation fails with:
   
   ```
   java.lang.IllegalArgumentException: Cannot cast default value to FIXED[9]: 12345.6789
     at org.apache.iceberg.types.Types$NestedField.castDefault(Types.java:892)
     at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:881)
     at org.apache.iceberg.types.Types$NestedField$Builder.build(Types.java:850)
     at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.getPhysicalType(VectorizedArrowReader.java:255)
     at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.allocateFieldVector(VectorizedArrowReader.java:228)
     at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.read(VectorizedArrowReader.java:151)
   ```
   
   The type named in the message depends on the Parquet physical type backing the column. For example:
   - `FIXED_LEN_BYTE_ARRAY`-backed decimal → `Cannot cast default value to fixed[N]: <default>`
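
Which physical type a decimal column lands on, and so which type name shows up in the message, follows from its precision. As a rough illustration, assuming the data files were written with the Parquet mapping from the Iceberg spec (precision up to 9 as `int`, up to 18 as `long`, otherwise the smallest `fixed[N]` that can hold the precision), the helper below computes that name. `DecimalPhysicalType` and its methods are hypothetical names for this sketch and are not part of Iceberg; the reader itself goes by the physical type recorded in the file.

```java
import java.math.BigInteger;

final class DecimalPhysicalType {

  /** Returns the assumed physical type name for a decimal of the given precision. */
  static String physicalTypeFor(int precision) {
    if (precision <= 9) {
      return "int";
    }
    if (precision <= 18) {
      return "long";
    }
    return "fixed[" + minBytes(precision) + "]";
  }

  /** Smallest byte count whose signed two's-complement range holds 10^precision - 1. */
  static int minBytes(int precision) {
    BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    int numBytes = 1;
    while (maxSigned(numBytes).compareTo(maxUnscaled) < 0) {
      numBytes++;
    }
    return numBytes;
  }

  /** Largest positive value representable in numBytes signed bytes: 2^(8n-1) - 1. */
  private static BigInteger maxSigned(int numBytes) {
    return BigInteger.ONE.shiftLeft(8 * numBytes - 1).subtract(BigInteger.ONE);
  }

  public static void main(String[] args) {
    System.out.println(physicalTypeFor(5));
    System.out.println(physicalTypeFor(20));
  }
}
```

Under that assumption, a `DECIMAL(5, 2)` column would be `int`-backed, while the `fixed[9]` in the trace above corresponds to a precision between 19 and 21.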
   
   The same read succeeds when vectorization is disabled:
   
   ```
   spark.sql.iceberg.vectorization.enabled=false
   ```
   
   ## Repro
   
   1. Create a v3 Iceberg table with a decimal column that has a default value:
   
   ```sql
   CREATE TABLE local.db.t (
     id INT,
     amount DECIMAL(5, 2) DEFAULT 0.00
   ) USING iceberg TBLPROPERTIES ('format-version' = '3');
   
   INSERT INTO local.db.t VALUES (1, 1.23), (2, 4.56), (3, 7.89);
   ```
   
   2. Read with vectorization enabled (the default):
   
   ```sql
   SET spark.sql.iceberg.vectorization.enabled=true;
   SELECT * FROM local.db.t;
   ```
   
   The query fails with the stack trace above. The failure reproduces reliably only when the column is not dictionary-encoded. With dictionary encoding, allocation goes through `allocateDictEncodedVector` and bypasses the buggy path, so small or highly repetitive data sets may appear to read successfully.
   
   ## Root cause
   
   `VectorizedArrowReader#getPhysicalType` rewrites a decimal Iceberg field to 
its underlying physical type (`int` / `long` / `fixed[N]`) so the right Arrow 
vector class can be allocated:
   
   ```java
   physicalType = Types.NestedField.from(logicalType).ofType(type).build();
   ```
   
   `Types.NestedField.from(field)` returns a builder that copies the field's `initialDefault` and `writeDefault`. `NestedField`'s constructor then calls `castDefault(literal, type)` against the new physical type. For a decimal default, this delegates to `DecimalLiteral.to(...)` with an `IntegerType`, `LongType`, or `FixedType` target. No such conversion is defined, so the call returns `null`, which trips the `Preconditions.checkArgument` in `castDefault`.
   
   Conceptually, the defaults belong to the logical (decimal) view of the 
column and should not flow to the physical representation — the physical type 
is an internal detail used only to size the Arrow vector. The non-vectorized 
readers (`BaseParquetReaders`, `SparkParquetReaders`, `FlinkParquetReaders`) 
all apply defaults at the logical-type layer and are unaffected.
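
The mechanism and the suggested direction can be modeled without Iceberg on the classpath. The sketch below is a toy model, not Iceberg's code and not necessarily what the linked PR does: `DefaultCastSketch`, `Field`, `castDefault`, `copyWithType`, and `copyWithTypeDroppingDefault` are hypothetical stand-ins, and types are plain strings. It shows a constructor that validates the default against the field type, a copy that carries the default onto the physical type and fails, and a copy that leaves the default on the logical field only.

```java
import java.math.BigDecimal;

final class DefaultCastSketch {

  /** Hypothetical stand-in for a schema field; types are plain strings here. */
  static final class Field {
    final String name;
    final String type;
    final Object initialDefault; // null means "no default"

    Field(String name, String type, Object initialDefault) {
      // Same shape as the reported failure: the default must be castable to the field type.
      if (initialDefault != null && castDefault(initialDefault, type) == null) {
        throw new IllegalArgumentException(
            "Cannot cast default value to " + type + ": " + initialDefault);
      }
      this.name = name;
      this.type = type;
      this.initialDefault = initialDefault;
    }
  }

  /** Toy cast: a decimal value converts only to a decimal type; null means "no conversion". */
  static Object castDefault(Object value, String type) {
    if (value instanceof BigDecimal) {
      return type.startsWith("decimal") ? value : null;
    }
    return value;
  }

  /** Models the failing path: the logical default is carried onto the physical type. */
  static Field copyWithType(Field logical, String physicalType) {
    return new Field(logical.name, physicalType, logical.initialDefault);
  }

  /** Models the suggested direction: the physical field carries no default at all. */
  static Field copyWithTypeDroppingDefault(Field logical, String physicalType) {
    return new Field(logical.name, physicalType, null);
  }

  public static void main(String[] args) {
    Field logical = new Field("amount", "decimal(20, 4)", new BigDecimal("12345.6789"));
    try {
      copyWithType(logical, "fixed[9]");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
    Field physical = copyWithTypeDroppingDefault(logical, "fixed[9]");
    System.out.println(physical.type + " default=" + physical.initialDefault);
  }
}
```

In this model the logical field keeps its default, so a reader that applies defaults at the logical layer still sees it; only the throwaway physical field used for vector allocation goes without.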
   
   Proposed PR for the fix: https://github.com/apache/iceberg/pull/16501
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

