cbg-wx opened a new issue, #18329:
URL: https://github.com/apache/hudi/issues/18329
### Bug Description
**What happened:**
I created a Hudi table with Spark in which some columns (e.g. the record key column) are declared NOT NULL. I first used Spark to sink the historical data for the initial load, then used Flink to stream real-time data into the same table with the insert write operation and inline clustering. Clustering then failed with the exception below.
**What you expected:**
After the table is created, Spark and Flink should maintain a consistent schema when writing to the same table.
**Steps to reproduce:**
1. Use Spark SQL to create a Hudi table with `hudi_test_id` declared NOT NULL:
```sql
CREATE TABLE hudi_table (
  ts BIGINT,
  hudi_test_id BIGINT NOT NULL,
  rider STRING,
  driver STRING,
  fare DOUBLE,
  city STRING
) USING HUDI
PARTITIONED BY (city);
```
2. Use Spark SQL to insert data into the table above.
3. Use Flink SQL to insert data into the same table with the insert write operation and inline async clustering.
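Steps 2 and 3 can be sketched as follows. This is a minimal, hypothetical repro sketch: the literal values and the `path`/`source_stream` names are placeholders, and the `write.operation` and `clustering.*` options are assumed from the Hudi 0.14.x Flink configuration, not copied from my job.

```sql
-- Step 2 (Spark SQL): backfill historical data into the table created above.
INSERT INTO hudi_table
VALUES (1695159649, 1, 'rider-A', 'driver-K', 19.10, 'san_francisco');

-- Step 3 (Flink SQL): declare the same table for the Hudi connector and
-- stream into it with the insert operation and inline clustering.
-- 'path' is a placeholder; clustering options follow the Hudi 0.14.x docs.
CREATE TABLE hudi_table (
  ts BIGINT,
  hudi_test_id BIGINT NOT NULL,
  rider STRING,
  driver STRING,
  fare DOUBLE,
  city STRING
) PARTITIONED BY (city) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://.../hudi_table',
  'write.operation' = 'insert',
  'clustering.schedule.enabled' = 'true',
  'clustering.async.enabled' = 'true'
);

INSERT INTO hudi_table
SELECT ts, hudi_test_id, rider, driver, fare, city FROM source_stream;
```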
### Environment
Hudi version: 0.14.1 (hudi-flink1.16-bundle-0.14.1, per the stack trace)
Sink engines: Spark 3.4.2, Flink 1.16.2
Relevant configs: Hive sync strategy RT
### Logs and Stack Trace
```java
2025-12-30 14:24:29,788 INFO  org.apache.parquet.hadoop.InternalParquetRecordReader [] - block read in memory in 1 ms. row count = 107
2025-12-30 14:24:29,788 ERROR org.apache.hudi.sink.clustering.ClusteringOperator [] - Executor action [Execute clustering for instant 20251230140224328 from task 0] error
org.apache.hudi.exception.HoodieException: unable to read next record from parquet file
    at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) ~[hudi-flink1.16-bundle-0.14.1.jar]
    ...
    at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:251) ~[hudi-flink1.16-bundle-0.14.1.jar]
    at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:251) ~[hudi-flink1.16-bundle-0.14.1.jar]
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://nsfed//hudi/rtcp_001/qrt_app_rtcp_001_test/hudi_table/dt=20251230/00000003-3f63-4106-9b82-952240e7f8cc-0_3-10-13_20251230111845024.parquet
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:254) ~[hudi-flink1.16-bundle-0.14.1.jar]
    ...
Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required int64 hudi_test_id != optional int64 hudi_test_id
    at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101) ~[hudi-flink1.16-bundle-0.14.1.jar]
    ...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]