cbg-wx opened a new issue, #18329:
URL: https://github.com/apache/hudi/issues/18329

   ### Bug Description
   
   **What happened:**
When I use Spark to create a Hudi table where some columns (e.g. the record key 
   column) are declared NOT NULL, then use Spark to sink the historical backfill 
   data into the table, and finally use Flink to stream real-time data into the 
   same table with the insert operation and inline clustering, I encounter an 
   exception during clustering.
   **What you expected:**
   
   After creating the table, Spark and Flink should maintain a consistent schema 
   when writing to the same table.
   
   **Steps to reproduce:**
   1. Use Spark SQL to create a Hudi table with `hudi_test_id` declared NOT NULL:
   
   ```sql
   CREATE TABLE hudi_table (
       ts BIGINT,
       hudi_test_id BIGINT not null,
       rider STRING,
       driver STRING,
       fare DOUBLE,
       city STRING
   ) USING HUDI
   PARTITIONED BY (city);
   ```
   
   2. Use Spark SQL to insert the historical backfill data into the table above.
   3. Use Flink SQL to insert data into the same table with the insert write 
   operation and inline async clustering.
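
   Steps 2 and 3 can be sketched as follows; the row values, the `source_stream` 
   table name, and the Flink job options are illustrative assumptions, not taken 
   from the report:

   ```sql
   -- Step 2 (Spark SQL): historical backfill; the rows are made up for
   -- illustration.
   INSERT INTO hudi_table VALUES
       (1695159649, 1, 'rider-A', 'driver-K', 19.10, 'san_francisco'),
       (1695159650, 2, 'rider-B', 'driver-L', 27.70, 'sao_paulo');

   -- Step 3 (Flink SQL): streaming insert into the same table; `source_stream`
   -- is a hypothetical source. The job is assumed to run with
   -- write.operation = insert and inline clustering enabled.
   INSERT INTO hudi_table
   SELECT ts, hudi_test_id, rider, driver, fare, city FROM source_stream;
   ```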
   
   ### Environment
   
   Hudi version: 0.14.1
   Sink engines: Spark 3.4.2, Flink 1.16.2
   Relevant configs: Hive sync strategy RT
   
   ### Logs and Stack Trace
   
   ```java
   2025-12-30 14:24:29,788 INFO  org.apache.parquet.hadoop.InternalParquetRecordReader [] - block read in memory in 1 ms. row count = 107
   2025-12-30 14:24:29,788 ERROR org.apache.hudi.sink.clustering.ClusteringOperator [] - Executor action [Execute clustering for instant 20251230140224328 from task 0] error
   org.apache.hudi.exception.HoodieException: unable to read next record from parquet file
       at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) ~[hudi-flink1.16-bundle-0.14.1.jar]
       ...
       at org.apache.hudi.client.utils.ConcatenatingIterator.hasNext(ConcatenatingIterator.java:251) ~[hudi-flink1.16-bundle-0.14.1.jar]
       at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:251) ~[hudi-flink1.16-bundle-0.14.1.jar]
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://nsfed//hudi/rtcp_001/qrt_app_rtcp_001_test/hudi_table/dt=20251230/00000003-3f63-4106-9b82-952240e7f8cc-0_3-10-13_20251230111845024.parquet
       at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:254) ~[hudi-flink1.16-bundle-0.14.1.jar]
       ...
   Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required int64 hudi_test_id != optional int64 hudi_test_id
       at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101) ~[hudi-flink1.16-bundle-0.14.1.jar]
       ...
   ```
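
   The root `Caused by` shows the mismatch directly: the Spark-written Parquet 
   files carry `hudi_test_id` as `required int64`, while the schema the Flink 
   clustering reader requests has it as `optional int64`. A hedged sketch of a 
   Flink DDL that keeps the column NOT NULL on the Flink side so the write 
   schema matches (the connector options and path are illustrative assumptions, 
   not taken from the report):

   ```sql
   -- Hypothetical Flink SQL DDL: declaring NOT NULL here keeps hudi_test_id
   -- `required` in the write schema, matching the Spark-written Parquet files.
   CREATE TABLE hudi_table (
       ts BIGINT,
       hudi_test_id BIGINT NOT NULL,
       rider STRING,
       driver STRING,
       fare DOUBLE,
       city STRING
   ) PARTITIONED BY (city) WITH (
       'connector' = 'hudi',
       'path' = 'hdfs://.../hudi_table',        -- illustrative path
       'write.operation' = 'insert',
       'clustering.schedule.enabled' = 'true',  -- inline clustering schedule
       'clustering.async.enabled' = 'true'
   );
   ```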


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
