hexiaoshu96 opened a new issue, #7052:
URL: https://github.com/apache/paimon/issues/7052

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   PaimonVersion: 0.9
   Storage: LocalFile
   
   ### Compute Engine
   
   FlinkVersion: 1.17.2
   
   ### Minimal reproduce step
   
   When using Paimon for partial-column updates, we mistakenly included the primary key column order_key in the send_order_version sequence-group. This led to some interesting behavior: if the version column of that sequence-group is null and an update arrives, reading the table fails with a ParquetDecodingException. Below are the steps we used to reproduce this in a local test.
   
   - Create Table DDL
   
   CREATE TABLE if not exists test.paimon_test_partical
   (
     order_id BIGINT,
     order_key STRING,
     send_order_name STRING,
     get_order_name STRING,
     send_order_version BIGINT,
     get_order_version BIGINT,
     PRIMARY KEY (order_id,order_key) NOT ENFORCED
   )
   WITH
   (
       'connector' = 'paimon',
       'bucket' = '1',
       'file.format' = 'parquet',
       'file.compression' = 'zstd',
       'merge-engine' = 'partial-update',
       'fields.send_order_version.sequence-group' = 'send_order_name,order_key',
       'fields.get_order_version.sequence-group' = 'get_order_name',
       'snapshot.num-retained.min' = '1',
       'snapshot.num-retained.max' = '1',
       'full-compaction.delta-commits' = '10'
   );
   - Batch Insert SQL
   1. First, execute the following statement to write the initial data:
   insert into test.paimon_test_partical(order_id,order_key,send_order_name,get_order_name,get_order_version)
   values 
   (1,'a','test_send','test_get',1766632140),
   (2,'a','test_send','test_get',1766632140),
   (3,'a','test_send','test_get',1766632140),
   (1,'b','test_send','test_get',1766632140),
   (2,'b','test_send','test_get',1766632140),
   (3,'b','test_send','test_get',1766632140),
   (1,'c','test_send','test_get',1766632140),
   (2,'c','test_send','test_get',1766632140),
   (3,'c','test_send','test_get',1766632140)
   
   QueryResult: 
   
   <img width="2196" height="378" alt="Image" 
src="https://github.com/user-attachments/assets/ae00701f-5ada-4808-8409-a20563d3ecf7";
 />
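
   Note that send_order_version is not in the insert column list, so the version column of the first sequence-group stays NULL for every row; a simple query such as the following (our own sanity check) makes this visible:

   -- send_order_version was never written, so the version column of the
   -- sequence-group 'send_order_name,order_key' is NULL for all rows
   select order_id, order_key, send_order_name, send_order_version
   from test.paimon_test_partical;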
   
   2. Then write a second batch of updates for the same keys:
   insert into test.paimon_test_partical(order_id,order_key,send_order_name,get_order_name,get_order_version)
   values 
   (1,'a','test_send1','test_get2',1766632143),
   (2,'a','test_send1','test_get2',1766632143),
   (3,'a','test_send1','test_get2',1766632143),
   (1,'b','test_send1','test_get2',1766632143),
   (2,'b','test_send1','test_get2',1766632143),
   (3,'b','test_send1','test_get2',1766632143),
   (1,'c','test_send1','test_get2',1766632143),
   (2,'c','test_send1','test_get2',1766632143),
   (3,'c','test_send1','test_get2',1766632143)
   
   QueryResult
   
   <img width="1732" height="648" alt="Image" 
src="https://github.com/user-attachments/assets/a6daa693-977a-4bcf-86f6-b4ca93016559";
 />
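
   The exception below was thrown while reading the table back after this second write (note the Flink source-reader frames in the stack trace); a plain batch query of roughly this form triggers it in our setup:

   -- reading the updated rows now fails with the ParquetDecodingException below
   select * from test.paimon_test_partical;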
   
   - ExceptionInfo
   Caused by: org.apache.paimon.shade.org.apache.parquet.io.ParquetDecodingException: Failed to read 4 bytes
           at org.apache.paimon.format.parquet.reader.AbstractColumnReader.readDataBuffer(AbstractColumnReader.java:276) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.reader.BytesColumnReader.readBinary(BytesColumnReader.java:86) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.reader.BytesColumnReader.readBatch(BytesColumnReader.java:51) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.reader.BytesColumnReader.readBatch(BytesColumnReader.java:32) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:189) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.ParquetReaderFactory$ParquetReader.nextBatch(ParquetReaderFactory.java:338) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.format.parquet.ParquetReaderFactory$ParquetReader.readBatch(ParquetReaderFactory.java:309) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.io.FileRecordReader.readBatch(FileRecordReader.java:47) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.paimon.flink.source.FileStoreSourceSplitReader.fetch(FileStoreSourceSplitReader.java:115) ~[paimon-flink-1.17-0.9.0.jar:0.9.0]
           at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58) ~[flink-connector-files-1.17.2.jar:1.17.2]
           at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:162) ~[flink-connector-files-1.17.2.jar:1.17.2]
           at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:114) ~[flink-connector-files-1.17.2.jar:1.17.2]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_471]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_471]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_471]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_471]
           ... 1 more
   
   ### What doesn't meet your expectations?
   
   1. Since a primary key column can currently be placed in a sequence-group, should validation be added to reject such a configuration?
   2. After removing order_key from the sequence-group, the data can be merged, but the merge sets send_order_name to empty, as shown below (the modified table definition we used is sketched after the screenshot). Why does that happen?
   
   <img width="1458" height="480" alt="Image" 
src="https://github.com/user-attachments/assets/2d6da69d-c511-47e7-8fdd-4a6dee1e5df6";
 />
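
   For reference, the modified table definition used for question 2 (and, for question 1, roughly the configuration we would expect validation to require) differs only in the first sequence-group; a sketch, with a hypothetical table name:

   CREATE TABLE if not exists test.paimon_test_partical_v2
   (
     order_id BIGINT,
     order_key STRING,
     send_order_name STRING,
     get_order_name STRING,
     send_order_version BIGINT,
     get_order_version BIGINT,
     PRIMARY KEY (order_id,order_key) NOT ENFORCED
   )
   WITH
   (
       'connector' = 'paimon',
       'bucket' = '1',
       'file.format' = 'parquet',
       'file.compression' = 'zstd',
       'merge-engine' = 'partial-update',
       -- only change: the primary key column order_key is no longer listed here
       'fields.send_order_version.sequence-group' = 'send_order_name',
       'fields.get_order_version.sequence-group' = 'get_order_name',
       'snapshot.num-retained.min' = '1',
       'snapshot.num-retained.max' = '1',
       'full-compaction.delta-commits' = '10'
   );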
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

