logicbaby opened a new issue, #5081:
URL: https://github.com/apache/paimon/issues/5081

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   paimon-flink-1.20-1.0.1.jar
   paimon-s3-1.0.1.jar
   paimon-flink-action-1.0.1.jar
   
   ### Compute Engine
   
   flink-1.20.0
   
   ### Minimal reproduce step
   
Use MySQL CDC to sync a table to a Paimon table stored on S3. The job cannot complete a checkpoint, and the taskmanager reports:
   
   ```
   Caused by: java.lang.RuntimeException: s3://paas-flink-prod/flink-paimon/wh/chen.db/department/bucket-0/data-65dbb220-7017-468d-affb-1de9dd6e4105-0.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, -32]
        at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:162) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:243) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.format.parquet.ParquetUtil.getParquetReader(ParquetUtil.java:85) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.format.parquet.ParquetUtil.extractColumnStats(ParquetUtil.java:52) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extractWithFileInfo(ParquetSimpleStatsExtractor.java:78) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extract(ParquetSimpleStatsExtractor.java:71) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.io.StatsCollectingSingleFileWriter.fieldStats(StatsCollectingSingleFileWriter.java:105) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:169) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:58) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:135) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:167) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.mergetree.MergeTreeWriter.flushWriteBuffer(MergeTreeWriter.java:235) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
        at org.apache.paimon.mergetree.MergeTreeWriter.prepareCommit(MergeTreeWriter.java:264) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
   ```
   
   I downloaded this Parquet file and verified that it is valid.
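
   For reference, a downloaded copy can be sanity-checked without any Parquet tooling: a valid Parquet file both starts and ends with the 4-byte ASCII marker `PAR1`. The sketch below is a generic check, not Paimon code, and the file path in the usage note is only an example. The bytes reported in the error (`[21, 0, 21, -32]`, i.e. `0x15 0x00 0x15 0xE0`) do not match that marker and may be mid-file page-header bytes, which would suggest the reader saw a truncated or wrong byte range rather than the actual file tail.
   
   ```python
   # Generic sanity check for the Parquet magic bytes (not Paimon code).
   # A well-formed Parquet file begins and ends with the ASCII marker "PAR1".
   import os
   
   PARQUET_MAGIC = b"PAR1"
   
   def has_parquet_magic(path: str) -> bool:
       """Return True if the file begins and ends with the Parquet magic."""
       size = os.path.getsize(path)
       if size < 2 * len(PARQUET_MAGIC):
           return False  # too small to hold both markers
       with open(path, "rb") as f:
           head = f.read(len(PARQUET_MAGIC))
           f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
           tail = f.read(len(PARQUET_MAGIC))
       return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
   
   # Example (hypothetical local path of the downloaded file):
   # print(has_parquet_magic("data-65dbb220-...-0.parquet"))
   ```
   
   If this returns `True` for the downloaded file while the S3 read in the stack trace fails, the file itself is intact and the problem is on the read path.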
   
   CDC params:
   
   ```
   local:///opt/flink/usrlib/paimon-flink-action-1.0.1.jar \
     mysql_sync_table \
     --warehouse s3://paas-flink-prod/flink-paimon/wh \
     --database chen \
     --table department \
     --mysql_conf hostname=rm-xxx.mysql.rds.aliyuncs.com \
     --mysql_conf username=** \
     --mysql_conf password='**' \
     --mysql_conf database-name='xxx' \
     --mysql_conf table-name='department'
   ```
   
   
   ### What doesn't meet your expectations?
   
   S3 cannot be used as the Paimon warehouse backend storage; the same job works fine with HDFS.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
