GangYang-HX opened a new issue, #4556:
URL: https://github.com/apache/paimon/issues/4556

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   Paimon-0.8.1
   
   ### Compute Engine
   
   Flink-1.18.1
   
   ### Minimal reproduce step
   
   1. Start a Spark batch job with a large number of tasks that reads the Paimon table.
   2. While the job is running, add a new field to the table.
   
   The failure is not guaranteed to appear, but it reproduces with high probability.
   
   ### What doesn't meet your expectations?
   
   The alterTable operation is not atomic. When reading the Paimon table, the Hive DDL fields are checked against the Paimon latest-schema fields. During an alter there is a window in which the two do not match, which causes query exceptions.
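
   To make the failure mode concrete, here is a simplified, self-contained sketch of the kind of check described above (this is **not** the actual Paimon code in `HiveSchema#checkFieldsMatched`, just an illustration with made-up field lists): the Hive DDL column names are compared against the Paimon latest-schema column names, and any difference is reported with the field counts, as in the error below.
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   
   // Simplified illustration of a Hive-DDL vs. Paimon-schema consistency check.
   public class SchemaCheckSketch {
       static void checkFieldsMatched(List<String> hiveFields, List<String> paimonFields) {
           if (!hiveFields.equals(paimonFields)) {
               throw new IllegalArgumentException(
                       "Hive DDL and paimon schema mismatched! There are "
                               + hiveFields.size() + " fields in Hive DDL, "
                               + paimonFields.size() + " fields in Paimon schema.");
           }
       }
   
       public static void main(String[] args) {
           List<String> hive = Arrays.asList("id", "sticky_album_id");
           // A concurrent ALTER TABLE has already added "new_col" (hypothetical
           // column name) to the Paimon schema, but not yet to the Hive DDL:
           List<String> paimon = Arrays.asList("id", "sticky_album_id", "new_col");
           try {
               checkFieldsMatched(hive, paimon);
           } catch (IllegalArgumentException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```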
   
   ```
   Hive DDL and paimon schema mismatched! It is recommended not to write any column definition as Paimon external table can read schema from the specified location.
   There are 1665 fields in Hive DDL: id, sticky_album_id ......
   There are 1666 fields in Paimon schema: id, sticky_album_id ......
        at org.apache.paimon.hive.HiveSchema.checkFieldsMatched(HiveSchema.java:249)
        at org.apache.paimon.hive.HiveSchema.extract(HiveSchema.java:165)
        at org.apache.paimon.hive.PaimonStorageHandler.getDataFieldsJsonStr(PaimonStorageHandler.java:89)
        at org.apache.paimon.hive.PaimonStorageHandler.configureInputJobProperties(PaimonStorageHandler.java:84)
        at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:438)
        at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:468)
        at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1(TableReader.scala:354)
        at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1$adapted(TableReader.scala:354)
        at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8(HadoopRDD.scala:184)
        at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8$adapted(HadoopRDD.scala:184)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$6(HadoopRDD.scala:184)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:181)
   ```
   
   ### Anything else?
   
   <img width="941" alt="image" src="https://github.com/user-attachments/assets/3ce99c4b-552e-49c0-b87e-63b5dd43bec4">
   org.apache.paimon.hive.HiveCatalog#alterTableImpl
   
   <img width="981" alt="image" src="https://github.com/user-attachments/assets/7e0a732f-63c1-4a25-b849-a66dc6616959">
   org.apache.paimon.hive.HiveSchema#checkFieldsMatched
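
   A hedged sketch of the race the screenshots point at (this models the reported behavior, it is not the `HiveCatalog#alterTableImpl` source): the Paimon schema and the Hive metastore DDL are updated in two separate steps, so a reader that loads both stores between the steps observes different field counts.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   
   // Models a non-atomic, two-step schema alter with in-memory stand-ins
   // for the two metadata stores.
   public class NonAtomicAlterSketch {
       static List<String> paimonSchema = new ArrayList<>(Arrays.asList("id", "sticky_album_id"));
       static List<String> hiveDdl = new ArrayList<>(Arrays.asList("id", "sticky_album_id"));
   
       static void alterTable(String newColumn) {
           paimonSchema.add(newColumn); // step 1: commit the new Paimon schema
           // <-- a reader arriving here sees more Paimon fields than Hive fields
           hiveDdl.add(newColumn);      // step 2: update the Hive metastore DDL
       }
   
       public static void main(String[] args) {
           // Simulate a reader caught between step 1 and step 2:
           paimonSchema.add("new_col"); // "new_col" is a hypothetical column
           System.out.println("Hive DDL fields: " + hiveDdl.size()
                   + ", Paimon schema fields: " + paimonSchema.size());
           // Prints: Hive DDL fields: 2, Paimon schema fields: 3
       }
   }
   ```
   
   Making the two updates appear atomic to readers (or tolerating a Paimon schema that is ahead of the Hive DDL) would close this window.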
   
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
