JeremyXin opened a new issue, #10377: URL: https://github.com/apache/seatunnel/issues/10377
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

When synchronizing Hive tables, we read the tables' Parquet files directly from HDFS. During a full data synchronization, we found that several fields newly added to the source table had no data written to the target table.

After analyzing the source code, we found that when the framework obtains the list of files in the directory, it parses only the first file in the list to extract the schema. That file may be an old one, in which case the schema read from its Parquet metadata is outdated and does not include the newly added fields. As a result, the newly added fields are never synchronized to the target table.

### SeaTunnel Version

2.3.12

### SeaTunnel Config

```conf
env {
  job.mode = "BATCH"
  parallelism = 10
}

source {
  HdfsFile {
    path = "hdfs://xxx/ods_log_di/"
    file_format_type = "parquet"
    fs.defaultFS = "hdfs://cluster"
    hdfs_site_path = "/tmp/hdfs-site.xml"
    krb5_path = "/tmp/krb5.conf"
    kerberos_principal = "xxx"
    kerberos_keytab_path = "/tmp/qiye_mail_data.keytab"
  }
}

sink {
  Doris {
    fenodes = "fenodes:8030"
    username = "xxx"
    password = "xxx"
    database = "test"
    table = "xxx"
    doris.config {
      format = "json"
      read_json_by_line = "true"
    }
  }
}
```

### Running Command

```shell
sh bin/seatunnel.sh --config/v2.batch.parquet.config -m local
```

### Error Exception

```log
The data written into the destination table does not contain any new fields added to the source table.
```

### Zeta or Flink or Spark Version

Zeta

### Java or Scala Version

Java 1.8

### Screenshots

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!
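
As a rough illustration of the behavior described under "What happened", and of one direction a fix might take, the sketch below models each file's schema as an ordered map from field name to type name. It does not use SeaTunnel or Parquet APIs; the class `SchemaMergeSketch`, both methods, and the field names `id`, `msg`, and `new_col` are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained model; not SeaTunnel code.
public class SchemaMergeSketch {

    // Models the behavior reported in this issue: only the first file is inspected.
    static Map<String, String> schemaFromFirstFile(List<Map<String, String>> fileSchemas) {
        if (fileSchemas.isEmpty()) {
            throw new IllegalArgumentException("no files to read a schema from");
        }
        return new LinkedHashMap<>(fileSchemas.get(0));
    }

    // One possible alternative: the union of fields across all files,
    // kept in first-seen order. Conflicting types are rejected.
    static Map<String, String> mergedSchema(List<Map<String, String>> fileSchemas) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> schema : fileSchemas) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                String previous = merged.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    throw new IllegalStateException(
                            "conflicting types for field " + field.getKey()
                                    + ": " + previous + " vs " + field.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // An old file written before the column was added.
        Map<String, String> oldFile = new LinkedHashMap<>();
        oldFile.put("id", "INT64");
        oldFile.put("msg", "STRING");

        // A newer file that also carries the added column.
        Map<String, String> newFile = new LinkedHashMap<>(oldFile);
        newFile.put("new_col", "STRING");

        List<Map<String, String>> files = Arrays.asList(oldFile, newFile);
        System.out.println(schemaFromFirstFile(files).keySet()); // [id, msg]
        System.out.println(mergedSchema(files).keySet());        // [id, msg, new_col]
    }
}
```

Merging across every file costs one metadata read per file, so a real fix might instead read only the most recently modified file or make the strategy configurable; which trade-off suits the connector is left open here.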
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
