Jiezhi opened a new issue #4238: URL: https://github.com/apache/hudi/issues/4238
**Describe the problem you faced**

Because of this issue https://github.com/apache/hudi/issues/4190, some partitions contain lots of small files (about 1k files). I want to merge those files. I tried Flink SQL `insert overwrite` to merge them, but it failed:

```SQL
insert overwrite tableA partition(log_date='2021-12-02')
select * from tableA where log_date='2021-12-02';

[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Column types of query result and sink for registered table 'hive_catalog.xxx.tableA' do not match.
Cause: Different number of columns.

Query schema: [distinct_id: STRING, ..., EXPR$26: STRING NOT NULL, log_date: STRING]
Sink schema:  [distinct_id: STRING, ..., log_date: STRING]
```

That might be a Flink SQL issue, but I wonder whether there is any way to merge Hudi small files manually, such as a hudi-cli command or some code that triggers merging, similar to Hive's `CONCATENATE`.

**To Reproduce**

Steps to reproduce the behavior: No

**Expected behavior**

Some hudi-cli features to manually manage existing tables, so that small files get merged.

**Environment Description**

* Hudi version : 0.10.0
* Spark version : NA
* Hive version : 2.1.1
* Hadoop version : 3.0
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
* Flink: 1.13.2
