Jiezhi opened a new issue #4238:
URL: https://github.com/apache/hudi/issues/4238


   **Describe the problem you faced**
   
   Because of https://github.com/apache/hudi/issues/4190, some partitions 
contain many small files (about 1k files).
   
   I want to merge those files. I tried Flink SQL `insert overwrite` to do so, 
but it failed:
   
   ```SQL
   insert overwrite tableA partition(log_date='2021-12-02') select * from tableA where log_date='2021-12-02';
   
   [INFO] Submitting SQL update statement to the cluster...
   [ERROR] Could not execute SQL statement. Reason:
   org.apache.flink.table.api.ValidationException: Column types of query result and sink for registered table 'hive_catalog.xxx.tableA' do not match.
   Cause: Different number of columns.
   
   Query schema: [distinct_id: STRING, ..., EXPR$26: STRING NOT NULL, log_date: STRING]
   Sink schema:  [distinct_id: STRING, ..., log_date: STRING]
   ```
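
One possible reading of the error, not confirmed here: the query schema has one column more than the sink (`EXPR$26` plus `log_date`), which would be consistent with Flink appending the static partition value as a constant while `select *` also returns `log_date`. If that is the cause, listing the non-partition columns explicitly might avoid the mismatch. The sketch below only builds such a statement as a string; the helper name `build_insert_overwrite` and the column `event` are hypothetical, and the real column list has to come from the table definition.

```python
def build_insert_overwrite(table, columns, partition_col, partition_value):
    """Build an INSERT OVERWRITE statement for one static partition.

    The partition column is left out of the SELECT list, so the query
    returns only the non-partition columns.
    """
    data_cols = [c for c in columns if c != partition_col]
    if not data_cols:
        raise ValueError("no non-partition columns to select")
    # Quote identifiers with backticks and escape single quotes in the value.
    select_list = ", ".join(f"`{c}`" for c in data_cols)
    escaped = partition_value.replace("'", "''")
    return (
        f"insert overwrite {table} partition({partition_col}='{escaped}') "
        f"select {select_list} from {table} where {partition_col}='{escaped}'"
    )


if __name__ == "__main__":
    # "event" is a placeholder; only distinct_id and log_date are known columns.
    print(build_insert_overwrite(
        "tableA", ["distinct_id", "event", "log_date"], "log_date", "2021-12-02"
    ))
```

The generated statement can then be pasted into the Flink SQL client. Whether it actually reduces the number of files in the partition is a separate question.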
   
   That might be a Flink SQL issue, but I wonder whether there is a way to merge 
Hudi small files manually, such as a hudi-cli command or some code to trigger merging?
   
   Like Hive `CONCATENATE`.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   No
   
   **Expected behavior**
   
   Some hudi-cli features to manually manage existing tables, so that small 
files get merged.
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Spark version : NA
   
   * Hive version : 2.1.1
   
   * Hadoop version : 3.0
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   * Flink version : 1.13.2
   
   

