wanlce opened a new issue, #8510:
URL: https://github.com/apache/iceberg/issues/8510
### Query engine
- Flink 1.13
- Spark 3.2
- Iceberg 1.2
### Question
The default target file size for compaction in Iceberg is 512 MB. We currently
write data into Iceberg through Flink CDC. Because the checkpoint interval is
5 minutes, a large number of small files are generated. We therefore use a
scheduler to regularly run rewrite_data_files and expire_snapshots, expiring
snapshots older than 3 days.
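For context, a minimal sketch of such a scheduled maintenance job using the Iceberg Spark procedures mentioned above (the catalog/table names and the timestamp are placeholders, not from the original report):

```
-- compact small files toward the table's target file size
CALL catalog.system.rewrite_data_files(table => 'schema.tab');

-- expire snapshots older than a cutoff (here: a placeholder timestamp ~3 days ago)
CALL catalog.system.expire_snapshots(
  table => 'schema.tab',
  older_than => TIMESTAMP '2023-09-01 00:00:00'
);
```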
SQL Query:
```
select count(1) as cnt
from catalog.schema.tab.files
where file_size_in_bytes < 1048576;
```
result:
cnt = 17,000 (1.7w)
Then I manually ran rewrite_data_files to merge the small files, and it
returned: No small files need to be merged.
Question 1:
Why does manually triggering the small-file merge operation not actually
execute any rewrite?
Question 2:
I don't see the corresponding parameters for rewrite_data_files on the
official website. Is there a threshold that controls whether small-file
merging is actually executed?
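As a hedged sketch related to Question 2 (option names as documented for the Iceberg Spark `rewrite_data_files` procedure; values here are illustrative, not defaults from the original report), the procedure accepts options that gate whether a file group is rewritten at all, such as `min-input-files`:

```
CALL catalog.system.rewrite_data_files(
  table => 'schema.tab',
  options => map(
    'min-input-files', '2',                -- rewrite a group even when it contains few files
    'target-file-size-bytes', '536870912'  -- 512 MB compaction target
  )
);
```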
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]