zhangjun0x01 commented on pull request #2196: URL: https://github.com/apache/iceberg/pull/2196#issuecomment-772175903
@Stephen-Robin I don't think we need to split the large file. If a file's size exceeds the target size, it is automatically split across multiple `CombinedScanTasks` at read time and read concurrently, so no single task has to read the whole file. Splitting a large file into multiple small files would make the Rewrite Action consume more resources, and too many small files are not friendly to HDFS. What do you think?
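To illustrate the read-time splitting described above, here is a minimal sketch of how one large file can be covered by several byte ranges that separate tasks could read concurrently. It is a simplified model, not Iceberg's planner: `plan_splits` and its parameters are hypothetical names introduced for this example, and the 128 MB value is only assumed to stand in for the configured split target size. The real planner also takes file-format split offsets into account and bin-packs the resulting pieces into combined tasks.

```python
from typing import List, Tuple

MB = 1024 * 1024


def plan_splits(file_length: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges that together cover the file.

    Hypothetical helper: models only the offset arithmetic, not Iceberg's
    actual scan planning.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        # The last range may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    # A 300 MB file with an assumed 128 MB split target.
    for offset, length in plan_splits(300 * MB, 128 * MB):
        print(f"offset={offset // MB} MB, length={length // MB} MB")
```

Under these assumptions the large file stays a single file on disk, while the reader still gets several independent ranges to process in parallel, which is the point made in the comment above.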
