liubo1022126 opened a new issue #3916:
URL: https://github.com/apache/iceberg/issues/3916


   @openinx @rdblue 
   
   I sort data within partitions by a column to improve performance, e.g. `insert overwrite tableA partition(pt='20220118') select id,name,age from tableA where pt='20220118' order by id;`. The table is configured with `write.format.default=orc` and `'write.target-file-size-bytes'='134217728'`.
   
   But each partition ends up with only a single, very large data file, and I found that [ORC files do not yet respect the target file size before they are closed](https://github.com/apache/iceberg/pull/1213#discussion_r459197243).
   Because every partition contains only one large data file, I cannot filter data files at planning time as described in https://iceberg.apache.org/#performance/#data-filtering.
   
   So if I want to use the ORC file format, how can I roll to a new file?
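   To make the question concrete, here is a minimal Python sketch of what "rolling to a new file" means in general for a size-targeted writer: check the bytes written so far and open a new file once the target is reached. The class and file names are illustrative, not Iceberg's actual code; the limitation in the linked PR discussion is that an ORC writer buffers rows in memory, so an accurate in-progress file length is not available to drive this check.

```python
# Illustrative sketch of size-based file rolling (not Iceberg's real writer).
TARGET_FILE_SIZE_BYTES = 134_217_728  # matches write.target-file-size-bytes above

class RollingWriter:
    def __init__(self, target_size):
        self.target_size = target_size
        self.files = []        # names of files opened so far
        self.current_size = 0  # bytes written to the current file

    def _open_new_file(self):
        self.files.append(f"part-{len(self.files):05d}")
        self.current_size = 0

    def write(self, record_size):
        # Roll before writing when the current file has reached the target.
        # For ORC, this size estimate is the missing piece: buffered, not-yet-
        # flushed stripes mean the writer cannot report its true length.
        if not self.files or self.current_size >= self.target_size:
            self._open_new_file()
        self.current_size += record_size

writer = RollingWriter(target_size=100)
for _ in range(25):
    writer.write(10)  # 250 bytes total, rolled into 100-byte files
print(len(writer.files))  # 3 files
```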
   
   By the way, a Flink streaming job rolls to a new file at every checkpoint. How does that differ from a batch job? Why can't a batch job roll files?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


