rdblue opened a new issue #36: Split files when planning scan tasks
URL: https://github.com/apache/incubator-iceberg/issues/36
 
 
   When building a scan, the TableScan API can plan the files to read 
(`planFiles`) or group the files into combined splits (`planTasks`). Split 
planning should also split files at the target split size before bin packing to 
create the final splits.
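A minimal sketch of the "split, then combine" idea follows. It assumes a file is described only by a path and a byte length. It cuts each file into chunks of at most the target split size, then combines the chunks into tasks with first-fit bin packing. `SplitPlanner`, `FileChunk`, `splitFile`, and `binPack` are hypothetical names, not Iceberg's API. First-fit is just a simple packer chosen for illustration, and the resulting tasks depend on the order of the input chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split files at the target size, then bin pack the chunks.
// None of these names are part of Iceberg's TableScan API.
public class SplitPlanner {

    /** A byte range [start, start + length) of one data file. */
    public static final class FileChunk {
        public final String path;
        public final long start;
        public final long length;

        public FileChunk(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + "[" + start + ", " + (start + length) + ")";
        }
    }

    /** Cuts a file into fixed-size chunks; the last chunk holds the remainder. */
    public static List<FileChunk> splitFile(String path, long fileLength, long targetSplitSize) {
        if (targetSplitSize <= 0) {
            throw new IllegalArgumentException("targetSplitSize must be positive");
        }
        List<FileChunk> chunks = new ArrayList<>();
        for (long start = 0; start < fileLength; start += targetSplitSize) {
            chunks.add(new FileChunk(path, start, Math.min(targetSplitSize, fileLength - start)));
        }
        return chunks;
    }

    /** First-fit bin packing: each chunk goes into the first task with enough room left. */
    public static List<List<FileChunk>> binPack(List<FileChunk> chunks, long targetSplitSize) {
        List<List<FileChunk>> tasks = new ArrayList<>();
        List<Long> taskSizes = new ArrayList<>();
        for (FileChunk chunk : chunks) {
            int bin = -1;
            for (int i = 0; i < tasks.size(); i++) {
                if (taskSizes.get(i) + chunk.length <= targetSplitSize) {
                    bin = i;
                    break;
                }
            }
            if (bin < 0) {
                // No existing task has room: open a new one.
                tasks.add(new ArrayList<>());
                taskSizes.add(0L);
                bin = tasks.size() - 1;
            }
            tasks.get(bin).add(chunk);
            taskSizes.set(bin, taskSizes.get(bin) + chunk.length);
        }
        return tasks;
    }

    public static void main(String[] args) {
        long target = 128; // toy sizes to keep the output readable
        List<FileChunk> chunks = new ArrayList<>();
        chunks.addAll(splitFile("a.parquet", 300, target)); // chunks of 128, 128, 44
        chunks.addAll(splitFile("b.parquet", 60, target));  // one chunk of 60
        chunks.addAll(splitFile("c.parquet", 20, target));  // one chunk of 20
        for (List<FileChunk> task : binPack(chunks, target)) {
            System.out.println(task);
        }
    }
}
```

With these toy sizes the large file yields two full tasks, and its 44-byte tail is combined with the two small files into a third task:

```
[a.parquet[0, 128)]
[a.parquet[128, 256)]
[a.parquet[256, 300), b.parquet[0, 60), c.parquet[0, 20)]
```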
   
   This relates to adding split locations (row group or stripe offsets) to the 
manifest file. The simple version of this issue is to split at the target split 
size and then combine, but eventually we want to take the split offsets into 
account if it makes sense to store them in the manifest file.
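If split offsets were stored, splitting could cut only at row group or stripe boundaries instead of at arbitrary byte positions. The sketch below assumes, hypothetically, that each file carries a strictly increasing array of row group (or stripe) start offsets. It greedily merges adjacent row groups until adding the next one would exceed the target size. A single row group larger than the target stays whole rather than being cut. `OffsetSplitter` and `splitAtOffsets` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of offset-aware splitting; assumes the manifest stores
// sorted, strictly increasing row group / stripe start offsets for each file.
public class OffsetSplitter {

    /**
     * Returns {start, length} pairs whose boundaries always fall on a stored offset
     * (or the end of the file), so no row group is split across two chunks.
     */
    public static List<long[]> splitAtOffsets(long[] offsets, long fileLength, long targetSplitSize) {
        List<long[]> chunks = new ArrayList<>();
        if (offsets.length == 0) {
            return chunks;
        }
        long chunkStart = offsets[0];
        for (int i = 1; i < offsets.length; i++) {
            // Row group i ends at the next offset, or at the end of the file.
            long groupEnd = (i + 1 < offsets.length) ? offsets[i + 1] : fileLength;
            // Including row group i would push the chunk past the target: cut before it.
            if (groupEnd - chunkStart > targetSplitSize) {
                chunks.add(new long[] {chunkStart, offsets[i] - chunkStart});
                chunkStart = offsets[i];
            }
        }
        // Whatever remains forms the final chunk.
        chunks.add(new long[] {chunkStart, fileLength - chunkStart});
        return chunks;
    }

    public static void main(String[] args) {
        // Five row groups of 46, 50, 100, 30 and 70 bytes in a 300-byte file.
        long[] rowGroupOffsets = {4, 50, 100, 200, 230};
        for (long[] chunk : splitAtOffsets(rowGroupOffsets, 300, 128)) {
            System.out.println("start=" + chunk[0] + " length=" + chunk[1]);
        }
    }
}
```

For this example the file splits into three chunks: `start=4 length=96`, `start=100 length=100`, and `start=200 length=100`. These chunks could then be combined into tasks in the same way as fixed-size splits.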

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.