[GitHub] [iceberg] rdblue commented on issue #1422: Distributing Job Planning on Spark

GitBox Tue, 29 Sep 2020 12:02:00 -0700


rdblue commented on issue #1422:
URL: https://github.com/apache/iceberg/issues/1422#issuecomment-700918564



   I think I agree with the metadata table approach. Because Presto can run
tasks and planning at the same time, slow planning is less of an issue there.
And the work done for Spark in option 2 could translate into a parallel scan
on a Presto metadata table as well (converting partition predicates to filters
on metadata table columns). Flink is much more likely to consume tables
incrementally, so I think it wouldn't be a big issue there for now (but it
would be nice to hear from the Flink folks).
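A toy sketch of the idea above: convert a partition predicate into a filter on metadata table columns, then evaluate that filter over chunks of the metadata table in parallel. Everything here is hypothetical and in-memory: `FileRow`, `to_metadata_filter`, and `parallel_plan` are illustrative names, not Iceberg or Spark APIs, and a thread pool stands in for Spark tasks scanning the metadata table.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class FileRow:
    """Hypothetical row of a `files`-style metadata table (illustration only)."""
    file_path: str
    partition: Dict[str, int]  # partition values, e.g. {"day": 18500}
    record_count: int

# A partition predicate as (column, op, value), e.g. ("day", ">=", 18500).
Predicate = Tuple[str, str, int]

OPS: Dict[str, Callable[[int, int], bool]] = {
    "=": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
}

def to_metadata_filter(predicate: Predicate) -> Callable[[FileRow], bool]:
    """Turn a partition predicate into a filter on the metadata `partition` column."""
    column, op, value = predicate
    compare = OPS[op]
    return lambda row: compare(row.partition[column], value)

def plan_chunk(rows: List[FileRow], row_filter: Callable[[FileRow], bool]) -> List[str]:
    """Plan one chunk of the metadata table: keep the paths of matching files."""
    return [r.file_path for r in rows if row_filter(r)]

def parallel_plan(rows: List[FileRow], predicate: Predicate, num_workers: int = 4) -> List[str]:
    """Split metadata rows into chunks and filter each chunk concurrently."""
    row_filter = to_metadata_filter(predicate)
    chunks = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(plan_chunk, chunks, [row_filter] * num_workers)
    # Merge per-chunk results, like collecting task output back on the driver.
    return sorted(path for chunk in results for path in chunk)

if __name__ == "__main__":
    rows = [FileRow(f"data/day={d}/part-0.parquet", {"day": d}, 100) for d in range(18495, 18505)]
    print(parallel_plan(rows, ("day", ">=", 18500)))
```

The point of the sketch is only that once partition predicates are expressed as filters on metadata rows, planning becomes an ordinary filtered scan that any engine able to read the metadata table can parallelize.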
   
   Risk is lower with option 2, and I think it sounds like the better option.
It also pushes on the metadata tables in healthy ways: it would incentivize
building filter pushdown in the `files` and `entries` metadata tables and
might require adding a `delete_files` metadata table. Those are good
side-effects of implementing it this way.


