Thanks for putting this together, Guy! I just did a pass over the doc, and it looks like a really reasonable proposal for injecting custom file filter implementations.
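To make sure we're picturing the same kind of hook, here's a rough sketch of what I have in mind for the pluggable filter at planning time. The interface and method names are hypothetical, not something from the doc or the current API; the only existing Iceberg types it leans on are DataFile and Expression:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.expressions.Expression;

    // Hypothetical plugin point: a custom filter that consults an external
    // data skipping index to decide whether a data file can be pruned.
    public interface DataSkippingFilter {
      // Return false only when the index proves that no row in this file
      // can match the scan's row filter; true means "can't rule it out",
      // so the file stays in the scan.
      boolean mightMatch(DataFile file, Expression rowFilter);
    }

Planning would keep a file only if every registered filter returns true. That's also where the storage question below comes in: the filter needs a cheap way to locate the index data for the file it is looking at.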
One of the main things we need to think about is how to store and track the index data. There's a comment in the doc about storing them in a "consolidated fashion" and I'd like to hear more about what you're thinking there. The index-per-file approach that Adobe is working on is a good way to track index data because each index gets a clear lifecycle: it is tied to a data file that is immutable. On the other hand, the drawback is that we end up with a lot of index files, one per data file.

Let's set up a time to talk through the options. Would 9AM PST (17:00 UTC) on 17 March work for everyone? I'm suggesting the morning so everyone from IBM can attend. We can do a second discussion later at a time that works better for people in Asia. If that day works, then I'll send out an invite.

On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:
> Hi All,
>
> Following up on our discussion from Wednesday's sync, attached is a
> proposal to enhance Iceberg with a pluggable interface for data skipping
> indexes to enable the use of existing indexes in job planning.
>
> https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing
>
> We will be glad to get your feedback.
>
> Thanks,
> Guy

--
Ryan Blue
Software Engineer
Netflix