Thanks for putting this together, Guy! I just did a pass over the doc, and it looks like a really reasonable proposal for injecting custom file filter implementations.
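To make sure we're picturing the same kind of hook, here's a rough sketch of what I have in mind for the pluggable filter at planning time. The interface and method names are hypothetical, not something from the doc or the current API; the only existing Iceberg types it leans on are DataFile and Expression:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.expressions.Expression;

    // Hypothetical plugin point: a custom filter that consults an external
    // data skipping index to decide whether a data file can be pruned.
    public interface DataSkippingFilter {
      // Return false only when the index proves that no row in this file
      // can match the scan's row filter; true means "can't rule it out",
      // so the file stays in the scan.
      boolean mightMatch(DataFile file, Expression rowFilter);
    }

Planning would keep a file only if every registered filter returns true. That's also where the storage question below comes in: the filter needs a cheap way to locate the index data for the file it is looking at.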
One of the main things we need to think about is how to store and track the index data. There's a comment in the doc about storing them in a "consolidated fashion" and I'd like to hear more about what you're thinking there. The index-per-file approach that Adobe is working on is a good way to track index data because each index gets a clear lifecycle: it is tied to a data file that is immutable. On the other hand, the drawback is that we end up with a lot of index files, one per data file.

Let's set up a time to talk through the options. Would 9AM PST (17:00 UTC) on 17 March work for everyone? I'm suggesting the morning so everyone from IBM can attend. We can do a second discussion later at a time that works better for people in Asia. If that day works, then I'll send out an invite.

On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:
> Hi All,
>
> Following up on our discussion from Wednesday's sync, attached is a
> proposal to enhance Iceberg with a pluggable interface for data skipping
> indexes to enable the use of existing indexes in job planning.
>
> https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing
>
> We will be glad to get your feedback.
>
> Thanks,
> Guy

--
Ryan Blue
Software Engineer
Netflix