@rb...@netflix.com, can you add @Andrei Taleanu <tale...@adobe.com> to the
sync-up?

He mainly works on indexing in our team.

His PR to Hyperspace has been merged:
https://github.com/microsoft/hyperspace/pull/358

Iceberg is now supported in Hyperspace for covering indexes.

Thanks!

Miao

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, 
"rb...@netflix.com" <rb...@netflix.com>
Date: Thursday, March 4, 2021 at 9:20 AM
To: Paula Ta-Shma <pa...@il.ibm.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>, OpenInx <open...@gmail.com>
Subject: Re: Secondary Indexes - Pluggable File Filter interface for Apache 
Iceberg

Great, I'm glad everyone can make it. I've sent out an invite to the list of 
people on the regular syncs. If you need to be added, please let me know.

On Thu, Mar 4, 2021 at 3:01 AM Paula Ta-Shma <pa...@il.ibm.com> wrote:
Hi all,

This time works for Guy, Gal, and me. Looking forward to it.

thanks!
Paula

Paula Ta-Shma, Ph.D.
Cloud Storage and Analytics
IBM Research - Haifa
Phone:+972.74.7929402
Email: pa...@il.ibm.com




From:        Miao Wang <miw...@adobe.com.INVALID>
To:        "dev@iceberg.apache.org" <dev@iceberg.apache.org>, 
"rb...@netflix.com" <rb...@netflix.com>, OpenInx <open...@gmail.com>
Cc:        Iceberg Dev List <dev@iceberg.apache.org>
Date:        04/03/2021 06:31
Subject:        [EXTERNAL] Re: Secondary Indexes - Pluggable File Filter 
interface for Apache Iceberg
________________________________



It works for me.



On a quick read, I have a few concerns about storing indexes in a 
consolidated fashion:



1) Maintaining the consolidated storage may be a bit more complex;

2) It may complicate collecting index data while writing data files (i.e., 
online index building); for example, we need to handle multiple writers 
writing to the same consolidated index file in parallel;

3) We need some auxiliary structure in the index file to quickly locate the 
relevant index entry given some key (e.g., a data file name).



However, I do think consolidated storage is a meaningful on-disk 
optimization. If we properly design a splittable and mergeable index file 
format, the consolidated approach and 1-data-file-1-index (1:1 index file) 
are not mutually exclusive. The 1:1 index file can then be the building block 
for larger consolidated index files and for indexes at different levels, such 
as a partition-level index.
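
To make the "mergeable" idea concrete, here is a minimal sketch in Python. It 
assumes the index is a simple min/max summary of a single integer column; the 
names MinMaxIndex and consolidate are hypothetical and are not part of 
Iceberg, Hyperspace, or the proposal document. It only illustrates how 1:1 
per-file indexes could be folded into a coarser, partition-level index.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MinMaxIndex:
    """Hypothetical min/max summary of one integer column (illustration only)."""
    lo: int
    hi: int

    def merge(self, other: "MinMaxIndex") -> "MinMaxIndex":
        # Merging two summaries yields a summary that covers both inputs.
        return MinMaxIndex(min(self.lo, other.lo), max(self.hi, other.hi))

    def might_contain(self, value: int) -> bool:
        # False means the value is definitely absent; True means "maybe".
        return self.lo <= value <= self.hi


def consolidate(per_file: Dict[str, MinMaxIndex]) -> MinMaxIndex:
    """Fold 1:1 per-file indexes into one coarser (e.g. partition-level) index."""
    if not per_file:
        raise ValueError("need at least one per-file index to consolidate")
    entries = iter(per_file.values())
    merged = next(entries)
    for entry in entries:
        merged = merged.merge(entry)
    return merged


# Hypothetical data file names, keyed 1:1 to their index entries.
per_file = {
    "part=a/file-1.parquet": MinMaxIndex(10, 20),
    "part=a/file-2.parquet": MinMaxIndex(15, 42),
}
partition_index = consolidate(per_file)
```

Because merge only widens the range, the consolidated index can rule out a 
whole partition cheaply, while the per-file entries remain available for 
finer-grained skipping.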



One of our team members has done a pass over the design and shared some 
thoughts with me. I will complete my own pass.



Thanks!



Miao





From: Ryan Blue <rb...@netflix.com.INVALID>
Date: Wednesday, March 3, 2021 at 6:08 PM
To: OpenInx <open...@gmail.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Secondary Indexes - Pluggable File Filter interface for Apache 
Iceberg

Great, thank you for planning to join! I definitely want to get your input on 
this as well.



On Wed, Mar 3, 2021 at 6:06 PM OpenInx <open...@gmail.com> wrote:

It will be 1:00 AM (China Standard Time) on 18 March, and that works for 
those of us in Asia. I'd love to attend this discussion. Thanks.



On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

Thanks for putting this together, Guy! I just did a pass over the doc and it 
looks like a really reasonable proposal for being able to inject custom file 
filter implementations.
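
As a rough illustration of what "injecting custom file filter 
implementations" could look like, here is a minimal Python sketch. The names 
FileFilter, MinMaxFileFilter, and plan_files are hypothetical and do not come 
from the proposal document or from the Iceberg API; the sketch only shows the 
general shape of a pluggable filter consulted during job planning.

```python
from typing import Dict, List, Protocol, Tuple


class FileFilter(Protocol):
    """Hypothetical plug-in point: decide whether a data file must be scanned."""

    def keep(self, data_file: str) -> bool:
        ...


class MinMaxFileFilter:
    """Example filter backed by an external min/max index (assumed layout)."""

    def __init__(self, ranges: Dict[str, Tuple[int, int]], value: int) -> None:
        self.ranges = ranges  # data file name -> (min, max) of the column
        self.value = value    # value from an equality predicate

    def keep(self, data_file: str) -> bool:
        rng = self.ranges.get(data_file)
        if rng is None:
            # No index entry: the file cannot be skipped safely.
            return True
        lo, hi = rng
        return lo <= self.value <= hi


def plan_files(candidates: List[str], filters: List[FileFilter]) -> List[str]:
    """Keep only the files that every registered filter agrees to keep."""
    return [f for f in candidates if all(flt.keep(f) for flt in filters)]


# Hypothetical file names and index contents.
index = {"file-1": (0, 5), "file-2": (10, 20)}
selected = plan_files(
    ["file-1", "file-2", "file-3"],
    [MinMaxFileFilter(index, value=12)],
)
```

Note that a file with no index entry is always kept: a data skipping filter 
may only drop files it can prove are irrelevant.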



One of the main things we need to think about is how to store and track the 
index data. There's a comment in the doc about storing them in a "consolidated 
fashion" and I'd like to hear more about what you're thinking there. The 
index-per-file approach that Adobe is working on is a good way to track index 
data: it gives index data a clear lifecycle, because each index is tied to an 
immutable data file. On the other hand, the drawback is that we have a lot of 
index files -- one per data file.



Let's set up a time to go talk through the options. Would 9AM PST (17:00 UTC) 
on 17 March work for everyone? I'm thinking in the morning so everyone from IBM 
can attend. We can do a second discussion at a time that works more for people 
in Asia later on as well.



If that day works, then I'll send out an invite.



On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:

Hi All,

Following up on our discussion from Wednesday's sync, attached is a proposal 
to enhance Iceberg with a pluggable interface for data skipping indexes, to 
enable the use of existing indexes in job planning.

https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing

We will be glad to get your feedback.

Thanks,
Guy



--

Ryan Blue

Software Engineer

Netflix



