Done. Great to hear that Iceberg is now supported in Hyperspace!

On Thu, Mar 4, 2021 at 9:22 AM Miao Wang <miw...@adobe.com> wrote:
> @rb...@netflix.com <rb...@netflix.com> can you add @Andrei Taleanu
> <tale...@adobe.com> to the sync up?
>
> He mainly works on indexing in our team.
>
> His PR to Hyperspace has been merged,
> https://github.com/microsoft/hyperspace/pull/358
>
> Iceberg is now supported in Hyperspace for covering indexes.
>
> Thanks!
>
> Miao
>
> *From: *Ryan Blue <rb...@netflix.com.INVALID>
> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
> *Date: *Thursday, March 4, 2021 at 9:20 AM
> *To: *Paula Ta-Shma <pa...@il.ibm.com>
> *Cc: *Iceberg Dev List <dev@iceberg.apache.org>, OpenInx <open...@gmail.com>
> *Subject: *Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg
>
> Great, I'm glad everyone can make it. I've sent out an invite to the list
> of people on the regular syncs. If you need to be added, please let me know.
>
> On Thu, Mar 4, 2021 at 3:01 AM Paula Ta-Shma <pa...@il.ibm.com> wrote:
>
> Hi all,
>
> This time works for Guy, Gal and myself, looking forward
>
> thanks!
> Paula
>
> *Paula Ta-Shma, Ph.D.*
> Cloud Storage and Analytics
> IBM Research - Haifa
> Phone:+972.74.7929402
> Email: pa...@il.ibm.com
>
> From: Miao Wang <miw...@adobe.com.INVALID>
> To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>, OpenInx <open...@gmail.com>
> Cc: Iceberg Dev List <dev@iceberg.apache.org>
> Date: 04/03/2021 06:31
> Subject: [EXTERNAL] Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg
> ------------------------------
>
> It works for me.
>
> With a quick thought, there may be a few concerns about consolidated
> fashion storage.
>
> 1). Maintaining the consolidated storage may be a bit more complex;
>
> 2). It may make collecting index while writing data file (i.e., online
> index building) more complex (e.g., we need to consider that multiple
> writers write to the same consolidated index file in parallel);
>
> 3). We need to have some auxiliary structure in the index file to quickly
> locate relevant index given some key (e.g., a data file name);
>
> However, I do think consolidated fashion storage is some meaningful
> optimization on the disk. If we properly design splittable and mergeable
> index file format, the consolidation fashion and 1-data-file-1-index (1:1
> index file) are not mutually exclusive. Therefore, 1:1 index file can be the
> building block for larger consolidated index files and index at different
> levels, like partition level index.
>
> Our team member went through one pass of the design and shared some
> thoughts with me. I will complete my pass.
>
> Thanks!
>
> Miao
>
> *From: *Ryan Blue <rb...@netflix.com.INVALID>
> *Date: *Wednesday, March 3, 2021 at 6:08 PM
> *To: *OpenInx <open...@gmail.com>
> *Cc: *Iceberg Dev List <dev@iceberg.apache.org>
> *Subject: *Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg
>
> Great, thank you for planning to join! I definitely want to get your input
> on this as well.
>
> On Wed, Mar 3, 2021 at 6:06 PM OpenInx <open...@gmail.com> wrote:
>
> It will be 1:00 AM (China Standard Time) on 18 March, and it works for
> our Asia people. I'd love to attend this discussion, Thanks.
>
> On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> Thanks for putting this together, Guy! I just did a pass over the doc and
> it looks like a really reasonable proposal for being able to inject custom
> file filter implementations.
>
> One of the main things we need to think about is how to store and track
> the index data.
> There's a comment in the doc about storing them in a
> "consolidated fashion" and I'd like to hear more about what you're thinking
> there. The index-per-file approach that Adobe is working on is a good way
> to track index data: we get a clear lifecycle for index data because
> it is tied to a data file that is immutable. On the other hand, the
> drawback is that we have a lot of index files -- one per data file.
>
> Let's set up a time to go talk through the options. Would 9AM PST (17:00
> UTC) on 17 March work for everyone? I'm thinking in the morning so everyone
> from IBM can attend. We can do a second discussion at a time that works
> more for people in Asia later on as well.
>
> If that day works, then I'll send out an invite.
>
> On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:
>
> Hi All,
>
> Following up on our discussion from Wednesday's sync, attached here is a
> proposal to enhance Iceberg with a pluggable interface for data skipping
> indexes to enable use of existing indexes in job planning.
>
> https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing
>
> We will be glad to get your feedback.
>
> Thanks,
> Guy
>
> --
> Ryan Blue
> Software Engineer
> Netflix

--
Ryan Blue
Software Engineer
Netflix
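
As a rough illustration of the storage trade-off discussed in this thread (one index file per data file versus a consolidated index), the Python sketch below treats a 1:1 index as a building block that can be merged into, and split back out of, a larger index keyed by data file name. All names here (`FileIndex`, `ConsolidatedIndex`, `files_to_scan`) and the choice of a min/max index are hypothetical; nothing below is an Iceberg or Hyperspace API, and the sketch ignores the concurrent-writer concern entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class FileIndex:
    """Hypothetical 1:1 index: min/max of one column for one immutable data file."""
    data_file: str
    min_value: int
    max_value: int

    def may_contain(self, value: int) -> bool:
        # A min/max index can only rule files out, never confirm a match.
        return self.min_value <= value <= self.max_value


@dataclass
class ConsolidatedIndex:
    """Hypothetical consolidated index. The dict plays the role of the
    auxiliary structure that locates an entry by data file name."""
    entries: Dict[str, FileIndex] = field(default_factory=dict)

    @classmethod
    def merge(cls, *parts: Union[FileIndex, "ConsolidatedIndex"]) -> "ConsolidatedIndex":
        # Merging is a union of per-file entries, so 1:1 indexes and
        # already-consolidated indexes can be combined the same way.
        merged: Dict[str, FileIndex] = {}
        for part in parts:
            items = [part] if isinstance(part, FileIndex) else part.entries.values()
            for entry in items:
                merged[entry.data_file] = entry
        return cls(merged)

    def split(self, data_files: Iterable[str]) -> "ConsolidatedIndex":
        # Splitting keeps only the entries for the requested data files.
        return ConsolidatedIndex(
            {f: self.entries[f] for f in data_files if f in self.entries}
        )

    def files_to_scan(self, value: int) -> List[str]:
        # Data skipping: keep only files whose range could hold the value.
        return sorted(f for f, e in self.entries.items() if e.may_contain(value))


index = ConsolidatedIndex.merge(
    FileIndex("a.parquet", 1, 10),
    FileIndex("b.parquet", 20, 30),
    FileIndex("c.parquet", 5, 25),
)
print(index.files_to_scan(7))  # ['a.parquet', 'c.parquet']
```

Because each entry stays tied to exactly one immutable data file, dropping a data file only requires dropping its entry, which preserves the clear lifecycle of the index-per-file approach while reducing the number of physical index files.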