Thanks for the feedback, Jacques!

You are correct: we intentionally left the question of the best approach open :) 
The idea was to start a discussion in the community. Hopefully, we can reach a consensus.

While the proposed “lazy” approaches certainly offer significant benefits, they 
require more changes in Iceberg as well as in readers/query engines (depending 
on how we want to merge base and diff files). For us, it is important to 
understand whether the Iceberg community would even consider such changes. 

Hive ACID 3 is one of the projects we looked at. In fact, we spoke to Owen, the 
original creator of updates/deletes/upserts in Hive. I believe the “lazy” 
approaches are close to what Hive 3 does, with some distinctions that Iceberg 
makes possible. It would be great to have Owen’s feedback.

We don’t know the internals of Delta, as its updates/deletes/upserts are not open 
source. My personal guess is that, yes, it might be similar to the “eager” 
approach in our doc.

Jacques, could you share some insights into how you implement the merging of 
diffs? Is it done by readers?

Thanks,
Anton

> On 10 May 2019, at 06:24, Jacques Nadeau <jacq...@dremio.com> wrote:
> 
> This is a nice doc and it covers many different options. Upon first skim, I 
> don't see a strong argument for a particular approach.
> 
> In our own development, we've been leaning heavily towards what you describe 
> in the document as "lazy with SRI". I believe this is consistent with what 
> the Hive community did on top of ORC. It's interesting because my (maybe 
> incorrect) understanding of the Databricks Delta approach is that they chose what 
> you call "eager" in their approach to upserts. They may also have a lazy 
> approach for other types of mutations but I don't think they do.
> 
> Thanks again for putting this together!
> Jacques
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> 
> On Wed, May 8, 2019 at 3:42 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Hi folks,
> 
> Miguel (cc) and I have spent some time thinking about how to perform 
> updates/deletes/upserts on top of Iceberg tables. This functionality is 
> essential for many modern use cases. We've summarized our ideas in a doc [1], 
> which, hopefully, will trigger a discussion in the community. The document 
> presents different conceptual approaches alongside their trade-offs. We will 
> be glad to consider any other ideas as well.
> 
> Thanks,
> Anton
> 
> [1] - 
> https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/
> 
> 
