We did take a look at Hudi. The overall design seems to be pretty complicated and, unfortunately, I didn’t have time to explore every detail.
Here is my understanding (correct me if I am wrong):

- Hudi has RECORD_KEY, which is expected to be unique.
- Hudi has PRECOMBINED_KEY, which is used to pick only one row in the incoming batch if there are multiple rows with the same key. As I understand, this isn't used on reads; it is used on writes to deduplicate rows with identical keys within one incoming batch. For example, if we are inserting 10 records and two rows have the same key, PRECOMBINED_KEY will be used to pick only one of them.
- Once Hudi ensures the uniqueness of RECORD_KEY within the incoming batch, it loads the Bloom filter index from all existing Parquet files in the involved partitions (that is, the partitions touched by the input batch) and tags each record as either an update or an insert by mapping the incoming keys to existing files. At this point, it seems to rely on a join.

Is my understanding correct? If so, do we want to consider joins on write? We mentioned this technique as one way to ensure the uniqueness of natural keys, but we were concerned about the performance. Also, does Hudi support record-level updates?

Thanks,
Anton

> On 10 May 2019, at 18:22, Erik Wright <erik.wri...@shopify.com.INVALID> wrote:
> 
> Thanks for putting this forward.
> 
> Another term for the "lazy" approach would be "merge on read".
> 
> My team has built something internally that uses merge-on-read but uses an "eager" materialization for publication to Presto. Roughly, we maintain a table metadata file that looks a bit like Iceberg's and tracks the "live" version of each partition as it is updated over time. We are looking into a solution that will allow us to push the merge-on-read all the way to Presto (and other consumers), and adding merge-on-read to Iceberg is one of the approaches we are considering.
> 
> It's worth noting that Hudi does have support for upserts/deletes as well, so that's another model to consider.
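For concreteness, the dedup-and-tag write path I described above could be sketched roughly as follows. This is a plain-Python illustration with made-up names (`precombine`, `tag`, the `ts` field); the real Hudi implementation operates on Spark datasets and probes per-file Bloom filters rather than a simple key set:

```python
# Hypothetical sketch of a Hudi-style write path:
# 1) deduplicate the incoming batch by record key, keeping the row with the
#    largest precombine value ("ts" here),
# 2) tag each surviving record as an update or an insert by probing the keys
#    already present in existing files (a set stands in for the Bloom filter
#    index and join used in practice).

def precombine(batch):
    """Keep one row per record key: the one with the largest 'ts'."""
    latest = {}
    for row in batch:
        key = row["key"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row
    return list(latest.values())

def tag(rows, existing_keys):
    """Split deduplicated rows into updates (key exists) and inserts (new key)."""
    updates = [r for r in rows if r["key"] in existing_keys]
    inserts = [r for r in rows if r["key"] not in existing_keys]
    return updates, inserts

batch = [
    {"key": "a", "ts": 1, "val": "old"},
    {"key": "a", "ts": 2, "val": "new"},  # same key, higher ts wins
    {"key": "b", "ts": 1, "val": "x"},
]
rows = precombine(batch)
updates, inserts = tag(rows, existing_keys={"a"})
```

The point of the question above is the `tag` step: mapping incoming keys to existing files is effectively a join against the table's current contents, performed on every write.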
> On Fri, May 10, 2019 at 8:30 AM Miguel Miranda <miguelnmira...@apple.com.invalid> wrote:
> Hi,
> 
> As Anton said, we purposely avoided making a "decision" on which approach should be implemented, in order to allow for a meaningful discussion with the community.
> 
> The document starts with an eager approach as it is straightforward and easy to understand: the steps resemble the usual file-level operations/manipulations frequently used by engineers when implementing Update/Delete/Upsert behaviour themselves, hopefully creating a conceptual bridge to the more involved designs. Right now, Iceberg has almost everything needed to implement the "eager" approach; we simply need to adjust the retry mechanism. For example, I have implemented a prototype of the eager solution with Spark and Iceberg.
> 
> We looked into many existing solutions for inspiration, but when there isn't a paper or code in the public domain it becomes hard to assess the underlying design, although some of it can be inferred from the API or documentation.
> 
> Best,
> Miguel
> 
>> On 10 May 2019, at 11:57, Anton Okolnychyi <aokolnyc...@apple.com> wrote:
>> 
>> Thanks for the feedback, Jacques!
>> 
>> You are correct, we kept the question of the best approach open :) The idea was to have a discussion in the community. Hopefully, we can reach a consensus.
>> 
>> While the proposed "lazy" approaches certainly offer significant benefits, they require more changes in Iceberg as well as in readers/query engines (depending on how we want to merge base and diff files). For us, it is important to understand whether the Iceberg community would even consider such changes.
>> 
>> Hive ACID 3 is one of the projects we looked at. In fact, we spoke to Owen, the original creator of updates/deletes/upserts in Hive. I believe the "lazy" approaches are close to what Hive 3 does, but with their own distinctions that Iceberg allows us to have.
>> It would be great to have Owen's feedback.
>> 
>> We don't know the internals of Delta, as updates/deletes/upserts are not open source. My personal guess: yes, it might be similar to the "eager" approach in our doc.
>> 
>> Jacques, could you share some insights into how you implement the merge of diffs? Is it done by readers?
>> 
>> Thanks,
>> Anton
>> 
>>> On 10 May 2019, at 06:24, Jacques Nadeau <jacq...@dremio.com> wrote:
>>> 
>>> This is a nice doc and it covers many different options. Upon first skim, I don't see a strong argument for a particular approach.
>>> 
>>> In our own development, we've been leaning heavily towards what you describe in the document as "lazy with SRI". I believe this is consistent with what the Hive community did on top of Orc. It's interesting because my (maybe incorrect) understanding of the Databricks Delta approach is that they chose what you title "eager" in their approach to upserts. They may also have a lazy approach for other types of mutations, but I don't think they do.
>>> 
>>> Thanks again for putting this together!
>>> Jacques
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> 
>>> 
>>> On Wed, May 8, 2019 at 3:42 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>>> Hi folks,
>>> 
>>> Miguel (cc) and I have spent some time thinking about how to perform updates/deletes/upserts on top of Iceberg tables. This functionality is essential for many modern use cases. We've summarized our ideas in a doc [1], which, hopefully, will trigger a discussion in the community. The document presents different conceptual approaches alongside their trade-offs. We will be glad to consider any other ideas as well.
>>> 
>>> Thanks,
>>> Anton
>>> 
>>> [1] - https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/
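As a postscript, the "eager" versus "lazy" (merge-on-read) approaches debated in this thread can be contrasted with a small illustrative sketch. Everything here is made up for illustration (data files modeled as dicts from record key to value, a `DELETE` sentinel for deletions); real implementations work with Parquet/ORC files and delete/diff files, but the logical outcome is the same:

```python
# Illustrative contrast of the two approaches discussed in the thread:
# - eager (copy-on-write): merge happens at write time, producing a rewritten
#   data file; readers see plain files and need no special logic.
# - lazy (merge-on-read): writes produce small diff files; readers (or the
#   query engine) merge base + diffs at read time.

DELETE = object()  # sentinel marking a deleted key in a diff

def eager_update(base_file, diff):
    """Eager: rewrite the data file at write time."""
    merged = dict(base_file)
    for key, value in diff.items():
        if value is DELETE:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged  # this rewritten file replaces base_file in the table

def lazy_read(base_file, diffs):
    """Lazy: apply diff files on top of the base file at read time."""
    view = dict(base_file)
    for diff in diffs:  # diffs must be applied in commit order
        for key, value in diff.items():
            if value is DELETE:
                view.pop(key, None)
            else:
                view[key] = value
    return view

base = {"k1": "v1", "k2": "v2"}
diff = {"k1": "v1'", "k2": DELETE, "k3": "v3"}
# Both approaches yield the same logical table state; they differ in whether
# the merge cost is paid on the write path or on every read.
assert eager_update(base, diff) == lazy_read(base, [diff])
```

This is the trade-off the doc's approaches revolve around: eager keeps readers simple but amplifies write cost, while lazy makes writes cheap but requires reader/engine changes to perform the merge.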