[ 
https://issues.apache.org/jira/browse/HUDI-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249238#comment-17249238
 ] 

Ryan Murray commented on HUDI-1455:
-----------------------------------

The catalog interface in Iceberg [1] and the LogStore interface in Delta [3] 
both abstract away the file operations needed to commit a transaction. For 
filesystems with atomic rename (e.g. HDFS) the implementation typically just 
delegates to the HDFS libraries. For S3, Iceberg delegates the locking to Hive, 
and indications are that proprietary Delta delegates to an internal Databricks 
API (a guess based on the code in the OSS repo and the docs). Nessie fits into 
Iceberg [2] and Delta [4] by implementing those interfaces and performing the 
(optimistic) locking through Nessie. Because it hooks in at this layer, it 
serves as the locking mechanism (which is what allows for many simultaneous 
readers and writers) and can also capture the information required to maintain 
the git-like history of branches and tags.
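
To make the idea of a pluggable commit layer concrete, here is a minimal 
sketch of optimistic locking behind such an interface. It is not the Iceberg, 
Delta, or Nessie API: the names Catalog, InMemoryCatalog, CommitConflict, and 
commit are hypothetical and chosen only for illustration, and the in-memory 
store stands in for whatever service would actually hold the table pointers.

```python
import threading
from abc import ABC, abstractmethod


class CommitConflict(Exception):
    """Raised when the table pointer moved after the writer last read it."""


class Catalog(ABC):
    """Hypothetical commit layer: the only place that swaps table pointers."""

    @abstractmethod
    def current(self, table):
        """Return the current metadata pointer for a table, or None."""

    @abstractmethod
    def swap(self, table, expected, new):
        """Atomically set the pointer to `new` if it still equals `expected`."""


class InMemoryCatalog(Catalog):
    """Stand-in for an external service; also records a commit history."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}
        self.history = []  # ordered (table, pointer) pairs

    def current(self, table):
        with self._lock:
            return self._pointers.get(table)

    def swap(self, table, expected, new):
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(table)
            self._pointers[table] = new
            self.history.append((table, new))


def commit(catalog, table, build_metadata, max_attempts=5):
    """Optimistic commit: read the base, build new metadata, swap, retry."""
    for _ in range(max_attempts):
        base = catalog.current(table)
        new = build_metadata(base)  # assumed to write new files, no locking
        try:
            catalog.swap(table, base, new)
            return new
        except CommitConflict:
            continue  # another writer won the race; rebuild on the new base
    raise CommitConflict(f"gave up on {table} after {max_attempts} attempts")
```

Because every commit passes through swap, the same hook that arbitrates 
between concurrent writers is also the natural place to record history, which 
is the property described above.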

My (admittedly not extensive) research into Hudi suggests that there is indeed 
no layer for those types of operations and that everything is handled by the 
IO itself. I am not sure how easy it would be to slide something like Nessie in 
at that level, or whether it would require something like implementing a Hadoop 
filesystem interface. What do you think? 

[1] http://iceberg.apache.org/custom-catalog/
[2] 
https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie
[3] 
https://github.com/delta-io/delta/blob/master/src/main/scala/org/apache/spark/sql/delta/storage/LogStore.scala
[4] 
https://github.com/projectnessie/nessie/blob/main/clients/deltalake/core/src/main/scala/com/dremio/nessie/deltalake/NessieLogStore.scala

> Hudi integration with project nessie
> ------------------------------------
>
>                 Key: HUDI-1455
>                 URL: https://issues.apache.org/jira/browse/HUDI-1455
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Vinoth Chandar
>            Priority: Major
>
> [https://github.com/apache/hudi/issues/2330#issuecomment-743423398] 
> Follow up from this. 



