danielhumanmod commented on code in PR #569:
URL: https://github.com/apache/incubator-xtable/pull/569#discussion_r1839736983
##########
xtable-api/src/main/java/org/apache/xtable/spi/extractor/ExtractFromSource.java:
##########
@@ -47,9 +49,20 @@ public IncrementalTableChanges extractTableChanges(
         commitsBacklog.getCommitsToProcess().stream()
             .map(conversionSource::getTableChangeForCommit)

Review Comment:
> I think that we'll want the identifier on the commit level, right?

Thanks for the response, @the-other-tim-brown. Yes, ideally every commit in the source table would map directly to one commit in the target table. However, based on my understanding of how XTable works, this isn't guaranteed. Instead, the Source -> Target mapping is N:1, which means:
- Every commit in the target table has a corresponding commit in the source table.
- Not every commit in the source table has a one-to-one counterpart in the target table.

The reason is that between two consecutive sync() calls, the source table can receive multiple commits, and all of those changes are synced to the target as a single commit, as in this example:

```
Iceberg (Source)       Delta (Target)
┌────────────┐         ┌─────────────────────┐
│ Snapshot 0 │ ◀─────▶ │ Version 0 (Synced)  │
│ Snapshot 1 │         │                     │
│ Snapshot 2 │         │                     │
│ Snapshot 3 │         │                     │
│ Snapshot 4 │         │                     │
│ Snapshot 5 │ ◀─────▶ │ Version 1 (Synced)  │
└────────────┘         └─────────────────────┘
```

Given this, I've chosen to use the information from the latest commit in the source table as the source identifier. But my understanding might be wrong, so please feel free to correct me if I've missed something :)
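To make the N:1 Source -> Target mapping concrete, here is a minimal Java sketch. The `SyncSketch` class, the `SourceCommit`/`TargetCommit` records and the `sync` method are all hypothetical and are not XTable APIs; the sketch only assumes that one sync collapses the whole backlog of source commits into a single target commit, tagged with the identifier of the latest source commit in that backlog.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the N:1 Source -> Target commit mapping.
 * None of these types are XTable APIs; they only illustrate the idea.
 */
public class SyncSketch {

    // A commit in the source table, e.g. an Iceberg snapshot (hypothetical type).
    record SourceCommit(long id) {}

    // A commit in the target table, e.g. a Delta version, tagged with the
    // identifier of the source commit it was synced from (hypothetical type).
    record TargetCommit(long version, long sourceIdentifier) {}

    private final List<TargetCommit> targetLog = new ArrayList<>();

    /**
     * Collapses every source commit in the backlog into one target commit and
     * records the latest source commit's id as its source identifier.
     * Returns null when there is nothing to sync.
     */
    TargetCommit sync(List<SourceCommit> backlog) {
        if (backlog.isEmpty()) {
            return null;
        }
        SourceCommit latest = backlog.get(backlog.size() - 1);
        TargetCommit commit = new TargetCommit(targetLog.size(), latest.id());
        targetLog.add(commit);
        return commit;
    }

    List<TargetCommit> targetLog() {
        return targetLog;
    }

    public static void main(String[] args) {
        SyncSketch sketch = new SyncSketch();
        // First sync: only Snapshot 0 exists, so it becomes Version 0.
        sketch.sync(List.of(new SourceCommit(0)));
        // Second sync: Snapshots 1..5 have accumulated and become a single Version 1.
        List<SourceCommit> backlog = new ArrayList<>();
        for (long id = 1; id <= 5; id++) {
            backlog.add(new SourceCommit(id));
        }
        sketch.sync(backlog);
        // Each target version points back to exactly one source snapshot.
        for (TargetCommit c : sketch.targetLog()) {
            System.out.println("Version " + c.version() + " <- Snapshot " + c.sourceIdentifier());
        }
    }
}
```

Running `main` prints `Version 0 <- Snapshot 0` and then `Version 1 <- Snapshot 5`. Snapshots 1 to 4 never get a target version of their own, which is the situation where only the latest source commit's identifier can be attached to the target commit.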