danielhumanmod commented on code in PR #569: URL: https://github.com/apache/incubator-xtable/pull/569#discussion_r1839736983
##########
xtable-api/src/main/java/org/apache/xtable/spi/extractor/ExtractFromSource.java:
##########
@@ -47,9 +49,20 @@ public IncrementalTableChanges extractTableChanges(
         commitsBacklog.getCommitsToProcess().stream()
             .map(conversionSource::getTableChangeForCommit)

Review Comment:
> I think that we'll want the identifier on the commit level, right?

Thanks for the response @the-other-tim-brown.

Yes, ideally every commit in the source table would map directly to one commit in the target table. However, based on my understanding of how XTable works, this isn't guaranteed. Instead, the mapping (Source -> Target) is closer to N:1, which means:

- Every commit in the target table has a corresponding commit in the source table.
- Not every commit in the source table has a corresponding commit of its own in the target table.

The reason is that multiple changes can land on the source between two sync() calls, and all of those changes are synced as a single commit in the target, as in this example:

```
Iceberg (Source)        Delta (Target)
┌────────────┐         ┌─────────────────────┐
│ Snapshot 0 │  ◀ ▶    │ Version 0 (Synced)  │  (can map to snapshot 0)
│ Snapshot 1 │         │                     │
│ Snapshot 2 │         │                     │
│ Snapshot 3 │         │                     │
│ Snapshot 4 │         │                     │
│ Snapshot 5 │  ◀ ▶    │ Version 1 (Synced)  │  (can map to snapshot 5)
└────────────┘         └─────────────────────┘
```

Given this, I've chosen to use the latest commit in the source table as the source identifier.

My understanding might be wrong, though, so I'd appreciate any feedback or suggestions :)
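
To make the N:1 mapping concrete, here is a minimal, self-contained sketch of the idea. It is not XTable code: the class `SourceIdentifierSketch` and both method names are hypothetical, and source commits are modeled as plain `long` snapshot IDs. Each sync receives the batch of source commits accumulated since the previous sync and records only the newest one as the source identifier of the single target commit it produces.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in XTable.
public class SourceIdentifierSketch {

  // Returns the identifier of the newest commit in a batch that is
  // ordered from oldest to newest.
  public static long latestSourceIdentifier(List<Long> orderedSourceCommitIds) {
    if (orderedSourceCommitIds.isEmpty()) {
      throw new IllegalArgumentException("a sync needs at least one source commit");
    }
    return orderedSourceCommitIds.get(orderedSourceCommitIds.size() - 1);
  }

  // Simulates a sequence of syncs: each batch of source commits collapses
  // into one target version, keyed by the newest source commit in the batch.
  public static Map<Integer, Long> mapTargetVersionsToSource(List<List<Long>> batchesPerSync) {
    Map<Integer, Long> targetToSource = new LinkedHashMap<>();
    int targetVersion = 0;
    for (List<Long> batch : batchesPerSync) {
      targetToSource.put(targetVersion++, latestSourceIdentifier(batch));
    }
    return targetToSource;
  }

  public static void main(String[] args) {
    // First sync sees snapshot 0; second sync sees snapshots 1 through 5.
    List<List<Long>> batches =
        Arrays.asList(Arrays.asList(0L), Arrays.asList(1L, 2L, 3L, 4L, 5L));
    System.out.println(mapTargetVersionsToSource(batches)); // prints {0=0, 1=5}
  }
}
```

With this scheme, snapshots 1 through 4 never appear as identifiers in the target, which matches the diagram: only the snapshots that were current at sync time can be mapped back.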