danielhumanmod commented on code in PR #569:
URL: https://github.com/apache/incubator-xtable/pull/569#discussion_r1839736983


##########
xtable-api/src/main/java/org/apache/xtable/spi/extractor/ExtractFromSource.java:
##########
@@ -47,9 +49,20 @@ public IncrementalTableChanges extractTableChanges(
         commitsBacklog.getCommitsToProcess().stream()
             .map(conversionSource::getTableChangeForCommit)

Review Comment:
   > I think that we'll want the identifier on the commit level, right?
   
   
   Thanks for the response @the-other-tim-brown. 
   
   Yes, ideally every commit in the source table would map directly to one in 
the target table. However, based on my understanding of how XTable works, this 
isn’t guaranteed. Instead, the mapping (Source -> Target) is an N:1 mapping, 
which means:
   - Every commit in the target table corresponds to a commit in the source 
table.
   - Not every commit in the source table has a corresponding commit in the 
target table.
   
   The reason is that between two sync() calls there can be multiple changes 
on the source, and all of them are synced as a single commit in the target, as 
in this example:
   ```
   Iceberg (Source)          Delta (Target)  
   ┌────────────┐      ┌─────────────────────┐
   │ Snapshot 0 │ ◀  ▶ │ Version 0 (Synced)  │  
   │ Snapshot 1 │      │                     │  
   │ Snapshot 2 │      │                     │  
   │ Snapshot 3 │      │                     │  
   │ Snapshot 4 │      │                     │  
   │ Snapshot 5 │ ◀  ▶ │ Version 1 (Synced)  │
   └────────────┘      └─────────────────────┘  
   ```
   Given this, I’ve chosen to use the information from the latest commit in the 
source table as the source identifier.
   My understanding might be wrong, though, so please feel free to correct me 
if I’ve missed something :)
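   
   To make the choice above concrete, here is a minimal sketch of the idea. 
It is not the code in this PR: the class name, the method 
`sourceIdentifierForSync`, and the use of plain strings as commit identifiers 
are all hypothetical, and it assumes the pending source commits are passed in 
oldest-to-newest order.
   
   ```java
   import java.util.List;
   import java.util.Optional;

   public class SourceIdentifierSketch {

       // Hypothetical helper: given the source commits covered by one sync()
       // (assumed to be ordered oldest to newest), return the identifier that
       // the single resulting target commit would record. With an N:1 mapping,
       // only the latest source commit is kept as the source identifier.
       static Optional<String> sourceIdentifierForSync(List<String> sourceCommitsInOrder) {
           if (sourceCommitsInOrder.isEmpty()) {
               // Nothing to sync, so there is no target commit to label.
               return Optional.empty();
           }
           return Optional.of(sourceCommitsInOrder.get(sourceCommitsInOrder.size() - 1));
       }

       public static void main(String[] args) {
           // Snapshots 1..5 arrived between two syncs, as in the diagram above.
           List<String> pending =
               List.of("snapshot-1", "snapshot-2", "snapshot-3", "snapshot-4", "snapshot-5");
           // Prints "snapshot-5": the one target commit maps back to the latest snapshot.
           System.out.println(sourceIdentifierForSync(pending).orElse("none"));
       }
   }
   ```
   
   With this approach, Snapshots 1 to 4 have no target commit of their own, 
which matches the N:1 mapping described above.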
   



