ashvina commented on code in PR #596:
URL: https://github.com/apache/incubator-xtable/pull/596#discussion_r1881172366


##########
xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionSource.java:
##########
@@ -112,19 +119,47 @@ public TableChange getTableChangeForCommit(Long 
versionNumber) {
                 tableAtVersion.getReadSchema().getFields(),
                 true,
                 DeltaPartitionExtractor.getInstance(),
-                DeltaStatsExtractor.getInstance()));
+                DeltaStatsExtractor.getInstance());
+        addedFiles.put(dataFile.getPhysicalPath(), dataFile);
+        String deleteVectorPath =
+            actionsConverter.extractDeletionVectorFile(snapshotAtVersion, 
(AddFile) action);
+        if (deleteVectorPath != null) {
+          deletionVectors.add(deleteVectorPath);
+        }
       } else if (action instanceof RemoveFile) {
-        removedFiles.add(
+        InternalDataFile dataFile =
             actionsConverter.convertRemoveActionToInternalDataFile(
                 (RemoveFile) action,
                 snapshotAtVersion,
                 fileFormat,
                 tableAtVersion.getPartitioningFields(),
-                DeltaPartitionExtractor.getInstance()));
+                DeltaPartitionExtractor.getInstance());
+        removedFiles.put(dataFile.getPhysicalPath(), dataFile);
+      }
+    }
+
+    // In Delta Lake if delete vector information is added for an existing 
data file, as a result of
+    // a delete operation, then a new RemoveFile action is added to the commit 
log to remove the old
+    // entry which is replaced by a new entry, AddFile with delete vector 
information. Since the
+    // same data file is removed and added, we need to remove it from the 
added and removed file
+    // maps which are used to track actual added and removed data files.
+    for (String deletionVector : deletionVectors) {
+      // validate that a Remove action is also added for the data file
+      if (removedFiles.containsKey(deletionVector)) {
+        addedFiles.remove(deletionVector);
+        removedFiles.remove(deletionVector);
+      } else {
+        log.warn(

Review Comment:
   I don't think XTable should terminate translation if a `RemoveFile` action 
corresponding to a `AddFile+DeleteVector` action is not present. Mainly because 
I'm not certain if adding a file for the first time along with an associated 
delete vector violates the Delta spec. I believe the Delta API would allow it, 
but I need to validate this. It may be allowed to support use cases like a 
long-running transaction which first adds a file and then remove some entries 
from it, but I'm not sure. If the spec allows it, XTable should not terminate.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to