rdblue commented on a change in pull request #3834:
URL: https://github.com/apache/iceberg/pull/3834#discussion_r805417317



##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -120,12 +120,7 @@ public void write(T row) throws IOException {
       // Create a copied key from this row.
       StructLike copiedKey = StructCopy.copy(structProjection.wrap(asStructLike(row)));
 
-      // Adding a pos-delete to replace the old path-offset.
-      PathOffset previous = insertedRowMap.put(copiedKey, pathOffset);
-      if (previous != null) {
-        // TODO attach the previous row if has a positional-delete row schema in appender factory.
-        posDeleteWriter.delete(previous.path, previous.rowOffset, null);
-      }
+      insertedRowMap.put(copiedKey, pathOffset);

Review comment:
       Let's remove this change.
   
   The logic here opportunistically catches duplicates when there are only inserts. It is not intended to replace the real upsert logic, which requires calling `delete`, as you noted. It is here because we're already updating `insertedRowMap` and may get back a previous insert location for the same key. When that happens, the right thing to do is to delete the duplicate row by writing a position delete for that previous location.
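
   A minimal sketch of that opportunistic check, under simplifying assumptions: `insertedRowMap` is a plain `HashMap` keyed by a `String` row key, and `PathOffset`, `InsertOnlyWriter`, the `posDeletes` list, and the file name are hypothetical stand-ins for the real Iceberg types and position-delete writer. It models only the "`put` returns the previous location" idea from the removed lines in the diff, not `BaseTaskWriter` itself.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical stand-in for PathOffset: a data file path plus a row position.
   final class PathOffset {
     final String path;
     final long rowOffset;

     PathOffset(String path, long rowOffset) {
       this.path = path;
       this.rowOffset = rowOffset;
     }

     @Override
     public String toString() {
       return path + "@" + rowOffset;
     }
   }

   // Simplified model of the insert path only; not the real BaseTaskWriter.
   class InsertOnlyWriter {
     final Map<String, PathOffset> insertedRowMap = new HashMap<>();
     final List<PathOffset> posDeletes = new ArrayList<>();
     private final String path;
     private long nextOffset = 0;

     InsertOnlyWriter(String path) {
       this.path = path;
     }

     void write(String key) {
       PathOffset pathOffset = new PathOffset(path, nextOffset++);
       // put() returns the previous location if this writer already inserted the key.
       PathOffset previous = insertedRowMap.put(key, pathOffset);
       if (previous != null) {
         // Duplicate insert: position-delete the earlier copy so only the latest row survives.
         posDeletes.add(previous);
       }
     }
   }
   ```

   Writing `id=1`, `id=2`, and then `id=1` again records exactly one position delete, for offset 0 of `data-0.parquet`, and leaves the map pointing at offset 2.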
   
   I also just realized that the changes below are incorrect. Instead of calling `internalPosDelete(key)`, they check `insertedRowMap` directly using `get`. The logic in `internalPosDelete` uses `remove` so that the entry is removed _before_ the following insert occurs. That way the `put` check in `write` still acts as a second opportunistic check, without deleting the same row twice.
   
   To fix this, you should instead update `internalPosDelete` to return `true` if a row was deleted and `false` otherwise. Then you can update your `previous != null` check like this:
   
   ```java
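       public void delete(T row) throws IOException {
         // Position-delete the row if this writer inserted it;
         // otherwise fall back to an equality delete.
         if (!internalPosDelete(structProjection.wrap(asStructLike(row)))) {
           eqDeleteWriter.write(row);
         }
       }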
   ```
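
   A sketch of the suggested shape for `internalPosDelete`, again as a simplified model rather than Iceberg code: keys are `String`s, all rows go to one hypothetical data file so a location is just a `long` row offset, and `DeltaWriterSketch`, `posDeletes`, and `eqDeletes` are illustrative names. An upsert is modeled as `delete(key)` followed by `write(key)`.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Simplified model of the suggested fix; names mirror the review, types are hypothetical.
   class DeltaWriterSketch {
     final Map<String, Long> insertedRowMap = new HashMap<>();
     final List<Long> posDeletes = new ArrayList<>();   // row offsets in the single data file
     final List<String> eqDeletes = new ArrayList<>();  // keys written as equality deletes
     private long nextOffset = 0;

     void write(String key) {
       long offset = nextOffset++;
       Long previous = insertedRowMap.put(key, offset);
       if (previous != null) {
         // Second opportunistic check: a duplicate insert with no delete in between.
         posDeletes.add(previous);
       }
     }

     // Returns true if this writer inserted the key and it has now been position-deleted.
     boolean internalPosDelete(String key) {
       // remove(), not get(): the entry is gone before any following insert of the same key,
       // so write() will not position-delete the same row a second time.
       Long previous = insertedRowMap.remove(key);
       if (previous != null) {
         posDeletes.add(previous);
         return true;
       }
       return false;
     }

     void delete(String key) {
       if (!internalPosDelete(key)) {
         // Not written by this writer: fall back to an equality delete.
         eqDeletes.add(key);
       }
     }
   }
   ```

   Running `write("id=1")`, `delete("id=1")`, `write("id=1")`, `delete("id=9")` records one position delete (offset 0) and one equality delete (`id=9`). If `internalPosDelete` used `get` instead of `remove`, the second `write` would still find offset 0 in the map and record a second position delete for the same row.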




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


