rdblue commented on a change in pull request #3834:
URL: https://github.com/apache/iceberg/pull/3834#discussion_r805417317
##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -120,12 +120,7 @@ public void write(T row) throws IOException {
// Create a copied key from this row.
StructLike copiedKey =
StructCopy.copy(structProjection.wrap(asStructLike(row)));
- // Adding a pos-delete to replace the old path-offset.
- PathOffset previous = insertedRowMap.put(copiedKey, pathOffset);
- if (previous != null) {
- // TODO attach the previous row if has a positional-delete row schema in appender factory.
- posDeleteWriter.delete(previous.path, previous.rowOffset, null);
- }
+ insertedRowMap.put(copiedKey, pathOffset);
Review comment:
Let's remove this change.
The logic here is to opportunistically catch duplicates when there are only
inserts. This is not intended to replace the real upsert logic, which requires
calling delete as you noted. Instead, it is here because we're updating the
`insertedRowMap` and may get a previous insert location. When that happens, the
right thing to do is to delete the duplicate row.
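To illustrate the opportunistic check, here is a minimal sketch. It assumes simplified stand-ins: `String` keys instead of `StructLike`, a list instead of `posDeleteWriter`, and a hypothetical class name and file path. It is not the actual `BaseTaskWriter` code. It relies only on `Map.put` returning the previous value for a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the insert path; not the real BaseTaskWriter.
class DuplicateInsertSketch {
  // Location of an inserted row: data file path plus row position in that file.
  static final class PathOffset {
    final String path;
    final long rowOffset;

    PathOffset(String path, long rowOffset) {
      this.path = path;
      this.rowOffset = rowOffset;
    }
  }

  private final Map<String, PathOffset> insertedRowMap = new HashMap<>();
  // Stand-in for posDeleteWriter: records "path@offset" for each position delete.
  final List<String> posDeletes = new ArrayList<>();
  private final String path = "data-file-0"; // assumed file name for illustration
  private long nextOffset = 0;

  void write(String key) {
    PathOffset pathOffset = new PathOffset(path, nextOffset++);
    // put() returns the previous location for this key, if there was one.
    PathOffset previous = insertedRowMap.put(key, pathOffset);
    if (previous != null) {
      // A duplicate insert: position-delete the older copy of the row.
      posDeletes.add(previous.path + "@" + previous.rowOffset);
    }
  }
}
```

Writing keys `a`, `b`, `a` in that order leaves one position delete, pointing at the first `a`.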
I also just realized that the changes below are incorrect. Instead of
calling `internalPosDelete(key)`, this checks the `insertedRowMap` itself using
`get`. The logic in `internalPosDelete` used `remove` so that the entry was
removed _before_ the insert occurred and we have a second opportunistic check.
To fix this, you should instead update `internalPosDelete` to return `true`
if a row was deleted and `false` otherwise. Then you can update your `previous
!= null` check like this:
```java
public void delete(T row) throws IOException {
  if (!internalPosDelete(structProjection.wrap(asStructLike(row)))) {
    eqDeleteWriter.write(row);
  }
}
```
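For reference, a rough sketch of what a boolean-returning `internalPosDelete` could look like. It uses simplified stand-ins: `String` keys, lists in place of `posDeleteWriter` and `eqDeleteWriter`, and a hypothetical class name and file path. The real method signature and writer calls may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the delete path; not the real BaseTaskWriter.
class UpsertDeleteSketch {
  static final class PathOffset {
    final String path;
    final long rowOffset;

    PathOffset(String path, long rowOffset) {
      this.path = path;
      this.rowOffset = rowOffset;
    }
  }

  private final Map<String, PathOffset> insertedRowMap = new HashMap<>();
  final List<String> posDeletes = new ArrayList<>(); // stand-in for posDeleteWriter
  final List<String> eqDeletes = new ArrayList<>();  // stand-in for eqDeleteWriter
  private final String path = "data-file-0"; // assumed file name for illustration
  private long nextOffset = 0;

  // Returns true if a row written by this task was position-deleted, false otherwise.
  private boolean internalPosDelete(String key) {
    // remove() drops the entry before any later insert of the same key.
    PathOffset previous = insertedRowMap.remove(key);
    if (previous != null) {
      posDeletes.add(previous.path + "@" + previous.rowOffset);
      return true;
    }
    return false;
  }

  void delete(String key) {
    // Fall back to an equality delete only when no position delete was written.
    if (!internalPosDelete(key)) {
      eqDeletes.add(key);
    }
  }

  void write(String key) {
    PathOffset previous = insertedRowMap.put(key, new PathOffset(path, nextOffset++));
    if (previous != null) {
      // Second opportunistic check for duplicate inserts.
      posDeletes.add(previous.path + "@" + previous.rowOffset);
    }
  }
}
```

With this shape, deleting a key that was inserted by the same task produces only a position delete, and deleting a key the task has not seen produces only an equality delete.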
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]