hameizi commented on a change in pull request #3834:
URL: https://github.com/apache/iceberg/pull/3834#discussion_r803328630
##########
File path: core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java
##########
@@ -120,12 +120,7 @@ public void write(T row) throws IOException {
// Create a copied key from this row.
StructLike copiedKey =
StructCopy.copy(structProjection.wrap(asStructLike(row)));
- // Adding a pos-delete to replace the old path-offset.
- PathOffset previous = insertedRowMap.put(copiedKey, pathOffset);
- if (previous != null) {
- // TODO attach the previous row if has a positional-delete row schema in appender factory.
- posDeleteWriter.delete(previous.path, previous.rowOffset, null);
- }
+ insertedRowMap.put(copiedKey, pathOffset);
Review comment:
@rdblue After testing, I think this change is necessary, and I think the old
logic is wrong: it writes the pos-delete in the `write` function, but that
should happen in the `delete` function. The test case below is also puzzling:
https://github.com/apache/iceberg/blob/9b6b5e0d2e760694e2abef73fa9036d1d8bbd014/data/src/test/java/org/apache/iceberg/io/TestTaskEqualityDeltaWriter.java#L324
This test writes a duplicate key and expects just one row back. But when there
are duplicate inserts, the user should avoid them by relying on the config
`write.upsert.enable` (https://github.com/apache/iceberg/pull/2863), which
executes delete semantics, rather than relying on `write` to do it (that is
the wrong semantics for `write`).
So we only need to record the position of the key in `insertedRowMap` when
`write` is executed, and then write the pos-delete file when `delete` is
executed.
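
The split of responsibilities proposed here can be sketched as a small, self-contained model. This is not the real `BaseTaskWriter`: the class `DeltaWriterSketch`, the use of `String` keys, the in-memory `posDeletes` list, and the hard-coded file name are all hypothetical stand-ins, and the behavior of `delete` for an unknown key is an assumption noted in the comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the proposed flow; not the Iceberg implementation.
class DeltaWriterSketch {
  // Stand-in for the PathOffset in the diff: a data file path plus a row offset.
  static final class PathOffset {
    final String path;
    final long rowOffset;

    PathOffset(String path, long rowOffset) {
      this.path = path;
      this.rowOffset = rowOffset;
    }
  }

  // Key -> position of the most recently written row with that key.
  private final Map<String, PathOffset> insertedRowMap = new HashMap<>();
  // Stand-in for the positional-delete writer: just collects deleted positions.
  private final List<PathOffset> posDeletes = new ArrayList<>();
  private final String dataFile = "data-0.parquet"; // illustrative name only
  private long nextOffset = 0;

  // write() only records where the row landed; it never emits a pos-delete.
  void write(String key) {
    insertedRowMap.put(key, new PathOffset(dataFile, nextOffset++));
  }

  // delete() is the only place a pos-delete is emitted.
  // Returns false when the key was not written by this writer; a real writer
  // would presumably fall back to an equality delete there (assumption).
  boolean delete(String key) {
    PathOffset previous = insertedRowMap.remove(key);
    if (previous == null) {
      return false;
    }
    posDeletes.add(previous);
    return true;
  }

  List<PathOffset> posDeletes() {
    return posDeletes;
  }
}
```

In this model, writing the same key twice produces no pos-delete, so both rows stay in the data file. Only a caller that issues `delete(key)` before `write(key)`, as an upsert mode would, replaces the earlier row.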
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]