rdblue commented on code in PR #4870:
URL: https://github.com/apache/iceberg/pull/4870#discussion_r912531270


##########
api/src/main/java/org/apache/iceberg/DeletedRowsScanTask.java:
##########
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+/**
+ * A scan task for deleted data records generated by adding delete files to the table.
+ */
+public interface DeletedRowsScanTask extends ChangelogScanTask {

Review Comment:
   I think either interpretation of the concurrent delete case is fine. For the 
position delete example, d1 and d2 are concurrent and based on the same 
underlying data, so I'm fine saying BOTH of them deleted a row. It is a little 
strange to say d1 deleted pos 0 and d2 didn't, because d2 actually did encode 
that delete and would have deleted row 0 if it had won the race to commit. But 
there is also a strong argument that we should produce exactly one delete for 
each row, to avoid confusion when people consume these tables.
   
   I think as long as the implementation documents what it does, either one is 
fine.
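   
   To make the two interpretations concrete, here is a minimal, self-contained sketch. It does not use the Iceberg API: `ConcurrentDeleteSketch`, `DeleteFile`, `reportAll`, and `reportOnce` are hypothetical names for illustration, and the extra position 1 in d2 is an assumption added only to show that non-overlapping deletes are unaffected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConcurrentDeleteSketch {

  // Hypothetical stand-in for a position delete file (not an Iceberg class).
  static final class DeleteFile {
    final String name;
    final List<Long> positions;

    DeleteFile(String name, List<Long> positions) {
      this.name = name;
      this.positions = positions;
    }
  }

  // Interpretation 1: every delete file reports every position it encodes,
  // so a row deleted by two concurrent delete files is reported twice.
  static List<String> reportAll(List<DeleteFile> commitOrder) {
    List<String> out = new ArrayList<>();
    for (DeleteFile file : commitOrder) {
      for (long pos : file.positions) {
        out.add(file.name + ":" + pos);
      }
    }
    return out;
  }

  // Interpretation 2: each position is reported exactly once, attributed to
  // the first delete file in commit order that encodes it.
  static List<String> reportOnce(List<DeleteFile> commitOrder) {
    Set<Long> seen = new HashSet<>();
    List<String> out = new ArrayList<>();
    for (DeleteFile file : commitOrder) {
      for (long pos : file.positions) {
        if (seen.add(pos)) {
          out.add(file.name + ":" + pos);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // d1 wins the commit race; both files encode a delete for pos 0.
    List<DeleteFile> commitOrder = Arrays.asList(
        new DeleteFile("d1", Arrays.asList(0L)),
        new DeleteFile("d2", Arrays.asList(0L, 1L)));

    System.out.println(reportAll(commitOrder));  // [d1:0, d2:0, d2:1]
    System.out.println(reportOnce(commitOrder)); // [d1:0, d2:1]
  }
}
```

   Whichever behavior is chosen, it is the kind of thing the Javadoc for the task should state explicitly.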



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


