wypoon commented on code in PR #4588:
URL: https://github.com/apache/iceberg/pull/4588#discussion_r950623070


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java:
##########
@@ -183,6 +187,8 @@ Pair<int[], Integer> buildPosDelRowIdMapping(PositionDeleteIndex deletedRowPosit
           currentRowId++;
         } else if (hasIsDeletedColumn) {
           isDeleted[originalRowId] = true;
+        } else {
+          deletes.incrementDeleteCount();
         }

Review Comment:
   So, following the explanation by @RussellSpitzer: 
   
   > there is just now a second read path with _is_deleted metadata column 
which actually returns deleted rows. When we follow that path I'm not sure it's 
important for us to count the deleted rows since we'll be returning them 
anyway, I could go either way. 
   
   I decided that for this read path with the `_is_deleted` metadata column, we 
will not perform the count. That is what is implemented here.
   
   Pardon my ignorance, but for my benefit: how does one perform such a read (one 
where `hasIsDeletedColumn` is true)?
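   For anyone following along, my current guess (an assumption, not verified 
against this PR) is that selecting Iceberg's row-level delete marker metadata 
column in a Spark SQL query triggers this path. A sketch, where the table name 
and the `_deleted` column name are assumptions on my part:
   
   ```sql
   -- Hypothetical sketch: selecting the row-level delete marker metadata
   -- column on a merge-on-read table, which (I assume) is the kind of scan
   -- where hasIsDeletedColumn is true and deleted rows are returned.
   SELECT id, data, _deleted
   FROM db.events;
   ```
   
   If that is wrong, a pointer to the right way to exercise this path would be 
appreciated.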



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
