kbendick commented on a change in pull request #3267:
URL: https://github.com/apache/iceberg/pull/3267#discussion_r725461003
##########
File path:
spark3/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java
##########
@@ -212,12 +212,22 @@ public void stop() {
private boolean shouldProcess(Snapshot snapshot) {
String op = snapshot.operation();
- Preconditions.checkState(!op.equals(DataOperations.DELETE) || skipDelete,
- "Cannot process delete snapshot: %s", snapshot.snapshotId());
- Preconditions.checkState(
- op.equals(DataOperations.DELETE) || op.equals(DataOperations.APPEND)
|| op.equals(DataOperations.REPLACE),
- "Cannot process %s snapshot: %s", op.toLowerCase(Locale.ROOT),
snapshot.snapshotId());
Review comment:
Currently, the code handles `skipDeletes` via the first
Preconditions.checkState assertion. Then, if that passed, it checks again that
the operation is one of the insert only operations or delete again (as
presumably if the snapshot operation were of type `DataOperations.DELETE`, the
flag `skipDelete` would have been set to true per the first assertion).
This was a little confusing for me at first seeing it checked twice, once in
the negative and once in the positive.
The second assertion here also arguably serves as a way of detecting
potentially unhandled `DataOperation` types. Given that, I chose to use a
switch statement to be a bit more explicit about what is and is not allowed, as
well as having the benefit of a default case to catch any unhandled
`DataOperations` (should they be added later for example).
If we want to go back to using `Preconditions`, I'm more than happy to do
that. As this code is arguably in the hot path for the driver during
scheduling, we could collapse the two calls into one and change the ordering so
that string comparisons are reduced (by checking for `skipDelete` first and
then checking for DataOperations.APPEND, which should be the most common case).
Let me know if anybody has issue with the switch statement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]