kbendick commented on a change in pull request #3517:
URL: https://github.com/apache/iceberg/pull/3517#discussion_r749765488
##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java
##########
@@ -205,6 +205,11 @@ private boolean shouldProcess(Snapshot snapshot) {
"Cannot process delete snapshot : %s. Set read option %s to allow
skipping snapshots of type delete",
snapshot.snapshotId(),
SparkReadOptions.STREAMING_SKIP_DELETE_SNAPSHOTS);
return false;
+ case DataOperations.OVERWRITE:
Review comment:
+1. I would prefer that they be two separate configs, and also that we have a longer-term plan for emitting these row deltas.
I'd be ok with getting a PR in that skips `OVERWRITE` snapshots (a rough sketch of how that could look is below), but as others have mentioned, this isn't something we should ignore in the longer term, or even the near-to-medium term.
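To make the two-config idea concrete, here is a rough sketch of how `shouldProcess` could gate `OVERWRITE` behind its own option, mirroring the existing `DELETE` handling. This is only an illustration: the `skipOverwrite` field and the `streaming-skip-overwrite-snapshots` option name are assumptions for the sketch, not something that exists in this PR or in `SparkReadOptions`.
```java
// Sketch only, not the PR's implementation. skipOverwrite and the
// "streaming-skip-overwrite-snapshots" option name are assumed for illustration.
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.spark.SparkReadOptions;

class ShouldProcessSketch {
  private final boolean skipDelete;
  private final boolean skipOverwrite;  // hypothetical: backed by its own read option

  ShouldProcessSketch(boolean skipDelete, boolean skipOverwrite) {
    this.skipDelete = skipDelete;
    this.skipOverwrite = skipOverwrite;
  }

  boolean shouldProcess(Snapshot snapshot) {
    switch (snapshot.operation()) {
      case DataOperations.APPEND:
        return true;

      case DataOperations.DELETE:
        // existing behavior: fail unless the user opted in to skipping deletes
        Preconditions.checkState(skipDelete,
            "Cannot process delete snapshot : %s. Set read option %s to allow skipping snapshots of type delete",
            snapshot.snapshotId(), SparkReadOptions.STREAMING_SKIP_DELETE_SNAPSHOTS);
        return false;

      case DataOperations.OVERWRITE:
        // same pattern, but keyed off a separate (hypothetical) option so users
        // can skip overwrites without also having to skip deletes
        Preconditions.checkState(skipOverwrite,
            "Cannot process overwrite snapshot : %s. Set read option %s to allow skipping snapshots of type overwrite",
            snapshot.snapshotId(), "streaming-skip-overwrite-snapshots");
        return false;

      default:
        throw new IllegalStateException(
            String.format("Cannot process unknown snapshot operation: %s", snapshot.operation()));
    }
  }
}
```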
Personally, I would consider using a schema similar to the delta.io change data feed: a dataframe with the before image / after image (the row before and after an update) and the type of operation for each row (insert, delete, update_before, update_after).
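For illustration, here is a sketch of what that change-row schema could look like if exposed through the Spark source. The metadata column names (`_change_type`, `_commit_snapshot_id`, `_commit_timestamp`) are placeholders loosely modeled on the delta.io change data feed columns, not an existing Iceberg or Spark API.
```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

class ChangeFeedSchemaSketch {
  // Wraps the table's data schema with CDC metadata columns (names are illustrative).
  static StructType withChangeColumns(StructType tableSchema) {
    return tableSchema
        // one of: insert, delete, update_before, update_after
        .add("_change_type", DataTypes.StringType, false)
        // snapshot that produced the change and its commit time
        .add("_commit_snapshot_id", DataTypes.LongType, false)
        .add("_commit_timestamp", DataTypes.TimestampType, false);
  }
}
```
An update would then surface as a pair of rows: one tagged `update_before` carrying the old values and one tagged `update_after` carrying the new values, which downstream consumers can merge or apply however they need.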