[GitHub] [iceberg] kbendick commented on pull request #3268: [SPARK] Simplify shouldProcess check in Spark3 streaming source

GitBox Sat, 09 Oct 2021 12:49:43 -0700


kbendick commented on pull request #3268:
URL: https://github.com/apache/iceberg/pull/3268#issuecomment-939352969



   There are a few corner cases that don't appear to be handled. For example, if 
somebody runs a `CreateOrReplaceTable` operation on an existing table and 
somebody then streams from the beginning, the stream will either throw an NPE 
or likely return to the beginning of the table entirely.
   
   But given that the Spark streaming source is presently only able to handle a 
limited set of use cases, I don't think complicating it for edge cases is the 
best idea. If anybody would like to discuss my findings, I'd be happy to do so, 
as I spent some time looking into CDC in other systems and how it can be 
efficiently applied to the Spark stream (which doesn't have a built-in notion 
of CDC or deletes in general). There's a bit more of a breakdown on the other 
PR mentioned above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

