kbendick commented on pull request #3268: URL: https://github.com/apache/iceberg/pull/3268#issuecomment-939352969
There are a few corner cases that don't appear to be handled. For example, if somebody runs a `CreateOrReplaceTable` operation on an existing table and somebody else then streams from the beginning, the stream will either throw an NPE or, more likely, go back to the beginning of the table entirely. But given that the Spark streaming source can presently handle only a limited set of use cases, I don't think complicating it for edge cases is the best idea. If anybody would like to discuss my findings, I'd be happy to do so, as I spent some time looking into CDC in other systems and how it can be applied efficiently to Spark streaming (which doesn't have a built-in notion of CDC, or of deletes in general). There's a bit more of a breakdown on the other PR mentioned above.
