HeartSaVioR commented on pull request #796: URL: https://github.com/apache/iceberg/pull/796#issuecomment-674023600
I'm a fan of this feature, and I'd like to see it finally land soon, even if not everything is perfect. I'm planning to do some functional testing once it's merged, and I'd also be glad to help with minor items that can be handled as follow-ups. This would be the major feature closing the gap between Delta Lake and Iceberg for Structured Streaming use cases. Spark Structured Streaming itself has a technical limitation (the global watermark), which requires a workaround: splitting the query into multiple queries connected through intermediate storage that supports end-to-end exactly-once semantics. Delta Lake covers this case, and I would really like to see Iceberg cover it as well. I see there is a lot of work in progress on the milestone (and those are great features that should be done), but once this is in, both batch and streaming workloads on Spark would be covered, which is a huge step forward for Spark users.
