[GitHub] [iceberg] HeartSaVioR commented on pull request #796: Support Spark Structured Streaming Read for Iceberg

GitBox Fri, 14 Aug 2020 04:12:48 -0700


HeartSaVioR commented on pull request #796:
URL: https://github.com/apache/iceberg/pull/796#issuecomment-674023600



   I'm a fan of this feature, and I'd like to see it finally done soon, even if
not everything is perfect. I'm planning to do some functional testing once it's
merged. I'd also like to help if there are minor things to handle that can be
done as follow-ups.
   
   This would be the major feature to close the gap between Delta Lake and
Iceberg for structured streaming use cases. Spark structured streaming itself
has a technical limitation (the global watermark), which requires a workaround:
splitting the query into multiple queries, connected through intermediate
storage that supports end-to-end exactly-once. Delta Lake covers this case, and
I would really like to see it covered by Iceberg as well.
   
   I see there is a lot of work in progress on the milestone (and these are
great features that should be done), but once this is merged we cover both batch
and streaming workloads run with Spark, which is a huge step forward for
Spark users.


