[GitHub] [hudi] vinothchandar commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

GitBox Mon, 20 Jul 2020 10:15:04 -0700


vinothchandar commented on issue #1839:
URL: https://github.com/apache/hudi/issues/1839#issuecomment-661204597



   @rubenssoto Yes, we already support incremental queries using the Spark 
datasource. It seems the only thing missing here is the Spark Structured 
Streaming integration you want? (We can add that after 0.6.0.)
   https://hudi.apache.org/docs/querying_data.html#spark-incr-query
   
   https://www.youtube.com/watch?v=1w3IpavhSWA talks about a production 
use-case we built using an incremental query plus some grouping on the sink 
side. Unlike Delta, Hudi has record-level metadata about arrival times and 
thus does not need anything like ignoreChanges.
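   A toy, pure-Python model of why record-level arrival metadata makes incremental consumption simple. It assumes each stored row carries a commit-time field modeled on Hudi's `_hoodie_commit_time` meta column. The functions `upsert`, `incremental_pull` and `group_count` are hypothetical illustrations, not Hudi APIs; the real read path is the Spark datasource incremental query linked above. An update simply rewrites the row with a newer commit time, so a consumer filtering on "committed after my checkpoint" sees it without any ignoreChanges-style flag.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

def upsert(table: Dict[str, Row], commit_time: str, records: List[Row]) -> None:
    """Hypothetical writer: insert or replace rows by 'id', stamping the commit time
    (modeled on Hudi's _hoodie_commit_time meta column)."""
    for rec in records:
        table[rec["id"]] = {**rec, "_hoodie_commit_time": commit_time}

def incremental_pull(rows: Iterable[Row], begin_instant: str) -> Tuple[List[Row], str]:
    """Hypothetical reader: return rows committed strictly after begin_instant,
    plus the new checkpoint (the latest commit time seen). Instants are
    fixed-width timestamp strings, so string comparison orders them correctly."""
    changed = [r for r in rows if r["_hoodie_commit_time"] > begin_instant]
    checkpoint = max((r["_hoodie_commit_time"] for r in changed), default=begin_instant)
    return changed, checkpoint

def group_count(rows: Iterable[Row], key: str) -> Dict[str, int]:
    """Hypothetical sink-side grouping: count changed rows per value of `key`."""
    return dict(Counter(r[key] for r in rows))

if __name__ == "__main__":
    table: Dict[str, Row] = {}
    upsert(table, "20200720100000", [{"id": "1", "city": "sf"}, {"id": "2", "city": "nyc"}])
    batch, cp = incremental_pull(table.values(), "0")
    print(group_count(batch, "city"), cp)
    # An update to id 1 shows up in the next pull because its commit time advanced.
    upsert(table, "20200720110000", [{"id": "1", "city": "la"}])
    batch, cp = incremental_pull(table.values(), cp)
    print(group_count(batch, "city"), cp)
```

   Checkpointing on the largest commit time seen is what lets the consumer resume from where it left off on each run.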
   
   I am not sure if I am missing something about your use-case, but it feels 
like you should be able to get this working incrementally end-to-end with what 
we have today (again, we can add Spark streaming read support if there are 
hands to help... cc @garyli1019? :))
   

