Many thanks Silvio for the link. That’s exactly what I’m looking for. ☺ However, the post makes no mention of checkpoint support for a custom “ForeachWriter” in structured streaming. I’m going to test that now.
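For reference, here is a minimal sketch of what such a ForeachWriter-based JDBC sink might look like. The table name, columns, and connection details are illustrative assumptions, not taken from the linked post. Note that `open(partitionId, version)` receives a monotonically increasing `version` on checkpoint recovery, so making writes idempotent (e.g. an upsert keyed on the data) is one way to cope with replayed batches:

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink writing each row to a MySQL table
// `aggregates(window_start VARCHAR, cnt BIGINT)` -- names are assumptions.
class JdbcSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  var connection: Connection = _
  var statement: PreparedStatement = _

  // Called once per partition per trigger; `version` identifies the epoch,
  // which can be used to skip or deduplicate replayed partitions.
  override def open(partitionId: Long, version: Long): Boolean = {
    connection = DriverManager.getConnection(url, user, pwd)
    // REPLACE INTO makes the write idempotent for a given window_start.
    statement = connection.prepareStatement(
      "REPLACE INTO aggregates (window_start, cnt) VALUES (?, ?)")
    true
  }

  // Called once per row in the partition.
  override def process(row: Row): Unit = {
    statement.setString(1, row.getString(0))
    statement.setLong(2, row.getLong(1))
    statement.executeUpdate()
  }

  // Called when the partition is finished; errorOrNull is non-null on failure.
  override def close(errorOrNull: Throwable): Unit = {
    if (connection != null) connection.close()
  }
}
```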
Good question Gary. This is what the linked post <https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html> says:

> Often times we want to be able to write output of streams to external databases such as MySQL. At the time of writing, the Structured Streaming API does not support external databases as sinks; however, when it does, the API option will be as simple as .format("jdbc").start("jdbc:mysql/.."). In the meantime, we can use the foreach sink to accomplish this. Let’s create a custom JDBC Sink that extends ForeachWriter and implements its methods.

I’m not sure, though, whether the JDBC sink feature will be available in the upcoming Spark (2.2.0?) version or not. It would be good to know if someone has information about it.

Thanks,
Hemanth

From: "lucas.g...@gmail.com" <lucas.g...@gmail.com>
Date: Monday, 10 April 2017 at 8.24
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Does spark 2.1.0 structured streaming support jdbc sink?

Interesting, does anyone know if we'll be seeing the JDBC sinks in upcoming releases?

Thanks!

Gary Lucas

On 9 April 2017 at 13:52, Silvio Fiorito <silvio.fior...@granturing.com> wrote:

JDBC sink is not in 2.1. You can see here for an example implementation using the ForeachWriter sink instead: https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html

From: Hemanth Gudela <hemanth.gud...@qvantel.com>
Date: Sunday, April 9, 2017 at 4:30 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Does spark 2.1.0 structured streaming support jdbc sink?

Hello Everyone,

I am new to Spark, especially Spark Streaming.
I am trying to read an input stream from Kafka, perform windowed aggregations in Spark using structured streaming, and finally write the aggregates to a sink.

- MySQL as an output sink doesn’t seem to be an option, because this block of code throws an error:

  streamingDF.writeStream.format("jdbc").start("jdbc:mysql…")
  java.lang.UnsupportedOperationException: Data source jdbc does not support streamed writing

  This is strange, because this document <http://rxin.github.io/talks/2016-02-18_spark_summit_streaming.pdf> shows jdbc as a supported output sink!
- Parquet doesn’t seem to be an option, because it supports only “append” output mode, not “complete”. Since I’m performing windowed aggregations in Spark streaming, the output mode has to be “complete” and cannot be “append”.
- Memory and console sinks are good for debugging, but are not suitable for production jobs.

So, please correct me if I’m missing something in my code that would enable the jdbc output sink. If the jdbc output sink is not an option, please suggest an alternative output sink that suits my needs better. Or, since structured streaming is still ‘alpha’, should I resort to Spark DStreams to achieve the use case described above? Please suggest.

Thanks in advance,
Hemanth
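To make the use case concrete, here is a hedged sketch of the windowed aggregation written in “complete” output mode through the foreach sink, since .format("jdbc") is unsupported for streams in 2.1.0. It assumes an existing SparkSession `spark`, a streaming DataFrame `streamingDF` with event-time column `timestamp` and a `key` column, and a `JdbcSink` class extending `ForeachWriter[Row]` like the one in the Databricks post linked above; the connection string and paths are placeholders:

```scala
import org.apache.spark.sql.functions.window
import spark.implicits._

// Windowed aggregation: counts per key per 10-minute window.
val counts = streamingDF
  .groupBy(window($"timestamp", "10 minutes"), $"key")
  .count()

val query = counts.writeStream
  .outputMode("complete")                               // required for aggregations without watermark
  .option("checkpointLocation", "/path/to/checkpoint")  // offsets and aggregation state
  .foreach(new JdbcSink("jdbc:mysql://host:3306/db", "user", "pass"))
  .start()

query.awaitTermination()
```

In complete mode the entire result table is rewritten on every trigger, which is why an idempotent write (upsert/replace) inside the ForeachWriter is preferable to plain inserts.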