Many thanks Silvio for the link. That’s exactly what I’m looking for. ☺
However, there is no mention of checkpoint support for a custom “ForeachWriter”
in structured streaming. I’m going to test that now.

Good question Gary, this is the relevant mention in the link:
https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html
Often times we want to be able to write output of streams to external databases 
such as MySQL. At the time of writing, the Structured Streaming API does not 
support external databases as sinks; however, when it does, the API option will 
be as simple as .format("jdbc").start("jdbc:mysql/..").
In the meantime, we can use the foreach sink to accomplish this. Let’s create a 
custom JDBC Sink that extends ForeachWriter and implements its methods.
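
Here is roughly the shape of such a writer (a minimal sketch of my own, not
copied from the blog; the connection details and the target table
word_counts(word, cnt) are placeholders):

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Minimal JDBC sink via ForeachWriter; url/user/pwd and the table
// word_counts(word, cnt) are placeholders, not from the blog post.
class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  var connection: Connection = _
  var statement: PreparedStatement = _

  override def open(partitionId: Long, version: Long): Boolean = {
    // Called once per partition per trigger; returning false skips the partition.
    connection = DriverManager.getConnection(url, user, pwd)
    statement = connection.prepareStatement(
      "INSERT INTO word_counts (word, cnt) VALUES (?, ?)")
    true
  }

  override def process(row: Row): Unit = {
    // Assumes each row is (word: String, count: Long).
    statement.setString(1, row.getString(0))
    statement.setLong(2, row.getLong(1))
    statement.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    // Clean up even when the task failed (errorOrNull is non-null then).
    if (statement != null) statement.close()
    if (connection != null) connection.close()
  }
}

val query = streamingDF.writeStream
  .foreach(new JDBCSink("jdbc:mysql://host:3306/db", "user", "pwd"))
  .outputMode("complete")
  .start()

Note that in complete mode the full aggregation result is re-emitted on every
trigger, so a plain INSERT will accumulate duplicates; an upsert (e.g. MySQL’s
INSERT ... ON DUPLICATE KEY UPDATE) is probably the safer choice there.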

I’m not sure, though, whether the jdbc sink feature will be available in the
upcoming Spark version (2.2.0?).
It would be good to know if someone has information about it.

Thanks,
Hemanth

From: "lucas.g...@gmail.com" <lucas.g...@gmail.com>
Date: Monday, 10 April 2017 at 8.24
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Does spark 2.1.0 structured streaming support jdbc sink?

Interesting, does anyone know if we'll be seeing the JDBC sink in upcoming
releases?

Thanks!

Gary Lucas

On 9 April 2017 at 13:52, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
The JDBC sink is not in 2.1. See here for an example implementation using
the ForeachWriter sink instead:
https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html


From: Hemanth Gudela <hemanth.gud...@qvantel.com>
Date: Sunday, April 9, 2017 at 4:30 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Does spark 2.1.0 structured streaming support jdbc sink?

Hello Everyone,
I am new to Spark, especially Spark Streaming.

I am trying to read an input stream from Kafka, perform windowed aggregations 
in spark using structured streaming, and finally write aggregates to a sink.
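
Roughly, my query looks like this (a simplified sketch; the broker address,
the topic name “events”, the 10-minute window, and the column names are
illustrative):

import org.apache.spark.sql.functions.{window, col}

// Read from Kafka and count values per 10-minute window.
// "spark" is the ambient SparkSession.
val streamingDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS word", "timestamp")
  .groupBy(window(col("timestamp"), "10 minutes"), col("word"))
  .count()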

- MySQL as an output sink doesn’t seem to be an option, because this block of
code throws an error:

streamingDF.writeStream.format("jdbc").start("jdbc:mysql…")

java.lang.UnsupportedOperationException: Data source jdbc does not support
streamed writing

This is strange, because this document shows that jdbc is supported as an
output sink!
http://rxin.github.io/talks/2016-02-18_spark_summit_streaming.pdf



- Parquet doesn’t seem to be an option, because it supports only “append”
output mode, not “complete”. As I’m performing windowed aggregations in spark
streaming, the output mode has to be “complete” and cannot be “append”.


- Memory and console sinks are good for debugging, but are not suitable for
production jobs.

So, please correct me if I’m missing something in my code to enable the jdbc
output sink.
If the jdbc output sink is not an option, please suggest an alternative output
sink that suits my needs better.

Or, since structured streaming is still ‘alpha’, should I resort to Spark
DStreams to achieve the use case described above?
Please suggest.

Thanks in advance,
Hemanth
