Re: Spark Streaming - join streaming and static data
You do not need recent versions of Spark, Kafka, or Structured Streaming in order to do this. Normal DStreams are sufficient. You can parallelize your static data from the database into an RDD, and there's a join method available on RDDs. Transforming a single timestamped line into multiple lines with shifted timestamps can be done using flatMap.

On Tue, Dec 6, 2016 at 11:11 AM, Burak Yavuz wrote:
> Hi Daniela,
>
> This is trivial with Structured Streaming. If your Kafka cluster is 0.10.0
> or above, you may use Spark 2.0.2 to create a streaming DataFrame from
> Kafka, and then also create a DataFrame using the JDBC connection, and you
> may join those. In Spark 2.1, there's support for a function called
> "from_json", which should also help you easily parse your messages incoming
> from Kafka.
>
> Best,
> Burak
>
> On Tue, Dec 6, 2016 at 2:16 AM, Daniela S wrote:
>>
>> Hi
>>
>> I have some questions regarding Spark Streaming.
>>
>> I receive a stream of JSON messages from Kafka.
>> The messages consist of a timestamp and an ID.
>>
>> timestamp          ID
>> 2016-12-06 13:00   1
>> 2016-12-06 13:40   5
>> ...
>>
>> In a database I have values for each ID:
>>
>> ID  minute  value
>> 1   0       3
>> 1   1       5
>> 1   2       7
>> 1   3       8
>> 5   0       6
>> 5   1       6
>> 5   2       8
>> 5   3       5
>> 5   4       6
>>
>> So I would like to join each incoming JSON message with the corresponding
>> values. It should look as follows:
>>
>> timestamp          ID  minute  value
>> 2016-12-06 13:00   1   0       3
>> 2016-12-06 13:00   1   1       5
>> 2016-12-06 13:00   1   2       7
>> 2016-12-06 13:00   1   3       8
>> 2016-12-06 13:40   5   0       6
>> 2016-12-06 13:40   5   1       6
>> 2016-12-06 13:40   5   2       8
>> 2016-12-06 13:40   5   3       5
>> 2016-12-06 13:40   5   4       6
>> ...
>>
>> Then I would like to add the minute values to the timestamp. I only need
>> the computed timestamp and the values. So the result should look as follows:
>>
>> timestamp          value
>> 2016-12-06 13:00   3
>> 2016-12-06 13:01   5
>> 2016-12-06 13:02   7
>> 2016-12-06 13:03   8
>> 2016-12-06 13:40   6
>> 2016-12-06 13:41   6
>> 2016-12-06 13:42   8
>> 2016-12-06 13:43   5
>> 2016-12-06 13:44   6
>> ...
>>
>> Is this a possible use case for Spark Streaming? I thought I could join
>> the streaming data with the static data but I am not sure how to add the
>> minute values to the timestamp. Is this possible with Spark Streaming?
>>
>> Thank you in advance.
>>
>> Best regards,
>> Daniela
>>
>> - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
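The DStream approach above (parallelize the static table, join on ID, then flatMap) hinges on one transformation: expanding each (timestamp, ID) event into one row per minute offset. A minimal plain-Python sketch of that expansion, with hypothetical record shapes and no Spark dependency, might look like this:

```python
from datetime import datetime, timedelta

# Static table from the database: ID -> list of (minute, value) pairs.
static_values = {
    1: [(0, 3), (1, 5), (2, 7), (3, 8)],
    5: [(0, 6), (1, 6), (2, 8), (3, 5), (4, 6)],
}

def expand(event):
    """flatMap-style expansion: one (timestamp, id) event becomes
    one (shifted_timestamp, value) row per minute offset."""
    ts, event_id = event
    base = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    return [
        ((base + timedelta(minutes=minute)).strftime("%Y-%m-%d %H:%M"), value)
        for minute, value in static_values[event_id]
    ]

rows = expand(("2016-12-06 13:40", 5))
# rows[0] == ("2016-12-06 13:40", 6); rows[4] == ("2016-12-06 13:44", 6)
```

In an actual DStream job, a function with this shape would be the argument to flatMap applied after joining the keyed stream against the parallelized static RDD, rather than closing over a local dict.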
Re: Spark Streaming - join streaming and static data
Hi Daniela,

This is trivial with Structured Streaming. If your Kafka cluster is 0.10.0 or above, you may use Spark 2.0.2 to create a streaming DataFrame from Kafka, and then also create a DataFrame using the JDBC connection, and you may join those. In Spark 2.1, there's support for a function called "from_json", which should also help you easily parse your messages incoming from Kafka.

Best,
Burak

On Tue, Dec 6, 2016 at 2:16 AM, Daniela S wrote:
> Hi
>
> I have some questions regarding Spark Streaming.
>
> I receive a stream of JSON messages from Kafka.
> The messages consist of a timestamp and an ID.
>
> timestamp          ID
> 2016-12-06 13:00   1
> 2016-12-06 13:40   5
> ...
>
> In a database I have values for each ID:
>
> ID  minute  value
> 1   0       3
> 1   1       5
> 1   2       7
> 1   3       8
> 5   0       6
> 5   1       6
> 5   2       8
> 5   3       5
> 5   4       6
>
> So I would like to join each incoming JSON message with the corresponding
> values. It should look as follows:
>
> timestamp          ID  minute  value
> 2016-12-06 13:00   1   0       3
> 2016-12-06 13:00   1   1       5
> 2016-12-06 13:00   1   2       7
> 2016-12-06 13:00   1   3       8
> 2016-12-06 13:40   5   0       6
> 2016-12-06 13:40   5   1       6
> 2016-12-06 13:40   5   2       8
> 2016-12-06 13:40   5   3       5
> 2016-12-06 13:40   5   4       6
> ...
>
> Then I would like to add the minute values to the timestamp. I only need
> the computed timestamp and the values. So the result should look as follows:
>
> timestamp          value
> 2016-12-06 13:00   3
> 2016-12-06 13:01   5
> 2016-12-06 13:02   7
> 2016-12-06 13:03   8
> 2016-12-06 13:40   6
> 2016-12-06 13:41   6
> 2016-12-06 13:42   8
> 2016-12-06 13:43   5
> 2016-12-06 13:44   6
> ...
>
> Is this a possible use case for Spark Streaming? I thought I could join
> the streaming data with the static data but I am not sure how to add the
> minute values to the timestamp. Is this possible with Spark Streaming?
>
> Thank you in advance.
>
> Best regards,
> Daniela
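The from_json step mentioned above just turns each Kafka message body into typed columns matching a declared schema, so the resulting streaming DataFrame can be joined against the JDBC one. Sketched per message in plain Python (the field names "timestamp" and "id" are assumptions based on the example in the question, not a confirmed message format):

```python
import json
from datetime import datetime

def parse_message(raw):
    """Parse one JSON message body into (timestamp, id), mirroring
    what from_json with a two-field schema would produce as columns."""
    obj = json.loads(raw)
    ts = datetime.strptime(obj["timestamp"], "%Y-%m-%d %H:%M")
    return ts, int(obj["id"])

ts, event_id = parse_message('{"timestamp": "2016-12-06 13:00", "id": 1}')
```

In the Structured Streaming version you would instead declare the schema (e.g. a StructType with a timestamp field and an integer field) and let from_json populate the columns, with malformed messages yielding nulls rather than raising.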
Spark Streaming - join streaming and static data
Hi

I have some questions regarding Spark Streaming.

I receive a stream of JSON messages from Kafka.
The messages consist of a timestamp and an ID.

timestamp          ID
2016-12-06 13:00   1
2016-12-06 13:40   5
...

In a database I have values for each ID:

ID  minute  value
1   0       3
1   1       5
1   2       7
1   3       8
5   0       6
5   1       6
5   2       8
5   3       5
5   4       6

So I would like to join each incoming JSON message with the corresponding values. It should look as follows:

timestamp          ID  minute  value
2016-12-06 13:00   1   0       3
2016-12-06 13:00   1   1       5
2016-12-06 13:00   1   2       7
2016-12-06 13:00   1   3       8
2016-12-06 13:40   5   0       6
2016-12-06 13:40   5   1       6
2016-12-06 13:40   5   2       8
2016-12-06 13:40   5   3       5
2016-12-06 13:40   5   4       6
...

Then I would like to add the minute values to the timestamp. I only need the computed timestamp and the values. So the result should look as follows:

timestamp          value
2016-12-06 13:00   3
2016-12-06 13:01   5
2016-12-06 13:02   7
2016-12-06 13:03   8
2016-12-06 13:40   6
2016-12-06 13:41   6
2016-12-06 13:42   8
2016-12-06 13:43   5
2016-12-06 13:44   6
...

Is this a possible use case for Spark Streaming? I thought I could join the streaming data with the static data but I am not sure how to add the minute values to the timestamp. Is this possible with Spark Streaming?

Thank you in advance.

Best regards,
Daniela