You do not need recent versions of Spark, Kafka, or Structured
Streaming to do this.  Plain DStreams are sufficient.

You can parallelize your static data from the database into an RDD,
and RDDs have a join method.  Transforming a single timestamped line
into multiple lines with modified timestamps can be done with
flatMap.
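To make the join-then-flatMap step concrete, here is a rough sketch of the per-record expansion in plain Python (the names `expand` and `static_values` are made up for illustration; in a Spark job this logic would be the function passed to flatMap after joining the stream with the static RDD):

```python
from datetime import datetime, timedelta

# Static data from the database, keyed by ID: id -> [(minute, value), ...]
# (sample values taken from the example in this thread)
static_values = {
    1: [(0, 3), (1, 5), (2, 7), (3, 8)],
    5: [(0, 6), (1, 6), (2, 8), (3, 5), (4, 6)],
}

def expand(record):
    """Given one (timestamp, id) message, emit one (timestamp + minute, value)
    pair per static row for that id -- the flatMap logic."""
    ts_str, msg_id = record
    base = datetime.strptime(ts_str, "%Y-%m-%d %H:%M")
    return [
        ((base + timedelta(minutes=minute)).strftime("%Y-%m-%d %H:%M"), value)
        for minute, value in static_values.get(msg_id, [])
    ]

# One incoming message expands into several output rows:
rows = expand(("2016-12-06 13:00", 1))
# rows == [("2016-12-06 13:00", 3), ("2016-12-06 13:01", 5),
#          ("2016-12-06 13:02", 7), ("2016-12-06 13:03", 8)]
```

This snippet only shows the timestamp arithmetic; in the actual DStream job you would join each micro-batch with the static RDD (e.g. via transform) and then flatMap the joined records through a function like the one above.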

On Tue, Dec 6, 2016 at 11:11 AM, Burak Yavuz <brk...@gmail.com> wrote:
> Hi Daniela,
>
> This is trivial with Structured Streaming. If your Kafka cluster is 0.10.0
> or above, you may use Spark 2.0.2 to create a Streaming DataFrame from
> Kafka, and then also create a DataFrame using the JDBC connection, and you
> may join those. In Spark 2.1, there's support for a function called
> "from_json", which should also help you easily parse your incoming
> messages from Kafka.
>
> Best,
> Burak
>
> On Tue, Dec 6, 2016 at 2:16 AM, Daniela S <daniela_4...@gmx.at> wrote:
>>
>> Hi
>>
>> I have some questions regarding Spark Streaming.
>>
>> I receive a stream of JSON messages from Kafka.
>> The messages consist of a timestamp and an ID.
>>
>> timestamp           ID
>> 2016-12-06 13:00    1
>> 2016-12-06 13:40    5
>> ...
>>
>> In a database I have values for each ID:
>>
>> ID    minute    value
>> 1     0         3
>> 1     1         5
>> 1     2         7
>> 1     3         8
>> 5     0         6
>> 5     1         6
>> 5     2         8
>> 5     3         5
>> 5     4         6
>>
>> So I would like to join each incoming JSON message with the corresponding
>> values. It should look as follows:
>>
>> timestamp           ID    minute    value
>> 2016-12-06 13:00    1     0         3
>> 2016-12-06 13:00    1     1         5
>> 2016-12-06 13:00    1     2         7
>> 2016-12-06 13:00    1     3         8
>> 2016-12-06 13:40    5     0         6
>> 2016-12-06 13:40    5     1         6
>> 2016-12-06 13:40    5     2         8
>> 2016-12-06 13:40    5     3         5
>> 2016-12-06 13:40    5     4         6
>> ...
>>
>> Then I would like to add the minute values to the timestamp. I only need
>> the computed timestamp and the values. So the result should look as follows:
>>
>> timestamp           value
>> 2016-12-06 13:00    3
>> 2016-12-06 13:01    5
>> 2016-12-06 13:02    7
>> 2016-12-06 13:03    8
>> 2016-12-06 13:40    6
>> 2016-12-06 13:41    6
>> 2016-12-06 13:42    8
>> 2016-12-06 13:43    5
>> 2016-12-06 13:44    6
>> ...
>>
>> Is this a possible use case for Spark Streaming? I thought I could join
>> the streaming data with the static data but I am not sure how to add the
>> minute values to the timestamp. Is this possible with Spark Streaming?
>>
>> Thank you in advance.
>>
>> Best regards,
>> Daniela
>>
>> --------------------------------------------------------------------- To
>> unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
