Re: Spark Streaming - join streaming and static data

2016-12-06 Thread Cody Koeninger
You do not need recent versions of Spark, Kafka, or Structured
Streaming in order to do this.  Normal DStreams are sufficient.

You can parallelize your static data from the database to an RDD, and
there's a join method available on RDDs.  Transforming a single given
timestamp line into multiple lines with modified timestamps can be
done using flatMap.
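
A rough, untested sketch of that in Scala (here sc, events, and
staticRows are placeholders for your SparkContext, the DStream of
(id, timestamp) pairs parsed from Kafka, and the rows loaded from
your database):

import java.sql.Timestamp

// static rows from the database as (id, (minuteOffset, value)),
// grouped so every id maps to all of its (minute, value) pairs
val staticById = sc.parallelize(staticRows).groupByKey().cache()

// events is a DStream[(Int, Timestamp)] parsed from the Kafka JSON
val result = events
  .transform(_.join(staticById))         // (id, (ts, Iterable[(minute, value)]))
  .flatMap { case (_, (ts, offsets)) =>  // one output line per offset
    offsets.map { case (minute, value) =>
      (new Timestamp(ts.getTime + minute * 60000L), value)
    }
  }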

On Tue, Dec 6, 2016 at 11:11 AM, Burak Yavuz  wrote:
> Hi Daniela,
>
> This is trivial with Structured Streaming. If your Kafka cluster is 0.10.0
> or above, you may use Spark 2.0.2 to create a Streaming DataFrame from
> Kafka, and then also create a DataFrame using the JDBC connection, and you
> may join those. In Spark 2.1, there's support for a function called
> "from_json", which should also help you easily parse your messages incoming
> from Kafka.
>
> Best,
> Burak
>
> On Tue, Dec 6, 2016 at 2:16 AM, Daniela S  wrote:
>>
>> Hi
>>
>> I have some questions regarding Spark Streaming.
>>
>> I receive a stream of JSON messages from Kafka.
>> The messages consist of a timestamp and an ID.
>>
>> timestamp            ID
>> 2016-12-06 13:00     1
>> 2016-12-06 13:40     5
>> ...
>>
>> In a database I have values for each ID:
>>
>> ID   minute   value
>> 1    0        3
>> 1    1        5
>> 1    2        7
>> 1    3        8
>> 5    0        6
>> 5    1        6
>> 5    2        8
>> 5    3        5
>> 5    4        6
>>
>> So I would like to join each incoming JSON message with the corresponding
>> values. It should look as follows:
>>
>> timestamp            ID   minute   value
>> 2016-12-06 13:00     1    0        3
>> 2016-12-06 13:00     1    1        5
>> 2016-12-06 13:00     1    2        7
>> 2016-12-06 13:00     1    3        8
>> 2016-12-06 13:40     5    0        6
>> 2016-12-06 13:40     5    1        6
>> 2016-12-06 13:40     5    2        8
>> 2016-12-06 13:40     5    3        5
>> 2016-12-06 13:40     5    4        6
>> ...
>>
>> Then I would like to add the minute values to the timestamp. I only need
>> the computed timestamp and the values. So the result should look as follows:
>>
>> timestamp   value
>> 2016-12-06 13:00  3
>> 2016-12-06 13:01  5
>> 2016-12-06 13:02  7
>> 2016-12-06 13:03  8
>> 2016-12-06 13:40  6
>> 2016-12-06 13:41  6
>> 2016-12-06 13:42  8
>> 2016-12-06 13:43  5
>> 2016-12-06 13:44  6
>> ...
>>
>> Is this a possible use case for Spark Streaming? I thought I could join
>> the streaming data with the static data but I am not sure how to add the
>> minute values to the timestamp. Is this possible with Spark Streaming?
>>
>> Thank you in advance.
>>
>> Best regards,
>> Daniela
>>
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark Streaming - join streaming and static data

2016-12-06 Thread Burak Yavuz
Hi Daniela,

This is trivial with Structured Streaming. If your Kafka cluster is 0.10.0
or above, you may use Spark 2.0.2 to create a Streaming DataFrame from
Kafka, and then also create a DataFrame using the JDBC connection, and you
may join those. In Spark 2.1, there's support for a function called
"from_json", which should also help you easily parse your messages incoming
from Kafka.
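
For example, something along these lines (an untested sketch; broker,
topic, JDBC connection details, and the column/field names are
placeholders, and from_json needs Spark 2.1):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// schema of the JSON payload coming from Kafka
val schema = new StructType()
  .add("timestamp", TimestampType)
  .add("id", IntegerType)

// streaming DataFrame from Kafka
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "mytopic")
  .load()
  .select(from_json(col("value").cast("string"), schema).as("msg"))
  .select("msg.*")

// static DataFrame from the database
val lookup = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "minute_values")
  .load()

// join on id, then shift the timestamp by the minute offset
val result = events.join(lookup, "id")
  .withColumn("timestamp",
    (col("timestamp").cast("long") + col("minute") * 60).cast("timestamp"))
  .select("timestamp", "value")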

Best,
Burak

On Tue, Dec 6, 2016 at 2:16 AM, Daniela S  wrote:

> Hi
>
> I have some questions regarding Spark Streaming.
>
> I receive a stream of JSON messages from Kafka.
> The messages consist of a timestamp and an ID.
>
> timestamp            ID
> 2016-12-06 13:00     1
> 2016-12-06 13:40     5
> ...
>
> In a database I have values for each ID:
>
> ID   minute   value
> 1    0        3
> 1    1        5
> 1    2        7
> 1    3        8
> 5    0        6
> 5    1        6
> 5    2        8
> 5    3        5
> 5    4        6
>
> So I would like to join each incoming JSON message with the corresponding
> values. It should look as follows:
>
> timestamp            ID   minute   value
> 2016-12-06 13:00     1    0        3
> 2016-12-06 13:00     1    1        5
> 2016-12-06 13:00     1    2        7
> 2016-12-06 13:00     1    3        8
> 2016-12-06 13:40     5    0        6
> 2016-12-06 13:40     5    1        6
> 2016-12-06 13:40     5    2        8
> 2016-12-06 13:40     5    3        5
> 2016-12-06 13:40     5    4        6
> ...
>
> Then I would like to add the minute values to the timestamp. I only need
> the computed timestamp and the values. So the result should look as follows:
>
> timestamp   value
> 2016-12-06 13:00  3
> 2016-12-06 13:01  5
> 2016-12-06 13:02  7
> 2016-12-06 13:03  8
> 2016-12-06 13:40  6
> 2016-12-06 13:41  6
> 2016-12-06 13:42  8
> 2016-12-06 13:43  5
> 2016-12-06 13:44  6
> ...
>
> Is this a possible use case for Spark Streaming? I thought I could join
> the streaming data with the static data but I am not sure how to add the
> minute values to the timestamp. Is this possible with Spark Streaming?
>
> Thank you in advance.
>
> Best regards,
> Daniela
>


Spark Streaming - join streaming and static data

2016-12-06 Thread Daniela S
Hi

 

I have some questions regarding Spark Streaming.

 

I receive a stream of JSON messages from Kafka.

The messages consist of a timestamp and an ID.

 

timestamp            ID
2016-12-06 13:00     1
2016-12-06 13:40     5
...

In a database I have values for each ID:

 

ID   minute   value
1    0        3
1    1        5
1    2        7
1    3        8
5    0        6
5    1        6
5    2        8
5    3        5
5    4        6

So I would like to join each incoming JSON message with the corresponding values. It should look as follows:

 

timestamp            ID   minute   value
2016-12-06 13:00     1    0        3
2016-12-06 13:00     1    1        5
2016-12-06 13:00     1    2        7
2016-12-06 13:00     1    3        8
2016-12-06 13:40     5    0        6
2016-12-06 13:40     5    1        6
2016-12-06 13:40     5    2        8
2016-12-06 13:40     5    3        5
2016-12-06 13:40     5    4        6
...

Then I would like to add the minute values to the timestamp. I only need the computed timestamp and the values. So the result should look as follows:

 


timestamp            value
2016-12-06 13:00     3
2016-12-06 13:01     5
2016-12-06 13:02     7
2016-12-06 13:03     8
2016-12-06 13:40     6
2016-12-06 13:41     6
2016-12-06 13:42     8
2016-12-06 13:43     5
2016-12-06 13:44     6
...

Is this a possible use case for Spark Streaming? I thought I could join the streaming data with the static data but I am not sure how to add the minute values to the timestamp. Is this possible with Spark Streaming?
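
In plain Scala, the per-row computation I have in mind after the join
would be roughly (just a sketch):

// ts is the message timestamp, minute the offset from the database row
val shifted = new java.sql.Timestamp(ts.getTime + minute * 60000L)
// keep only (shifted, value)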

 

Thank you in advance.

 

Best regards,

Daniela

 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org