Appu,

I have run into the same problem.

Were you able to solve this issue? If so, could you please share a snippet
of your code?

Thanks,
Naresh

On Wed, Feb 14, 2018 at 8:04 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> 1. Just loop like this.
>
>
> def startQuery(): StreamingQuery = {
>    // Define the dataframes and start the query
> }
>
> // call this on the main thread
> while (notShutdown) {
>    val query = startQuery()
>    query.awaitTermination(refreshIntervalMs)
>    query.stop()
>    // refresh static data
> }
>
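> For concreteness, a fuller sketch of that loop (assuming a rate source
> joined to a static Parquet table; all paths and column names are
> illustrative):
>
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.streaming.StreamingQuery
>
> val spark = SparkSession.builder.appName("periodic-refresh").getOrCreate()
> val refreshIntervalMs = 5 * 60 * 1000L  // e.g. refresh every 5 minutes
>
> def startQuery(): StreamingQuery = {
>   // Re-reading the static side here is what picks up the refreshed data
>   val static = spark.read.parquet("/data/lookup")
>   val stream = spark.readStream.format("rate").load()
>   stream.join(static, stream("value") === static("id"))
>     .writeStream
>     .format("console")
>     .start()
> }
>
> var notShutdown = true  // flip this from a shutdown hook in a real app
> while (notShutdown) {
>   val query = startQuery()
>   query.awaitTermination(refreshIntervalMs)  // returns once the timeout elapses
>   query.stop()
> }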
>
> 2. Yes, stream-stream joins will be supported in 2.3.0, soon to be
> released. RC3 is available if you want to test it right now:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>
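> As a rough sketch of what 2.3 enables (column names are illustrative, and
> spark is the active SparkSession), two streaming DataFrames can then be
> joined directly:
>
> val impressions = spark.readStream.format("rate").load()
>   .selectExpr("value AS adId", "timestamp AS impressionTime")
> val clicks = spark.readStream.format("rate").load()
>   .selectExpr("value AS adId", "timestamp AS clickTime")
>
> // Inner equi-join of two streams; Spark buffers state until matches arrive
> val joined = impressions.join(clicks, "adId")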
>
>
> On Wed, Feb 14, 2018 at 3:34 AM, Appu K <kut...@gmail.com> wrote:
>
>> TD,
>>
>> Thanks a lot for the quick reply :)
>>
>>
>> Did I understand correctly that, in the main thread, I won't be able to
>> use outStream.awaitTermination() to wait for the termination of the
>> context, since I'll be stopping the query from inside another thread?
>>
>> What would be a good approach to keep the main app running long-term if
>> I have to restart queries?
>>
>> Should I just wait for 2.3, where I'll be able to join two structured
>> streams, if the release is only a few weeks away?
>>
>> Appreciate all the help!
>>
>> thanks
>> Appu
>>
>>
>>
>> On 14 February 2018 at 4:41:52 PM, Tathagata Das (tathagata.das1...@gmail.com) wrote:
>>
>> Let me fix my mistake :)
>> What I suggested in that earlier thread does not work. A streaming query
>> that joins a streaming dataset with a batch view does not correctly pick
>> up changes when the view is updated. It works only when you restart the query.
>> That is,
>> - stop the query
>> - recreate the dataframes,
>> - start the query on the new dataframe using the same checkpoint location
>> as the previous query
>>
>> Note that you don't need to restart the whole process/cluster/application;
>> just restart the query within the same process/cluster/application, as
>> sketched below. This should be very fast (within a few seconds). So, unless
>> you have latency SLAs of about 1 second, you can periodically restart the
>> query without restarting the process.
>>
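>> A minimal sketch of one such restart, assuming a Kafka source (which needs
>> the spark-sql-kafka-0-10 package) and a static Parquet lookup table; all
>> paths, topics, and column names are illustrative:
>>
>> import org.apache.spark.sql.SparkSession
>>
>> val spark = SparkSession.builder.getOrCreate()
>>
>> // Recreate both sides, then start with the SAME checkpoint location so the
>> // new query resumes from the offsets the previous one committed.
>> val static = spark.read.parquet("/data/lookup")  // fresh static snapshot
>> val stream = spark.readStream
>>   .format("kafka")
>>   .option("kafka.bootstrap.servers", "broker:9092")
>>   .option("subscribe", "events")
>>   .load()
>>   .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")
>> val query = stream.join(static, "id")
>>   .writeStream
>>   .format("parquet")
>>   .option("checkpointLocation", "/ckpt/joined-query")  // unchanged across restarts
>>   .start("/out/joined")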
>> Apologies for my misdirections in that earlier thread. Hope this helps.
>>
>> TD
>>
>> On Wed, Feb 14, 2018 at 2:57 AM, Appu K <kut...@gmail.com> wrote:
>>
>>> More specifically,
>>>
>>> Quoting TD from the previous thread
>>> "Any streaming query that joins a streaming dataframe with the view will
>>> automatically start using the most updated data as soon as the view is
>>> updated"
>>>
>>> Wondering if I’m doing something wrong in
>>> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>>>
>>> My streaming dataframe is not using the updated data, even though the
>>> view is updated!
>>>
>>> Thank you
>>>
>>>
>>> On 14 February 2018 at 2:54:48 PM, Appu K (kut...@gmail.com) wrote:
>>>
>>> Hi,
>>>
>>> I had followed the instructions from the thread
>>> https://mail-archives.apache.org/mod_mbox/spark-user/201704.mbox/%3CD1315D33-41CD-4ba3-8b77-0879f3669...@qvantel.com%3E
>>> while trying to periodically reload a static data frame that gets joined
>>> to a structured streaming query.
>>>
>>> However, the streaming query results do not reflect the data from the
>>> refreshed static data frame.
>>>
>>> The code is here:
>>> https://gist.github.com/anonymous/90dac8efadca3a69571e619943ddb2f6
>>>
>>> I’m using Spark 2.2.1. Any pointers would be much appreciated.
>>>
>>> Thanks a lot
>>>
>>> Appu
>>>
>>>
>>
>
