We ran into this same issue with the reduceByKeyAndWindow API that you are
using. To fix it, you have to use a different flavor of that API,
specifically the two versions that let you pass a 'filter function'. With
the inverse-reduce flavor, keys whose counts have dropped to zero otherwise
stay in the window state forever, so the state grows without bound; the
filter function lets you drop them. Putting in the filter function helped
stabilize our application too.
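
A rough sketch of that flavor in PySpark (the socket source, checkpoint
path, and names below are placeholders standing in for your Kafka setup,
not your actual code):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="LinkCounter")
ssc = StreamingContext(sc, 30)  # 30-second batch interval, as in your job
ssc.checkpoint("/tmp/link-counter-ckpt")  # inverse-reduce flavor requires checkpointing

# Placeholder source standing in for your Kafka stream of shared links.
links = ssc.socketTextStream("localhost", 9999)
pairs = links.map(lambda link: (link, 1))

counts = pairs.reduceByKeyAndWindow(
    lambda a, b: a + b,               # add counts entering the window
    lambda a, b: a - b,               # subtract counts leaving the window
    windowDuration=30 * 60,           # 30-minute window
    slideDuration=60,                 # 60-second slide
    filterFunc=lambda kv: kv[1] > 0,  # drop keys whose count fell to zero
)
counts.pprint()

ssc.start()
ssc.awaitTermination()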

HTH
NB


On Sun, Jun 12, 2016 at 11:19 PM, Roshan Singh <singh.rosha...@gmail.com>
wrote:

> Hi all,
> I have a Python streaming job which is supposed to run 24x7. I am unable
> to stabilize it. The job just counts the number of links shared in a
> 30-minute sliding window. I am using the reduceByKeyAndWindow operation
> with a batch interval of 30 seconds and a slide interval of 60 seconds.
>
> The Kafka queue has a rate of nearly 2200 messages/second, which can
> increase to 3000, but the mean is 2200.
>
> I have played around with the batch size and slide interval, and with
> increased parallelism, with no fruitful result. These just delay the
> destabilization.
>
> GC time is usually between 60 and 100 ms.
>
> I also noticed in the Spark UI that the jobs were not being distributed
> to the other nodes, so I configured spark.locality.wait to 100ms. After
> that, the jobs were distributed properly.
>
> I have a cluster of 6 slaves and one master, each with 16 cores and 15 GB
> of RAM.
>
> Code and configuration: http://pastebin.com/93DMSiji
>
> Streaming screenshot: http://imgur.com/psNfjwJ
>
> I need help debugging this issue. Any help will be appreciated.
>
> --
> Roshan Singh
>
>
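
For reference, the spark.locality.wait tweak mentioned in the quoted mail
can be set on the conf like this (a minimal sketch; the app name is a
placeholder):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("LinkCounter")             # placeholder app name
        .set("spark.locality.wait", "100ms"))  # lowered from the 3s default
sc = SparkContext(conf=conf)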
