Hello all, 

I am having some problems with my custom Java-based receiver. I am running
Spark 1.5.0 and I used the template on the Spark website
(http://spark.apache.org/docs/1.0.0/streaming-custom-receivers.html). Basically
my receiver listens to a JMS queue (Solace) and, based on either the size or
the number of the received messages, stores these messages by calling store.
It doesn't ack the messages until after store has been called.
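
For reference, the receiver looks roughly like this. I have stripped out the
Solace connection setup and simplified the names, so treat it as a sketch of
the shape rather than the exact code:

import java.util.ArrayList;
import java.util.List;

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.TextMessage;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class JmsReceiver extends Receiver<String> {

    private final int batchSize; // flush threshold (message count)

    public JmsReceiver(int batchSize) {
        super(StorageLevel.MEMORY_AND_DISK_SER_2());
        this.batchSize = batchSize;
    }

    @Override
    public void onStart() {
        new Thread(new Runnable() {
            public void run() { receive(); }
        }).start();
    }

    @Override
    public void onStop() {
        // the receiving thread checks isStopped() and closes the connection itself
    }

    private void receive() {
        try {
            MessageConsumer consumer = connect(); // Solace session/queue setup omitted
            List<Message> pending = new ArrayList<Message>();
            while (!isStopped()) {
                Message msg = consumer.receive(1000);
                if (msg != null) {
                    pending.add(msg);
                }
                if (pending.size() >= batchSize) {
                    List<String> bodies = new ArrayList<String>();
                    for (Message m : pending) {
                        bodies.add(((TextMessage) m).getText());
                    }
                    store(bodies.iterator()); // hand the whole batch to Spark first
                    for (Message m : pending) {
                        m.acknowledge();      // only ack once store() has returned
                    }
                    pending.clear();
                }
            }
        } catch (Exception e) {
            restart("Error receiving from JMS", e);
        }
    }

    private MessageConsumer connect() throws JMSException {
        // connection/session/queue creation against Solace goes here
        throw new UnsupportedOperationException("omitted");
    }
}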

The problems I am having are: 

1. I am not able to get the internal backpressure system in Spark Streaming
to control my receiver so that it doesn't overload my executors. Is there
something extra I need to implement in order to make the driver pause the
receiver so my system stays stable? I tried doing it myself by using a
JobListener and stopping the ReceiverTracker once the number of queued
batches reached a set limit. This works, but every time I restarted the
ReceiverTracker (by calling start), the processing time for each batch kept
increasing in the Spark UI. I think stopping it might be affecting the
metrics. Is there any other way to do this? For reference, I have pasted the
backpressure settings I am using in a snippet below.

2. The other problem I have is with the WAL. If I ask Spark to unpersist my
RDDs (spark.streaming.unpersist = true), I get a lot of WAL exceptions saying
"Could not read from WriteAheadLog". This is a problem because, even though I
am calling persist on my RDD after processing it, YARN sometimes kills my job
because the containers have gone above the allocated memory limit. Does
anyone have any idea how to get around this? If I set unpersist to false,
this problem goes away. The WAL-related settings I am toggling are also
pasted below.

3. Finally, in order to avoid some of the issues I detailed above, I tried
running a version of my application without streaming. I connect to the queue
in the driver, then create an RDD from the received messages and process them
on the executors. I do this in an infinite loop until a stop file is placed
on HDFS, at which point my Spark application exits. To my surprise this works
fine, but I am not sure whether it is a more stable and correct solution than
Spark Streaming. A sketch of this loop is pasted below as well.
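
For (1), this is roughly how I have been trying to enable the built-in
backpressure, with a static maxRate as a safety net (the app name, rate and
batch interval are just placeholder values):

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("jms-streaming")                         // placeholder name
            .set("spark.streaming.backpressure.enabled", "true") // internal backpressure (1.5.0+)
            .set("spark.streaming.receiver.maxRate", "1000");    // static per-receiver cap, placeholder value

        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(10)); // placeholder batch interval

        jssc.receiverStream(new JmsReceiver(500)); // the receiver from the earlier snippet
        // ... transformations/output operations, then jssc.start() and awaitTermination()
    }
}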
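
For (2), these are the extra settings I am toggling on the same SparkConf as
in the snippet for (1), set before the streaming context is created (the
checkpoint path is just an example):

// set on the SparkConf from the snippet for (1), before the streaming
// context is created
conf.set("spark.streaming.receiver.writeAheadLog.enable", "true"); // WAL for received blocks
conf.set("spark.streaming.unpersist", "true"); // true gives me the "Could not read from
                                               // WriteAheadLog" exceptions; false makes
                                               // them go away

// the WAL needs a checkpoint directory on HDFS (path is just an example)
jssc.checkpoint("hdfs:///tmp/jms-streaming-checkpoint");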
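
And for (3), the non-streaming version is essentially this loop (the queue
draining and the stop-file path are simplified placeholders):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class BatchLoopJob {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("jms-batch-loop")); // placeholder name
        FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
        Path stopFile = new Path("/tmp/stop-jms-job"); // example path only

        while (!fs.exists(stopFile)) {
            List<String> messages = drainQueue(); // JMS receive-and-ack on the driver
            if (!messages.isEmpty()) {
                JavaRDD<String> rdd = sc.parallelize(messages);
                rdd.count(); // stand-in for the real processing on the executors
            }
        }
        sc.stop();
    }

    private static List<String> drainQueue() {
        // Solace connect/receive/ack logic omitted
        return new ArrayList<String>();
    }
}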

I would be really grateful if anyone has come across similar problems and can 
shed some light on a solution. 

Thanks in advance.

---- 
Charles Bajomo 
Operations Director 
www.cloudxtiny.co.uk | Precision Technology Consulting Ltd 
Registered England & Wales : 07397178 
VAT No. : 124 4354 38 GB 
