We hit a similar issue while working on one of our use cases, where we were processing at a moderate throughput (around 500 MB/s). When the processing time exceeded the batch duration, it started throwing BlockNotFound exceptions. I made a workaround for that issue, explained here: http://apache-spark-developers-list.1001551.n3.nabble.com/SparkStreaming-Workaround-for-BlockNotFound-Exceptions-td12096.html
Basically, instead of generating blocks blindly, I made the receiver sleep whenever the scheduling delay grows too large (specifically, when it exceeds 3x the batch duration). This prototype is working nicely and the speed is encouraging: it is processing at 500 MB/s without any failures so far. A rough sketch of the idea is below the quoted mail.

Thanks
Best Regards

On Fri, May 8, 2015 at 8:11 PM, François Garillot <francois.garil...@typesafe.com> wrote:

> Hi guys,
>
> We[1] are doing a bit of work on Spark Streaming, to help it face
> situations where the throughput of data on an InputStream may
> (momentarily) overwhelm the Receivers' memory.
>
> The JIRA & design doc is here:
> https://issues.apache.org/jira/browse/SPARK-7398
>
> We'd sure appreciate your comments!
>
> --
> François Garillot
> [1]: Typesafe & some helpful collaborators on benchmarking 'at scale'
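
For the curious, here's a minimal sketch of the throttling idea (this is not the exact code from the thread linked above; ThrottledReceiver, DelayTracker and fetchNextRecord are placeholder names I made up for illustration). A StreamingListener records the latest scheduling delay after each completed batch, and the receiver backs off while that delay exceeds 3x the batch duration:

    import java.util.concurrent.atomic.AtomicLong

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver
    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Shared view of the most recent scheduling delay, in milliseconds.
    object DelayTracker {
      val schedulingDelayMs = new AtomicLong(0L)
    }

    // Updates the tracker after every completed batch.
    class DelayListener extends StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
        batch.batchInfo.schedulingDelay.foreach(DelayTracker.schedulingDelayMs.set)
    }

    // A receiver that stops generating blocks while the scheduler is falling behind.
    class ThrottledReceiver(batchDurationMs: Long)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {

      override def onStart(): Unit =
        new Thread("throttled-receiver") {
          override def run(): Unit = receive()
        }.start()

      override def onStop(): Unit = {}

      private def receive(): Unit =
        while (!isStopped()) {
          // Back off while the scheduling delay exceeds 3x the batch duration.
          while (DelayTracker.schedulingDelayMs.get() > 3 * batchDurationMs)
            Thread.sleep(batchDurationMs)
          store(fetchNextRecord()) // generate a block only when the scheduler has caught up
        }

      private def fetchNextRecord(): String = ??? // plug in the actual source read here
    }

You'd wire it up on the StreamingContext with ssc.addStreamingListener(new DelayListener) and ssc.receiverStream(new ThrottledReceiver(batchDuration.milliseconds)). The actual workaround in the thread above differs in details, but this is the shape of it.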