Thanks Cody for the quick help. Yes, the exception is happening in the
executors during processing. I will look into cloning the KafkaRDD and
swallowing the exception.
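To make sure I am on the same page, this is roughly the kind of change I am
thinking of making in a copied KafkaRDD: keep the existing fetch-error
handling, but treat OffsetOutOfRange as "this partition is done" instead of
rethrowing. This is only a sketch of the error-branching logic; the
standalone object and method names below are made up for illustration, and
only the Kafka 0.8 client classes are real:

import kafka.api.FetchResponse
import kafka.common.ErrorMapping

// Sketch only: the real KafkaRDD keeps this logic inside a private
// handleFetchErr method with access to the partition and consumer config;
// the standalone shape here is just to show the branching.
object FetchErrSketch {
  // Returns true when the error should be swallowed (partition treated as
  // finished), false when there is no error, and rethrows everything else.
  def swallowOffsetOutOfRange(resp: FetchResponse, topic: String, partition: Int): Boolean = {
    if (!resp.hasError) {
      false
    } else {
      val err = resp.errorCode(topic, partition)
      if (err == ErrorMapping.OffsetOutOfRangeCode) {
        // Accept the data loss for this partition and let the iterator finish.
        true
      } else {
        // Keep the original behaviour for all other fetch errors.
        throw ErrorMapping.exceptionFor(err)
      }
    }
  }
}

The idea is that the copied KafkaRDD's iterator would call something like
this after a fetch and simply stop returning records for that partition when
it returns true.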

But something weird is happening: when I enable checkpointing on the job,
the job doesn't crash; it happily proceeds with the next batch, even though
I see tons of exceptions in the executor logs. So the question is: why
doesn't the Spark job crash when checkpointing is enabled?

I have my code pasted here:
https://gist.github.com/ramkumarvenkat/00f4fc63f750c537defd
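
In case the gist is hard to follow, the overall shape of the job is roughly
this (a simplified sketch rather than the exact code in the gist; the broker
list, topic, batch interval, checkpoint path and names are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object OffsetOutOfRangeRepro {
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-direct-stream")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Enabling this checkpoint directory is what changes the crash behaviour.
    ssc.checkpoint("hdfs:///tmp/checkpoints")

    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092",
      "auto.offset.reset" -> "smallest")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.foreachRDD { rdd =>
      // Processing happens here; the OffsetOutOfRangeException is thrown on
      // the executors while the partition iterators fetch from Kafka.
      rdd.foreach(_ => ())
    }
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = StreamingContext.getOrCreate("hdfs:///tmp/checkpoints", createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}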

I am not sure whether this is an issue with the Spark engine or with the
streaming module. Please let me know if you need more logs or if you want
me to raise a GitHub issue/JIRA.

Sorry for digressing from the original thread.

On Fri, Mar 18, 2016 at 8:10 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Is that happening only at startup, or during processing?  If that's
> happening during normal operation of the stream, you don't have enough
> resources to process the stream in time.
>
> There's not a clean way to deal with that situation, because it's a
> violation of preconditions.  If you want to modify the code to do what
> makes sense for you, start looking at handleFetchErr in KafkaRDD.scala.
> Recompiling that package isn't a big deal, because it's not a part
> of the core Spark deployment, so you'll only have to change your job,
> not the deployed version of Spark.
>
>
>
> On Fri, Mar 18, 2016 at 6:16 AM, Ramkumar Venkataraman
> <ram.the.m...@gmail.com> wrote:
> > I am using Spark streaming and reading data from Kafka using
> > KafkaUtils.createDirectStream. I have the "auto.offset.reset" set to
> > smallest.
> >
> > But in some Kafka partitions, I get
> > kafka.common.OffsetOutOfRangeException and my Spark job crashes.
> >
> > I want to understand if there is a graceful way to handle this failure
> > and not kill the job. I want to keep ignoring these exceptions, as some
> > other partitions are fine and I am okay with data loss.
> >
> > Is there any way to handle this and not have my Spark job crash? I have
> > no option of increasing the Kafka retention period.
> >
> > I tried to have the DStream returned by createDirectStream() wrapped in a
> > Try construct, but since the exception happens in the executor, the Try
> > construct didn't take effect. Do you have any ideas of how to handle
> > this?
> >
