At a high level, I am looking to do sessionization. I want to combine events
based on some key, do some transformations, and emit data to HDFS. The catch
is that there are time boundaries: say, I group events in a window of 0.5 hours,
based on a timestamp key in the event. Typical event-time windowing.
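As a hedged sketch of the bucketing step described above (the names `Event`, `windowStart`, and `sessionize` are illustrative, not an existing Spark API): each event can be assigned to a fixed 0.5-hour event-time window by flooring its timestamp, then grouped by (key, window).

```scala
// Illustrative sketch: assign each event to a fixed 30-minute
// event-time window by flooring its timestamp, then group by
// (key, windowStart). Event/sessionize are made-up names.
case class Event(key: String, ts: Long, payload: String)

val windowMs: Long = 30 * 60 * 1000L

// Start of the 0.5-hour window containing timestamp ts.
def windowStart(ts: Long): Long = ts - (ts % windowMs)

// Group events by (key, window) -- the unit a sessionization job
// would then transform and write out to HDFS.
def sessionize(events: Seq[Event]): Map[(String, Long), Seq[Event]] =
  events.groupBy(e => (e.key, windowStart(e.ts)))
```

In a streaming job the same keying function would feed `mapWithState` (or `updateStateByKey`), with the window start as part of the state key.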
Nope, but when we migrated to Spark 1.6, we didn't see the errors anymore. Not
sure if it was fixed between releases, or if it is just a timing quirk
that we haven't hit in 1.6 yet.
On Sat, Mar 4, 2017 at 12:00 AM, nimmi.cv [via Apache Spark User List] <...org> wrote:

Spark: 1.6.1

I am trying to use the new mapWithState API and I am getting the following
error:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/StateSpec$
Caused by: java.lang.ClassNotFoundException:
org.apache.spark.streaming.StateSpec$

Build.sbt
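A `NoClassDefFoundError` for `org.apache.spark.streaming.StateSpec$` at runtime usually means the spark-streaming artifact is missing from the classpath, or the build's Spark version doesn't match the cluster's. A minimal build.sbt sketch (versions and scopes are assumptions, not the poster's actual file):

```scala
// build.sbt -- hedged sketch, not the poster's actual file.
// StateSpec lives in spark-streaming; mark the Spark artifacts
// "provided" only if spark-submit supplies them at runtime, and
// make sure the version matches the cluster (1.6.1 here).
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided"
)
```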
> Spark streaming in general will retry a batch N times then move on to
> the next one... off the top of my head, I'm not sure why checkpointing
> would have an effect on that.
>
> On Mon, Mar 21, 2016 at 3:25 AM, Ramkumar Venkataraman
> <ram.the.m...@gmail.
> makes sense for you, start looking at handleFetchErr in KafkaRDD.scala
> Recompiling that package isn't a big deal, because it's not part
> of the core Spark deployment, so you'll only have to change your job,
> not the deployed version of Spark.
>
> On Fri
I am using Spark Streaming and reading data from Kafka using
KafkaUtils.createDirectStream. I have "auto.offset.reset" set to
"smallest".
But for some Kafka partitions, I get kafka.common.OffsetOutOfRangeException
and my Spark job crashes.
I want to understand if there is a graceful way to handle this without
crashing the job.
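One hedged sketch of a graceful approach (illustrative only; `clampOffsets` is not a Spark or Kafka API): `OffsetOutOfRangeException` typically means the requested offset was already deleted by Kafka's retention, so the job can clamp any checkpointed or requested offsets into the broker's currently valid range before building the direct stream.

```scala
// Illustrative helper: clamp each partition's requested offset into
// [earliest, latest] as reported by the broker, so the direct stream
// never asks for data that retention has already deleted.
// Fetching earliest/latest (e.g. via the Kafka offset APIs) is
// assumed to happen elsewhere.
def clampOffsets(
    requested: Map[Int, Long], // partition -> offset we want to read from
    earliest: Map[Int, Long],  // partition -> earliest offset on the broker
    latest: Map[Int, Long]     // partition -> latest offset on the broker
): Map[Int, Long] =
  requested.map { case (p, off) =>
    p -> math.min(math.max(off, earliest(p)), latest(p))
  }
```

The clamped map could then be passed as the `fromOffsets` argument of `KafkaUtils.createDirectStream`; the trade-off is that clamping silently skips the data that was lost to retention.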