[
https://issues.apache.org/jira/browse/SPARK-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rado Buransky updated SPARK-12693:
----------------------------------
Description:
I am running a Kafka server locally with an extremely low retention of 3 seconds
and 1-second segmentation. I create a direct Kafka stream with auto.offset.reset
= smallest.
With bad luck (which actually happens quite often in my case), the smallest
offset retrieved during stream initialization no longer exists by the time
streaming actually starts.
Complete source code of the Spark Streaming application is here:
https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala
The application ends up in an endless loop trying to fetch that nonexistent
offset and has to be killed. See the attached logs from Spark and from the Kafka
server.
was:
I am running a Kafka server locally with an extremely low retention of 3 seconds
and 1-second segmentation. I create a direct Kafka stream with auto.offset.reset
= smallest.
With bad luck (which actually happens quite often in my case), the smallest
offset retrieved during stream initialization no longer exists by the time
streaming actually starts.
Complete source code of the Spark Streaming application is here:
https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala
> OffsetOutOfRangeException caused by retention
> ---------------------------------------------
>
> Key: SPARK-12693
> URL: https://issues.apache.org/jira/browse/SPARK-12693
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.0
> Environment: Ubuntu 64bit, Intel i7
> Reporter: Rado Buransky
> Priority: Minor
> Labels: kafka
> Attachments: kafka-log.txt, log.txt
>
>
> I am running a Kafka server locally with an extremely low retention of 3
> seconds and 1-second segmentation. I create a direct Kafka stream with
> auto.offset.reset = smallest.
> With bad luck (which actually happens quite often in my case), the smallest
> offset retrieved during stream initialization no longer exists by the time
> streaming actually starts.
> Complete source code of the Spark Streaming application is here:
> https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala
> The application ends up in an endless loop trying to fetch that nonexistent
> offset and has to be killed. See the attached logs from Spark and from the
> Kafka server.
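
The race described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyLog, OffsetOutOfRange, fetch_with_clamp and run_scenario are hypothetical names invented here, not Kafka or Spark APIs, and the one-record-per-second timing is made up for the illustration. It shows how an offset read at initialization can fall out of a 3-second retention window before the first fetch, and one possible guard that restarts from the current earliest offset instead of retrying the stale one.

```python
from dataclasses import dataclass, field


class OffsetOutOfRange(Exception):
    """Raised when a fetch asks for an offset the toy log does not hold."""


@dataclass
class ToyLog:
    """Hypothetical in-memory stand-in for one partition with time-based retention."""
    retention: float                              # seconds a record is kept
    records: list = field(default_factory=list)   # (offset, timestamp) pairs
    next_offset: int = 0

    def append(self, now: float) -> None:
        self.records.append((self.next_offset, now))
        self.next_offset += 1

    def expire(self, now: float) -> None:
        # Drop every record that has aged out of the retention window.
        self.records = [(o, t) for o, t in self.records if now - t < self.retention]

    def earliest(self) -> int:
        return self.records[0][0] if self.records else self.next_offset

    def fetch(self, offset: int) -> int:
        if offset < self.earliest() or offset >= self.next_offset:
            raise OffsetOutOfRange(offset)
        return offset


def fetch_with_clamp(log: ToyLog, offset: int) -> int:
    """Hypothetical guard: on a stale offset, restart from the current earliest.

    Assumes the log is non-empty when the fallback fetch runs.
    """
    try:
        return log.fetch(offset)
    except OffsetOutOfRange:
        return log.fetch(log.earliest())


def run_scenario():
    log = ToyLog(retention=3.0)
    for t in range(10):              # one record per second, t = 0..9
        log.append(float(t))
    log.expire(now=9.0)
    start = log.earliest()           # "smallest" offset seen at initialization

    for t in range(10, 14):          # streaming actually starts a few seconds later
        log.append(float(t))
    log.expire(now=13.0)

    try:
        log.fetch(start)
        stale_raises = False
    except OffsetOutOfRange:
        stale_raises = True          # the offset from initialization is gone

    return start, log.earliest(), stale_raises, fetch_with_clamp(log, start)
```

Calling run_scenario() returns the offset seen at initialization, the earliest offset at fetch time, whether the stale fetch failed, and the offset the guard recovered to. Clamping silently skips the expired records, so whether it is an acceptable recovery depends on the application; failing fast with a clear error is the other obvious alternative to looping forever.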