[ https://issues.apache.org/jira/browse/SPARK-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rado Buransky updated SPARK-12693:
----------------------------------
    Description: 
I am running a Kafka server locally with an extremely low retention of 3 seconds 
and 1-second log segments. I create a direct Kafka stream with auto.offset.reset 
= smallest. 

With bad luck (which actually happens quite often in my case), the smallest 
offset retrieved during stream initialization no longer exists by the time 
streaming actually starts.
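The race can be illustrated with a toy model. This sketch is hypothetical: `ToyPartition`, `OffsetOutOfRangeError` and `run_race` are made-up names, retention is applied per message rather than per segment, and the timings (one message every 100 ms, 500 ms between resolving "smallest" and the first fetch) are assumptions chosen only to show the effect. It is not Kafka's or Spark's code.

```python
class OffsetOutOfRangeError(Exception):
    """Requested offset lies outside [earliest, next_offset] of the toy log."""


class ToyPartition:
    """Hypothetical in-memory model of one partition with time-based retention.

    Simplification (assumption): retention is applied per message instead of
    per segment, which is enough to show the race.
    """

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.log = []         # list of (offset, timestamp_ms), oldest first
        self.next_offset = 0  # offset the next appended message will receive

    def append(self, ts_ms):
        self.log.append((self.next_offset, ts_ms))
        self.next_offset += 1

    def apply_retention(self, now_ms):
        # Drop every message older than the retention window.
        self.log = [(o, ts) for (o, ts) in self.log
                    if now_ms - ts <= self.retention_ms]

    def earliest(self):
        # With an empty log the earliest valid offset is the next one to be written.
        return self.log[0][0] if self.log else self.next_offset

    def fetch(self, offset):
        if not self.earliest() <= offset <= self.next_offset:
            raise OffsetOutOfRangeError(offset)
        return [o for (o, _) in self.log if o >= offset]


def run_race():
    p = ToyPartition(retention_ms=3000)
    for ts in range(0, 5000, 100):     # a producer writes every 100 ms
        p.append(ts)
    p.apply_retention(now_ms=5000)
    start = p.earliest()               # "smallest" resolved at stream init
    for ts in range(5000, 5500, 100):  # time passes before the first batch runs
        p.append(ts)
    p.apply_retention(now_ms=5500)     # retention deletes the resolved offset
    try:
        p.fetch(start)
        return start, p.earliest(), None
    except OffsetOutOfRangeError as err:
        return start, p.earliest(), err
```

Here `run_race()` resolves offset 20 as "smallest", but by the first fetch the earliest retained offset has moved to 25, so fetching 20 raises `OffsetOutOfRangeError`.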

Complete source code of the Spark Streaming application is here:
https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala

The application ends up in an endless loop trying to fetch that non-existent 
offset and has to be killed. See the attached logs from both Spark and the Kafka server.
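Why the application never recovers can be seen from the control flow alone: retrying the same deleted offset can never succeed, whereas clamping the offset into the retained range makes progress at the cost of skipping the deleted messages. The sketch below models only that control flow. `fetch`, `naive_consume` and `clamped_consume` are hypothetical; the naive loop is bounded by `max_attempts` only so the demo terminates, and neither function describes how Spark 1.6 actually handles the exception.

```python
class OffsetOutOfRangeError(Exception):
    """Requested offset is outside the retained range of the toy log."""


def fetch(log_start, log_end, offset):
    """Hypothetical broker fetch: valid offsets satisfy log_start <= offset <= log_end."""
    if not log_start <= offset <= log_end:
        raise OffsetOutOfRangeError(offset)
    return list(range(offset, log_end))


def naive_consume(log_start, log_end, offset, max_attempts=5):
    """Retries the same offset, mirroring the reported symptom (bounded for the demo)."""
    for _ in range(max_attempts):
        try:
            return fetch(log_start, log_end, offset)
        except OffsetOutOfRangeError:
            continue  # same offset again: this can never succeed
    return None


def clamped_consume(log_start, log_end, offset):
    """On out-of-range, clamp the offset into [log_start, log_end] and fetch again."""
    try:
        return fetch(log_start, log_end, offset)
    except OffsetOutOfRangeError:
        # Messages below log_start are already deleted; clamping accepts that loss.
        return fetch(log_start, log_end, min(max(offset, log_start), log_end))
```

For example, with a retained range of 25 to 30 and a stale offset of 20, `naive_consume(25, 30, 20)` gives up with `None`, while `clamped_consume(25, 30, 20)` returns `[25, 26, 27, 28, 29]`. Whether silently skipping deleted data is acceptable is an application-level decision.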

  was:
I am running a Kafka server locally with an extremely low retention of 3 seconds 
and 1-second log segments. I create a direct Kafka stream with auto.offset.reset 
= smallest. 

With bad luck (which actually happens quite often in my case), the smallest 
offset retrieved during stream initialization no longer exists by the time 
streaming actually starts.

Complete source code of the Spark Streaming application is here:
https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala


> OffsetOutOfRangeException caused by retention
> ---------------------------------------------
>
>                 Key: SPARK-12693
>                 URL: https://issues.apache.org/jira/browse/SPARK-12693
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 64bit, Intel i7
>            Reporter: Rado Buransky
>            Priority: Minor
>              Labels: kafka
>         Attachments: kafka-log.txt, log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
