[ https://issues.apache.org/jira/browse/SPARK-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293437#comment-15293437 ]
Cody Koeninger commented on SPARK-15408:
----------------------------------------
By design, the Kafka direct stream is not going to try to hide Kafka failures
from you.
The number of times the driver will retry is controlled by
spark.streaming.kafka.maxRetries
(http://spark.apache.org/docs/latest/configuration.html#spark-streaming).
The number of failures tolerated for any individual task is controlled the same
way as in any other Spark job, via spark.task.maxFailures.
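For reference, a minimal sketch of how both settings could be raised when
building the SparkConf; the application name and the specific values are
illustrative assumptions, not recommendations:

    import org.apache.spark.SparkConf

    // Illustrative values; the defaults are 1 and 4 respectively.
    val conf = new SparkConf()
      .setAppName("kafka-direct-example") // hypothetical app name
      // Driver-side retries when the direct stream looks up leaders/offsets.
      .set("spark.streaming.kafka.maxRetries", "3")
      // Standard per-task failure tolerance, same as for any other Spark job.
      .set("spark.task.maxFailures", "8")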
> Spark streaming app crashes with NotLeaderForPartitionException
> ----------------------------------------------------------------
>
> Key: SPARK-15408
> URL: https://issues.apache.org/jira/browse/SPARK-15408
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.0
> Environment: Ubuntu 64 bit
> Reporter: Johny Mathew
> Priority: Critical
>
> We have a Spark Streaming application reading from Kafka (with the Kafka Direct
> API), and it crashed with the exception shown below. We have a 5-node Kafka
> cluster with 19 partitions (replication factor 3). Even though the Spark
> application crashed, the other Kafka consumer apps kept running fine. Only one
> of the 5 Kafka nodes was misbehaving (it did not go down).
> /opt/hadoop/bin/yarn application -status application_1463151451543_0007
> 16/05/13 20:09:56 INFO client.RMProxy: Connecting to ResourceManager at
> /172.16.130.189:8050
> Application Report :
> Application-Id : application_1463151451543_0007
> Application-Name : com.ibm.alchemy.eventgen.EventGenMetrics
> Application-Type : SPARK
> User : stack
> Queue : default
> Start-Time : 1463155034571
> Finish-Time : 1463155310520
> Progress : 100%
> State : FINISHED
> Final-State : FAILED
> Tracking-URL : N/A
> RPC Port : 0
> AM Host : 172.16.130.188
> Aggregate Resource Allocation : 9562329 MB-seconds, 2393 vcore-seconds
> Diagnostics : User class threw exception:
> org.apache.spark.SparkException:
> ArrayBuffer(kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException,
> kafka.common.NotLeaderForPartitionException, org.apache.spark.SparkException:
> Couldn't find leader offsets for Set([alchemy-metrics,17],
> [alchemy-metrics,10], [alchemy-metrics,3], [alchemy-metrics,4],
> [alchemy-metrics,9], [alchemy-metrics,15], [alchemy-metrics,18],
> [alchemy-metrics,5]))
> We cleared the checkpoint and restarted the application, but it crashed again.
> In the end we identified the misbehaving Kafka node and restarted it, which
> fixed the problem.
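For context, a minimal sketch of the kind of direct stream setup described in
the report above, against the Spark 1.6 spark-streaming-kafka API; the broker
list, batch interval, and application name are assumptions, only the topic name
comes from the error message:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("alchemy-metrics-consumer") // hypothetical name
    val ssc = new StreamingContext(conf, Seconds(30)) // batch interval is an assumption

    // Placeholder broker addresses; the direct stream talks to the brokers directly.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // Leader/offset lookups happen when the stream is created and on each batch;
    // failures such as NotLeaderForPartitionException surface here rather than
    // being hidden, as noted in the comment above.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("alchemy-metrics"))

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()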