[
https://issues.apache.org/jira/browse/SPARK-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654309#comment-15654309
]
Brian ONeill edited comment on SPARK-18400 at 11/10/16 3:20 PM:
----------------------------------------------------------------
Sure thing. I have the patch: null check with log.info when we see a null
shard.
I'm just double checking things...
It appears that workers might be caught in a loop on ZOMBIE shards. I see the
following lines repeated. I'm trying to track down the shard allocation logic
that might pull it out of the loop.
{code}
2016-11-10 10:05:37 INFO KinesisRecordProcessor:58 - Shutdown: Shutting down
workerId localhost:4bac2666-b7c9-4ecd-8c1a-15c790588443 with reason ZOMBIE
2016-11-10 10:05:38 INFO KinesisRecordProcessor:58 - Initialized workerId
localhost:4bac2666-b7c9-4ecd-8c1a-15c790588443 with shardId shardId-000000000257
{code}
was (Author: boneill42):
Sure thing. I have the patch: null check with log.info when we see a null
shard.
I'm just double checking things...
It appears that workers might be caught in a loop on ZOMBIE shards. I see the
following lines repeated. I'm trying to track down the shard allocation logic
that might pull it out of the loop.
2016-11-10 10:05:37 INFO KinesisRecordProcessor:58 - Shutdown: Shutting down
workerId localhost:4bac2666-b7c9-4ecd-8c1a-15c790588443 with reason ZOMBIE
2016-11-10 10:05:38 INFO KinesisRecordProcessor:58 - Initialized workerId
localhost:4bac2666-b7c9-4ecd-8c1a-15c790588443 with shardId shardId-000000000257
> NPE when resharding Kinesis Stream
> ----------------------------------
>
> Key: SPARK-18400
> URL: https://issues.apache.org/jira/browse/SPARK-18400
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.6.2
> Environment: Spark 1.6 streaming from AWS Kinesis
> Reporter: Brian ONeill
> Priority: Minor
>
> Occasionally, we see an NPE when we reshard our streams:
> {code}
> java.lang.NullPointerException
> at
> java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
> ~[?:1.8.0_60]
> at
> java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
> ~[?:1.8.0_60]
> at
> org.apache.spark.streaming.kinesis.KinesisCheckpointer.removeCheckpointer(KinesisCheckpointer.scala:66)
> ~[spark-streaming-kinesis-asl_2.11-1.6.4-SNAPSHOT.jar:1.6.4-SNAPSHOT]
> at
> org.apache.spark.streaming.kinesis.KinesisReceiver.removeCheckpointer(KinesisReceiver.scala:245)
> ~[spark-streaming-kinesis-asl_2.11-1.6.4-SNAPSHOT.jar:1.6.4-SNAPSHOT]
> at
> org.apache.spark.streaming.kinesis.KinesisRecordProcessor.shutdown(KinesisRecordProcessor.scala:124)
> ~[spark-streaming-kinesis-asl_2.11-1.6.4-SNAPSHOT.jar:1.6.4-SNAPSHOT]
> at
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.V1ToV2RecordProcessorAdapter.shutdown(V1ToV2RecordProcessorAdapter.java:48)
> ~[amazon-kinesis-client-1.6.2.jar:?]
> at
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownTask.call(ShutdownTask.java:100)
> [amazon-kinesis-client-1.6.2.jar:?]
> at
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
> [amazon-kinesis-client-1.6.2.jar:?]
> at
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
> [amazon-kinesis-client-1.6.2.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_60]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]