tanisdlj opened a new issue #10866:
URL: https://github.com/apache/druid/issues/10866


   ### Affected Version
   
   0.20.0
   
   ### Description
   
   We've seen tasks failing randomly. Didn't managed to get a pattern or cause, 
so I'm thinking might be some sort of bug.
   
   Cluster size: 27 Dataservers (Middlemanager + Historical), 2 Brokers, 2 
Routers, 2 Overlords, 2 Coordinator, 3 Zookeeper (in dedicated hosts).
   
   Partial log:
   ```
   2021-02-08T22:35:01,185 INFO [qtp1999784536-190] 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - 
Stopping gracefully (status: [PUBLISHING])
   2021-02-08T22:35:01,189 ERROR [task-runner-0-priority-0] 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - 
Encountered exception in run() before persisting.
   org.apache.kafka.common.errors.InterruptException: 
java.lang.InterruptedException
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeThrowInterruptException(ConsumerNetworkClient.java:520)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:281)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1301)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237) 
~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
~[?:?]
        at 
org.apache.druid.indexing.kafka.KafkaRecordSupplier.poll(KafkaRecordSupplier.java:124)
 ~[?:?]
        at 
org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner.getRecords(IncrementalPublishingKafkaIndexTaskRunner.java:98)
 ~[?:?]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:603)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:267)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:145)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_275]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_275]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
   Caused by: java.lang.InterruptedException
        ... 17 more
   2021-02-08T22:35:01,218 INFO [task-runner-0-priority-0] 
org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing 
segment[ssp-events-hourly_2021-02-08T20:00:00.000Z_2021-02-08T21:00:00.000Z_2021-02-08T21:14:50.337Z_77]
 at 
path[/druid/segments/druid-dataserver-4.c.maximal-record-121815.internal:8103/druid-dataserver-4.c.maximal-record-121815.internal:8103_indexer-executor__default_tier_2021-02-08T18:29:28.553Z_8e48d33532c9490e9a7b6b372bf2cf0d0]
   2021-02-08T22:35:01,221 INFO [task-runner-0-priority-0] 
org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing 
segment[ssp-events-hourly_2021-02-08T19:00:00.000Z_2021-02-08T20:00:00.000Z_2021-02-08T19:58:53.766Z_427]
 at 
path[/druid/segments/druid-dataserver-4.c.maximal-record-121815.internal:8103/druid-dataserver-4.c.maximal-record-121815.internal:8103_indexer-executor__default_tier_2021-02-08T18:29:28.553Z_8e48d33532c9490e9a7b6b372bf2cf0d0]
   2021-02-08T22:35:01,221 INFO [task-runner-0-priority-0] 
org.apache.druid.curator.announcement.Announcer - Unannouncing 
[/druid/segments/druid-dataserver-4.c.maximal-record-121815.internal:8103/druid-dataserver-4.c.maximal-record-121815.internal:8103_indexer-executor__default_tier_2021-02-08T18:29:28.553Z_8e48d33532c9490e9a7b6b372bf2cf0d0]
   2021-02-08T22:35:01,621 ERROR 
[[index_kafka_ssp-events-hourly_5895d14b59e566a_fkffhhpn]-appenderator-persist] 
org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Incremental 
persist failed: 
{class=org.apache.druid.segment.realtime.appenderator.AppenderatorImpl, 
segment=ssp-events-hourly_2021-02-08T20:00:00.000Z_2021-02-08T21:00:00.000Z_2021-02-08T21:14:50.337Z_77,
 dataSource=ssp-events-hourly, count=10}
   2021-02-08T22:35:01,623 INFO [task-runner-0-priority-0] 
org.apache.druid.curator.announcement.Announcer - Unannouncing 
[/druid/internal-discovery/PEON/druid-dataserver-4.c.maximal-record-121815.internal:8103]
   2021-02-08T22:35:01,632 INFO [task-runner-0-priority-0] 
org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced self 
[{"druidNode":{"service":"druid/middleManager","host":"druid-dataserver-4.c.maximal-record-121815.internal","bindOnHost":false,"plaintextPort":8103,"port":-1,"tlsPort":-1,"enablePlaintextPort":true,"enableTlsPort":false},"nodeType":"peon","services":{"dataNodeService":{"type":"dataNodeService","tier":"_default_tier","maxSize":0,"type":"indexer-executor","priority":0},"lookupNodeService":{"type":"lookupNodeService","lookupTier":"__default"}}}].
   2021-02-08T22:35:01,632 INFO [task-runner-0-priority-0] 
org.apache.druid.curator.announcement.Announcer - Unannouncing 
[/druid/announcements/druid-dataserver-4.c.maximal-record-121815.internal:8103]
   2021-02-08T22:35:01,648 ERROR [task-runner-0-priority-0] 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - 
Encountered exception while running task.
   org.apache.kafka.common.errors.InterruptException: 
java.lang.InterruptedException
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeThrowInterruptException(ConsumerNetworkClient.java:520)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:281)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1301)
 ~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237) 
~[?:?]
        at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210) 
~[?:?]
        at 
org.apache.druid.indexing.kafka.KafkaRecordSupplier.poll(KafkaRecordSupplier.java:124)
 ~[?:?]
        at 
org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner.getRecords(IncrementalPublishingKafkaIndexTaskRunner.java:98)
 ~[?:?]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:603)
 ~[druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:267)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:145)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423)
 [druid-indexing-service-0.20.0.jar:0.20.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_275]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_275]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
        Suppressed: java.lang.InterruptedException
                at 
java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_275]
                at java.util.concurrent.FutureTask.get(FutureTask.java:191) 
~[?:1.8.0_275]
                at 
org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:231)
 ~[druid-server-0.20.0.jar:0.20.0]
                at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:756)
 ~[druid-indexing-service-0.20.0.jar:0.20.0]
                at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:267)
 [druid-indexing-service-0.20.0.jar:0.20.0]
                at 
org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:145)
 [druid-indexing-service-0.20.0.jar:0.20.0]
                at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451)
 [druid-indexing-service-0.20.0.jar:0.20.0]
                at 
org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423)
 [druid-indexing-service-0.20.0.jar:0.20.0]
                at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_275]
                at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_275]
                at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_275]
                at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
   Caused by: java.lang.InterruptedException
        ... 17 more
   2021-02-08T22:35:01,680 INFO [task-runner-0-priority-0] 
org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed 
with status: {
     "id" : "index_kafka_ssp-events-hourly_5895d14b59e566a_fkffhhpn",
     "status" : "FAILED",
     "duration" : 14743336,
     "errorMsg" : "org.apache.kafka.common.errors.InterruptException: 
java.lang.InterruptedException\n\tat org.apache.kaf...",
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   2021-02-08T22:35:01,690 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [ANNOUNCEMENTS]
   2021-02-08T22:35:01,691 INFO [main] 
org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle 
[module] stage [SERVER]
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to