fdoumet opened a new issue #6573: Druid tasks fail occasionally on Azure Storage
URL: https://github.com/apache/incubator-druid/issues/6573

Very frequently, druid tasks fail on Azure Storage with `com.microsoft.azure.storage.StorageException: The specified block list is invalid.`

Does anybody know what could be the issue or how to solve this?

Full stack trace below:
```
2018-11-03T01:05:05,774 ERROR [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[native-content-events]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.io.IOException, interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
java.lang.RuntimeException: java.io.IOException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:162) ~[?:?]
    at io.druid.segment.realtime.plumber.RealtimePlumber$2.doRun(RealtimePlumber.java:447) [druid-server-0.12.3.jar:0.12.3]
    at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42) [druid-common-0.12.3.jar:0.12.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: java.io.IOException
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:639) ~[?:?]
    at com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:280) ~[?:?]
    at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) ~[?:?]
    at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) ~[?:?]
    at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151) ~[?:?]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.3.jar:0.12.3]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.3.jar:0.12.3]
    at io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149) ~[?:?]
    ... 5 more
Caused by: com.microsoft.azure.storage.StorageException: The specified block list is invalid.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89) ~[?:?]
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307) ~[?:?]
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182) ~[?:?]
    at com.microsoft.azure.storage.blob.CloudBlockBlob.commitBlockList(CloudBlockBlob.java:245) ~[?:?]
    at com.microsoft.azure.storage.blob.BlobOutputStream.commit(BlobOutputStream.java:313) ~[?:?]
    at com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:277) ~[?:?]
    at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) ~[?:?]
    at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) ~[?:?]
    at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151) ~[?:?]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.3.jar:0.12.3]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.3.jar:0.12.3]
    at io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
    at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149) ~[?:?]
    ... 5 more
2018-11-03T01:05:05,844 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z] at path[/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
2018-11-03T01:05:05,844 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
2018-11-03T01:05:05,858 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf]: LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
2018-11-03T01:05:05,859 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf] to overlord: [LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}].
2018-11-03T01:05:05,860 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://10.244.2.87:8080
2018-11-03T01:05:05,881 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Deleting Index File[var/druid/task/index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf/work/persist/native-content-events/2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z]
2018-11-03T01:05:05,904 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Removing sinkKey 1541116800000 for segment native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z
2018-11-03T01:05:05,911 ERROR [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Failed to finish realtime task: {class=io.druid.indexing.common.task.RealtimeIndexTask, exceptionType=class io.druid.java.util.common.ISE, exceptionMessage=Exception occurred during persist and merge.}
io.druid.java.util.common.ISE: Exception occurred during persist and merge.
    at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557) ~[druid-server-0.12.3.jar:0.12.3]
    at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) [druid-indexing-service-0.12.3.jar:0.12.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
2018-11-03T01:05:05,912 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[AbstractTask{id='index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf', groupId='index_realtime_native-content-events', taskResource=TaskResource{availabilityGroup='native-content-events-2018-11-02T00:00:00.000Z-0000', requiredCapacity=1}, dataSource='native-content-events', context={}}]
io.druid.java.util.common.ISE: Exception occurred during persist and merge.
    at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557) ~[druid-server-0.12.3.jar:0.12.3]
    at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) ~[druid-indexing-service-0.12.3.jar:0.12.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
2018-11-03T01:05:05,912 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf] status changed to [FAILED].
2018-11-03T01:05:05,913 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf",
  "status" : "FAILED",
  "duration" : 90289188
}
```
