fdoumet opened a new issue #6573: Druid tasks fail occasionally on Azure Storage
URL: https://github.com/apache/incubator-druid/issues/6573
 
 
   Very frequently, druid tasks fail on Azure Storage with 
`com.microsoft.azure.storage.StorageException: The specified block list is 
invalid.`
   
   Does anybody know what could be the issue or how to solve this? Full stack 
trace below:
   
   ```
   2018-11-03T01:05:05,774 ERROR 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged 
index[native-content-events]: 
{class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class 
java.lang.RuntimeException, exceptionMessage=java.io.IOException, 
interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
   java.lang.RuntimeException: java.io.IOException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) 
~[guava-16.0.1.jar:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:162)
 ~[?:?]
        at 
io.druid.segment.realtime.plumber.RealtimePlumber$2.doRun(RealtimePlumber.java:447)
 [druid-server-0.12.3.jar:0.12.3]
        at 
io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
 [druid-common-0.12.3.jar:0.12.3]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_111]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
   Caused by: java.io.IOException
        at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:639) 
~[?:?]
        at 
com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:280)
 ~[?:?]
        at 
com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) 
~[?:?]
        at 
com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) 
~[?:?]
        at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) 
~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115)
 ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155)
 ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151)
 ~[?:?]
        at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) 
~[java-util-0.12.3.jar:0.12.3]
        at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) 
~[java-util-0.12.3.jar:0.12.3]
        at 
io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149)
 ~[?:?]
        ... 5 more
   Caused by: com.microsoft.azure.storage.StorageException: The specified block 
list is invalid.
        at 
com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
 ~[?:?]
        at 
com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
 ~[?:?]
        at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
 ~[?:?]
        at 
com.microsoft.azure.storage.blob.CloudBlockBlob.commitBlockList(CloudBlockBlob.java:245)
 ~[?:?]
        at 
com.microsoft.azure.storage.blob.BlobOutputStream.commit(BlobOutputStream.java:313)
 ~[?:?]
        at 
com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:277)
 ~[?:?]
        at 
com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) 
~[?:?]
        at 
com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) 
~[?:?]
        at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) 
~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115)
 ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155)
 ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151)
 ~[?:?]
        at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) 
~[java-util-0.12.3.jar:0.12.3]
        at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) 
~[java-util-0.12.3.jar:0.12.3]
        at 
io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
        at 
io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149)
 ~[?:?]
        ... 5 more
   2018-11-03T01:05:05,844 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing 
segment[native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z]
 at 
path[/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
   2018-11-03T01:05:05,844 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.curator.announcement.Announcer - unannouncing 
[/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
   2018-11-03T01:05:05,858 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for 
task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf]:
 LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
   2018-11-03T01:05:05,859 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for 
task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf]
 to overlord: 
[LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}].
   2018-11-03T01:05:05,860 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: 
http://10.244.2.87:8080
   2018-11-03T01:05:05,881 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.segment.realtime.plumber.RealtimePlumber - Deleting Index 
File[var/druid/task/index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf/work/persist/native-content-events/2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z]
   2018-11-03T01:05:05,904 INFO 
[native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] 
io.druid.segment.realtime.plumber.RealtimePlumber - Removing sinkKey 
1541116800000 for segment 
native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z
   2018-11-03T01:05:05,911 ERROR [task-runner-0-priority-0] 
io.druid.indexing.common.task.RealtimeIndexTask - Failed to finish realtime 
task: {class=io.druid.indexing.common.task.RealtimeIndexTask, 
exceptionType=class io.druid.java.util.common.ISE, exceptionMessage=Exception 
occurred during persist and merge.}
   io.druid.java.util.common.ISE: Exception occurred during persist and merge.
        at 
io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557)
 ~[druid-server-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) 
[druid-indexing-service-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444)
 [druid-indexing-service-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416)
 [druid-indexing-service-0.12.3.jar:0.12.3]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_111]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_111]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
   2018-11-03T01:05:05,912 ERROR [task-runner-0-priority-0] 
io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running 
task[AbstractTask{id='index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf',
 groupId='index_realtime_native-content-events', 
taskResource=TaskResource{availabilityGroup='native-content-events-2018-11-02T00:00:00.000Z-0000',
 requiredCapacity=1}, dataSource='native-content-events', context={}}]
   io.druid.java.util.common.ISE: Exception occurred during persist and merge.
        at 
io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557)
 ~[druid-server-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) 
~[druid-indexing-service-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444)
 [druid-indexing-service-0.12.3.jar:0.12.3]
        at 
io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416)
 [druid-indexing-service-0.12.3.jar:0.12.3]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_111]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_111]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
   2018-11-03T01:05:05,912 INFO [task-runner-0-priority-0] 
io.druid.indexing.overlord.TaskRunnerUtils - Task 
[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf] 
status changed to [FAILED].
   2018-11-03T01:05:05,913 INFO [task-runner-0-priority-0] 
io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with 
status: {
     "id" : 
"index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf",
     "status" : "FAILED",
     "duration" : 90289188
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to