revverse opened a new issue #6132: Kill task failure when interval data partitioned
URL: https://github.com/apache/incubator-druid/issues/6132

I have partitioned data in HDFS storage for the following time range (segment metadata rows):

```
| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z | e-001 | 2017-11-22T02:35:47.071Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z | 1 | 2017-11-22T01:47:42.266Z | 0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":0,"partitions":0},"binaryVersion":9,"size":29458664,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z"} |
| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1 | e-001 | 2017-11-22T02:35:47.070Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z | 1 | 2017-11-22T01:47:42.266Z | 0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/1_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":1,"partitions":0},"binaryVersion":9,"size":32192241,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1"} |
| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2 | e-001 | 2017-11-22T02:35:47.069Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z | 1 | 2017-11-22T01:47:42.266Z | 0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/2_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":2,"partitions":0},"binaryVersion":9,"size":21045793,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2"} |
| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3 | e-001 | 2017-11-22T03:35:54.100Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z | 1 | 2017-11-22T01:47:42.266Z | 0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/3_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":3,"partitions":0},"binaryVersion":9,"size":32177515,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3"} |
| e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4 | e-001 | 2017-11-22T03:35:54.099Z | 2017-11-22T01:00:00.000Z | 2017-11-22T02:00:00.000Z | 1 | 2017-11-22T01:47:42.266Z | 0 | {"dataSource":"e-001","interval":"2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z","version":"2017-11-22T01:47:42.266Z","loadSpec":{"type":"hdfs","path":"hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/4_index.zip"},"dimensions":"{..}","shardSpec":{"type":"numbered","partitionNum":4,"partitions":0},"binaryVersion":9,"size":3594087,"identifier":"e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4"} |
```

When I run this kill task:

```json
{
  "type": "kill",
  "id": "clean001-2017-11-22T01",
  "interval": "2017-11-22T01:00:00Z/2017-11-22T02:00:01Z",
  "dataSource": "e-001 "
}
```

I see that only the first partition (`0_index.zip`) is removed from storage, and the task then fails with these errors:

```
2018-08-09T12:54:14,089 WARN [main] io.druid.query.lookup.LookupReferencesManager - No lookups found for tier [__default], response [io.druid.java.util.http.client.response.FullResponseHolder@63c99f7]
2018-08-09T12:54:14,089 INFO [main] io.druid.query.lookup.LookupReferencesManager - Coordinator is unavailable. Loading saved snapshot instead
2018-08-09T12:54:14,089 INFO [main] io.druid.query.lookup.LookupReferencesManager - No lookups to be loaded at this point
2018-08-09T12:54:14,090 INFO [main] io.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is started.
2018-08-09T12:54:14,090 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@601f264d].
2018-08-09T12:54:14,114 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/http:1.1.1.1:7109]
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_1
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_2
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_3
2018-08-09T12:54:15,328 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.KillTask - OK to kill segment: e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z_4
2018-08-09T12:54:15,329 INFO [task-runner-0-priority-0] io.druid.storage.hdfs.HdfsDataSegmentKiller - Killing segment[e-001_2017-11-22T01:00:00.000Z_2017-11-22T02:00:00.000Z_2017-11-22T01:47:42.266Z] mapped to path[hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip]
2018-08-09T12:54:15,595 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[clean001-2017-11-22T01]: SegmentNukeAction{segments=[DataSegment{size=29458664, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[{....}], version='2017-11-22T01:47:42.266Z', loadSpec={type=>hdfs, path=>hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip}, interval=2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z, dataSource='e-001', binaryVersion='9'}]}
2018-08-09T12:54:15,605 INFO [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[clean001-2017-11-22T01] to overlord: [SegmentNukeAction{segments=[DataSegment{size=29458664, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[{....}], version='2017-11-22T01:47:42.266Z', loadSpec={type=>hdfs, path=>hdfs://ha/druid/e-001/20171122T010000.000Z_20171122T020000.000Z/2017-11-22T01_47_42.266Z/0_index.zip}, interval=2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z, dataSource='e-001', binaryVersion='9'}]}].
2018-08-09T12:54:15,613 WARN [task-runner-0-priority-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Exception submitting action for task[clean001-2017-11-22T01]
io.druid.java.util.common.IOE: Scary HTTP status returned: 500 Server Error. Check your overlord logs for exceptions.
	at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:95) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.KillTask.run(KillTask.java:104) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.jar:0.12.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
```

On storage:

```
-rw-r--r-- druid hdfs 1.03 KB  22.11.2017, 04:34:31 3 128 MB 1_descriptor.json
-rw-r--r-- druid hdfs 19.7 MB  22.11.2017, 04:34:31 3 128 MB 1_index.zip
-rw-r--r-- druid hdfs 1.03 KB  22.11.2017, 04:33:50 3 128 MB 2_descriptor.json
-rw-r--r-- druid hdfs 12.88 MB 22.11.2017, 04:33:50 3 128 MB 2_index.zip
-rw-r--r-- druid hdfs 1.03 KB  22.11.2017, 05:34:20 3 128 MB 3_descriptor.json
-rw-r--r-- druid hdfs 19.7 MB  22.11.2017, 05:34:20 3 128 MB 3_index.zip
-rw-r--r-- druid hdfs 1.03 KB  22.11.2017, 05:33:35 3 128 MB 4_descriptor.json
-rw-r--r-- druid hdfs 2.15 MB  22.11.2017, 05:33:35 3 128 MB 4_index.zip
```

Before the task started, the files for the first partition were here as well.

Overlord logs:

```
2018-08-09T11:39:06,178 WARN [qtp1651923692-164] org.eclipse.jetty.servlet.ServletHandler - /druid/indexer/v1/action
io.druid.java.util.common.ISE: Segments not covered by locks for task: clean001-2017-11-22T01
	at io.druid.indexing.common.actions.TaskActionPreconditions.checkLockCoversSegments(TaskActionPreconditions.java:45) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.SegmentNukeAction.perform(SegmentNukeAction.java:70) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.SegmentNukeAction.perform(SegmentNukeAction.java:40) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.actions.LocalTaskActionClient.submit(LocalTaskActionClient.java:64) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:345) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:334) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource.asLeaderWith(OverlordResource.java:672) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.http.OverlordResource.doAction(OverlordResource.java:331) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.GeneratedMethodAccessor179.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
```
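Editor's note: two details of the submitted kill spec are worth double-checking. The `dataSource` value contains a trailing space (`"e-001 "`), while the metadata rows show the datasource name as `e-001`; the space may just be a copy/paste artifact, but if it was really submitted it would not match the stored segments. The interval also extends one second past the segment boundary. A spec that matches the metadata exactly would look like this (a sketch; the `id` is the reporter's own, kept unchanged):

```json
{
  "type": "kill",
  "id": "clean001-2017-11-22T01",
  "interval": "2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z",
  "dataSource": "e-001"
}
```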
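The `checkLockCoversSegments` failure in the overlord log means the overlord decided the task's lock did not cover one of the segments being nuked. The interval-containment part of that precondition can be modeled as follows (an illustrative Python sketch only, not Druid's actual implementation, which also checks the lock's datasource and lock state):

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp, with or without milliseconds,
    e.g. '2017-11-22T01:00:00Z' or '2017-11-22T01:00:00.000Z'."""
    ts = ts.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in ts else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(ts, fmt)


def parse_interval(interval: str):
    """Split a 'start/end' interval string into a (start, end) pair."""
    start, end = interval.split("/")
    return parse_ts(start), parse_ts(end)


def lock_covers(lock_interval: str, segment_interval: str) -> bool:
    """A lock interval covers a segment when it fully contains the
    segment's interval (simplified model of the real precondition)."""
    lock_start, lock_end = parse_interval(lock_interval)
    seg_start, seg_end = parse_interval(segment_interval)
    return lock_start <= seg_start and seg_end <= lock_end


# The kill interval from the issue does contain the segment interval:
print(lock_covers(
    "2017-11-22T01:00:00Z/2017-11-22T02:00:01Z",
    "2017-11-22T01:00:00.000Z/2017-11-22T02:00:00.000Z"))  # True
```

By interval containment alone the segments above would be covered, which suggests the failure comes from the lock itself no longer being held for the task at the time of the second nuke attempt rather than from the interval arithmetic, though that is speculation on the editor's part.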