pchang388 commented on issue #12701: URL: https://github.com/apache/druid/issues/12701#issuecomment-1178206982
#2 Continued After the 202 Accepted, Supervisor/Overlord will constantly hit the "/status" endpoint of the peon in order to see if it actually paused. After the 202 it appears to only respond with "STARTING" status ``` 022-07-06T21:56:33,609 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Pause task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] 2022-07-06T21:56:33,612 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP POST: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause 2022-07-06T21:56:33,612 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] starting 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: DefaultHttpResponse(chunked: true) X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got response: 202 Accepted 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@6634e90e 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 44B, last=false 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630 2022-07-06T21:56:35,618 DEBUG [HttpClient-Netty-Worker-11] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 0B, last=true 2022-07-06T21:56:35,618 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] 2022-07-06T21:56:35,623 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP GET: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status 2022-07-06T21:56:35,623 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] starting 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: DefaultHttpResponse(chunked: true) X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got response: 200 OK 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@4922d15b 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 9B, last=false 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630 2022-07-06T21:56:35,628 DEBUG [HttpClient-Netty-Worker-16] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 0B, last=true 2022-07-06T21:56:35,628 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T21:56:37,628 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] 2022-07-06T21:56:37,632 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.common.IndexTaskClient - HTTP GET: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status 2022-07-06T21:56:37,632 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] starting 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: DefaultHttpResponse(chunked: true) X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got response: 200 OK 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@38855127 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 9B, last=false 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630 2022-07-06T21:56:37,637 DEBUG [HttpClient-Netty-Worker-21] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/status] Got chunk: 0B, last=true 2022-07-06T21:56:37,637 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T21:56:39,637 DEBUG [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - GetStatus task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] ``` * This continues for some time, I count 287 occurrences in this peon task for similar messages for retries: ``` 2022-07-06T21:56:35,628 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] ... 2022-07-06T22:02:15,663 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T22:02:16,518 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T22:02:17,196 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T22:02:17,680 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T22:02:18,524 INFO [IndexTaskClient-REDACT-5] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] 2022-07-06T22:02:19,206 INFO [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] ... ... 2022-07-06T22:02:27,720 INFO [IndexTaskClient-REDACT-4] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Still waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to change its status to [PAUSED]; will try again in [PT2S] ``` * Until eventually, it is killed via shutdown as it never paused and on the next pause attempt, it responded with "400 Bad Request - cannot pause a task in "PUBLISHING" phase". According to the log snippet below, it jumped from "STARTING" phase to "PUBLISHING" between "/status" checks/calls: ``` 2022-07-06T22:05:12,414 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient - Pause task[index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] 2022-07-06T22:05:12,417 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.indexing.common.IndexTaskClient - HTTP POST: http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause 2022-07-06T22:05:12,417 DEBUG [IndexTaskClient-REDACT-0] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] starting 2022-07-06T22:05:12,420 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: DefaultHttpResponse(chunked: true) X-Druid-Task-Id: index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc 2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got response: 400 Bad Request 2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@317bbf2c 2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 66B, last=false 2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630 2022-07-06T22:05:12,421 DEBUG [HttpClient-Netty-Worker-9] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8100/druid/worker/v1/chat/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/pause] Got chunk: 0B, last=true 2022-07-06T22:05:12,427 INFO [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] because: [An exception occured while waiting for task [index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc] to pause: [org.apache.druid.java.util.common.IAE: Received 400 Bad Request with body: Can't pause, task is not in a pausable state (state: [PUBLISHING])]] 2022-07-06T22:05:12,427 DEBUG [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] starting 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: DefaultHttpResponse(chunked: true) 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got response: 200 OK 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: org.jboss.netty.handler.codec.http.DefaultHttpChunk@3c164b04 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got chunk: 53B, last=false 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] messageReceived: org.jboss.netty.handler.codec.http.HttpChunk$1@47788630 2022-07-06T22:05:12,439 DEBUG [HttpClient-Netty-Worker-6] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://REDACTED.host.com:8091/druid/worker/v1/task/index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc/shutdown] Got chunk: 0B, last=true 2022-07-06T22:05:12,439 INFO [KafkaSupervisor-REDACT-Worker-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: REDACTED.host.com:8091, status 200 OK, response: {"task":"index_kafka_REDACT_a5c10ee5effa63e_bhjndmoc"} ``` * The Peon is able to respond with 202 but it's not clear on why it was stuck in "STARTING" phase even though it appeared we were ingesting data and doing actual work : * https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTaskRunner.java (lines: 1770-1826) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
