clintropolis commented on issue #8107: Add CliIndexer process type and initial 
task runner implementation
URL: https://github.com/apache/incubator-druid/pull/8107#issuecomment-514496119
 
 
   I did some more testing with this on my laptop, using a setup of 1 each of 
broker, router, coordinator, and overlord, plus 2 indexers and historicals.
   
   <img width="1669" alt="Screen Shot 2019-07-23 at 6 20 58 PM" 
src="https://user-images.githubusercontent.com/1577461/61769231-6906c480-ad9e-11e9-8f73-a9cf7c34083f.png">
   
   I also did some small-scale Kafka indexing testing to make sure realtime 
queries and handoff were functioning.
   
   <img width="1674" alt="Screen Shot 2019-07-23 at 6 43 32 PM" 
src="https://user-images.githubusercontent.com/1577461/61769298-a2d7cb00-ad9e-11e9-8b85-7c2230674167.png">
   
   Overall things are working nicely. I did run into an issue when trying to 
stop an indexer node (`SIGTERM`). I believe the issue lies with the order of 
lifecycle shutdown: the tasks are gracefully stopped _after_ jetty is stopped. 
This causes the lifecycle stop on the indexer to hang during graceful task 
stop, because the task is waiting for a message from the overlord that it will 
never be able to receive without a running jetty.
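
   To illustrate the ordering problem, here is a minimal sketch. It assumes a 
lifecycle that stops components in reverse start order; that is an assumption 
for illustration, not something confirmed above. `ShutdownOrderSketch`, 
`stopAll`, and the component names `taskRunner` and `httpServer` are 
hypothetical and are not Druid's actual lifecycle API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a lifecycle; not Druid's real Lifecycle class.
final class ShutdownOrderSketch {

    // Stops components in reverse start order (assumed LIFO behaviour) and
    // records what happens to the task runner depending on whether the HTTP
    // server is still up when the task runner is asked to stop.
    static List<String> stopAll(List<String> startOrder) {
        Deque<String> stack = new ArrayDeque<>();
        for (String name : startOrder) {
            stack.push(name); // last started ends up on top
        }

        boolean httpUp = true;
        List<String> events = new ArrayList<>();
        while (!stack.isEmpty()) {
            String name = stack.pop();
            if (name.equals("httpServer")) {
                httpUp = false;
                events.add("httpServer stopped");
            } else if (name.equals("taskRunner")) {
                // A graceful task stop needs to hear back from the overlord,
                // which is only possible while the HTTP server is running.
                events.add(httpUp
                        ? "taskRunner stopped gracefully"
                        : "taskRunner hangs: cannot hear from overlord");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Task runner started first, so it is stopped last: the problem case.
        System.out.println(stopAll(Arrays.asList("taskRunner", "httpServer")));
        // HTTP server started first, so tasks stop while it is still up.
        System.out.println(stopAll(Arrays.asList("httpServer", "taskRunner")));
    }
}
```

   In this model the hang goes away once the task runner is stopped before 
the HTTP server. How that ordering would be achieved in the indexer itself is 
not covered here.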
   
   The supervisor on the overlord is then forever stuck in a loop, performing 
an action it can never complete because the indexer has stopped listening.
   
   ```
   2019-07-24T01:58:21,268 INFO [KafkaSupervisor-wikipedia] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [wikipedia] supervisor is running.
   2019-07-24T01:58:21,268 INFO [KafkaSupervisor-wikipedia] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='wikipedia', generationTime=2019-07-24T01:58:21.268Z, payload=KafkaSupervisorReportPayload{dataSource='wikipedia', topic='wikipedia', partitions=1, replicas=2, durationSeconds=600, active=[{id='index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod', startTime=2019-07-24T01:55:50.422Z, remainingSeconds=449}, {id='index_kafka_wikipedia_ed020815fc3c3f4_cfcibngo', startTime=2019-07-24T01:55:50.525Z, remainingSeconds=449}], publishing=[], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[]}}
   2019-07-24T01:58:36,268 INFO [IndexTaskClient-wikipedia-0] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://localhost:8092/druid/worker/v1/chat/index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod/offsets/current], with message [Connection refused (Connection refused)]
   ```
   
   The indexer eventually gives up after a 5 minute timeout and stops 
ungracefully, but the supervisor/overlord appears to remain stuck until either 
the indexer comes back on the same host/port or the overlord is restarted. 
This also jams up the task status shown in the web UI: the task of the stuck 
indexer remains in the 'running' state until one of those same two conditions 
is met.
   
   This issue aside, I'm still +1 on this if you'd rather fix this in a 
follow-up PR, since this is currently an undocumented feature anyway.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
