churromorales commented on issue #5709: URL: https://github.com/apache/druid/issues/5709#issuecomment-1332739289
@gianm In our k8s environment, we have noticed that queries fail when pods restart and/or nodes go down. I was thinking a fix could be this:

1. You have the list of servers you are going to send the query to.
2. You make the query; if there is a timeout, you check the list of currently available servers.
3. If your original list is not fully contained in the list of current servers, just make the query again; it will go to a replica and you will be okay.

I think this would solve a lot of the issues our Druid users currently hit when pods restart and queries fail.
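The three steps proposed in this comment could be sketched roughly as follows. This is only an illustration of the idea, not Druid code: `RetryOnServerLoss`, `QueryRunner`, `runWithRetry`, and the `maxRetries` cap are all hypothetical names and assumptions, servers are modeled as plain strings, and the whole live set is used as the target list for simplicity.

```java
import java.util.Set;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Druid APIs.
final class RetryOnServerLoss {

    /** Hypothetical stand-in for "send the query to these servers". */
    interface QueryRunner<T> {
        T run(Set<String> servers) throws TimeoutException;
    }

    /**
     * Runs the query and, on timeout, retries only when at least one of the
     * servers originally targeted is no longer in the live server view.
     * maxRetries is an assumption added here to bound the loop.
     */
    static <T> T runWithRetry(Supplier<Set<String>> currentServers,
                              QueryRunner<T> runner,
                              int maxRetries) throws TimeoutException {
        // Step 1: snapshot the servers the query will be sent to.
        Set<String> targets = Set.copyOf(currentServers.get());
        for (int attempt = 0; ; attempt++) {
            try {
                // Step 2: make the query.
                return runner.run(targets);
            } catch (TimeoutException e) {
                // Step 2 (continued): on timeout, re-read the live view.
                Set<String> live = Set.copyOf(currentServers.get());
                // Step 3: a target has disappeared if the live view
                // no longer contains every server we sent the query to.
                boolean serverLost = !live.containsAll(targets);
                if (!serverLost || attempt >= maxRetries) {
                    // Plain timeout, or retries exhausted: surface it.
                    throw e;
                }
                // Retry against the current view, which should hold replicas.
                targets = live;
            }
        }
    }
}
```

With this shape, a timeout that coincides with a server dropping out of the view triggers another attempt, while a timeout with an unchanged view is reported as-is, so slow queries are not re-run needlessly.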
