Hey Daniel,

I see what's going on. 10.0.0.{4,5} go down and re-resolution is triggered, 
but this happens before DNS has been updated, so round robin (RR) gets 
10.0.0.{4,5} again. These addresses will never again have a backend behind 
them, so RR gets stuck in a retry loop, unaware of the DNS update pointing 
to the available 10.0.0.{7,8}. The fix for this issue is almost ready to be 
merged (see <https://github.com/grpc/grpc/pull/12829>). It consists of 
actively re-requesting resolution in a different way, not only when the LB 
policy goes from healthy to unhealthy (right now we don't re-resolve if it 
stays unhealthy without ever having been healthy).
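
To illustrate the loop, here is a toy model in plain JavaScript. It is not 
gRPC code: the names makeResolver and simulate, the tick-based clock, and 
the reResolveWhileUnhealthy flag are all made up for illustration, and the 
flag only roughly approximates what the PR changes.

```javascript
// Toy model only (hypothetical names, not the gRPC implementation).
// A fake resolver returns stale addresses until a simulated DNS update.
function makeResolver(staleAddrs, freshAddrs, updateAtTick) {
  return (tick) => (tick < updateAtTick ? staleAddrs : freshAddrs);
}

// Returns the tick at which the client reaches a live backend, or -1 if it
// never does within `ticks` iterations.
function simulate({ reResolveWhileUnhealthy, ticks, updateAtTick }) {
  // Backends that are actually up after the restart.
  const live = new Set(['10.0.0.7', '10.0.0.8']);
  const resolve = makeResolver(
    ['10.0.0.4', '10.0.0.5'],
    ['10.0.0.7', '10.0.0.8'],
    updateAtTick
  );

  // The client was healthy on the old addresses, which have just gone down.
  let addrs = ['10.0.0.4', '10.0.0.5'];
  let wasHealthy = true;

  for (let tick = 0; tick < ticks; tick++) {
    const healthy = addrs.some((a) => live.has(a));
    if (healthy) return tick;

    // Described current behavior: re-resolve only on healthy -> unhealthy.
    // Assumed fixed behavior: also re-resolve while staying unhealthy.
    if (wasHealthy || reResolveWhileUnhealthy) {
      addrs = resolve(tick);
    }
    wasHealthy = healthy;
  }
  return -1;
}
```

With the DNS update landing at tick 3, 
simulate({ reResolveWhileUnhealthy: false, ticks: 10, updateAtTick: 3 }) 
returns -1 (the single re-resolution at tick 0 only sees the stale 
addresses), while the same call with reResolveWhileUnhealthy: true returns 4.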

tl;dr: please check back once https://github.com/grpc/grpc/pull/12829 has 
been merged, which should happen within 1-2 weeks. I'm adding Juanli (the 
author of that change) to this thread.

On Thursday, 2 November 2017 00:59:26 UTC-7, [email protected] wrote:
>
> Hi David,
>
> The ports don't change; they remain the same (port 3000) for all server 
> instances.  DNS is updated with the new IP addresses. To show this, the 
> logs include the results of DNS lookups (by nodejs) when each request is 
> sent to the server.  The client log with debug logging enabled is attached.
>
> Hope you are able to spot something.
>
> Regards,
> Daniel
>
> On Thursday, November 2, 2017 at 1:42:41 PM UTC+13, David Garcia Quintas 
> wrote:
>>
>> Another consideration is: has DNS been updated to reflect the new 
>> servers' IPs? The way things work is:
>>
>>    - (DNS) names such as foo.com resolve to a set of ip addresses
>>    - LB policies are created over the set of ip addresses (internally 
>>    there's one subchannel per ip)
>>    - If all subchannels go into shutdown (e.g. when all servers die), the 
>>    LB policy will also die. This will result in a request for re-resolution 
>>    of the name under which the channel was created (in this case, the DNS 
>>    name). A new LB policy will be created from the results of this 
>>    re-resolution.
>>
>> Which is why, if 1) port numbers change (DNS doesn't provide port 
>> information) and/or 2) DNS doesn't resolve to the servers' new addresses, 
>> the new LB policy won't contain valid subchannels. 
>>
>> On Wednesday, 1 November 2017 16:48:15 UTC-7, David Garcia Quintas wrote:
>>>
>>> Hi Daniel,
>>>
>>> Do the port numbers of the server also change? If not, it'd be helpful 
>>> if you could provide me with the logs produced when run with the following 
>>> environment variables set: GRPC_VERBOSITY=debug 
>>> GRPC_TRACE=client_channel,round_robin
>>>
>>> On Thursday, 26 October 2017 13:01:17 UTC-7, [email protected] wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> On Friday, October 27, 2017 at 5:03:33 AM UTC+13, Michael Lumish wrote:
>>>>>
>>>>> To clarify, are you saying that after your client loses its connection 
>>>>> to every server, it never reestablishes a connection with any of them?
>>>>>
>>>> Yes, exactly.  
>>>>
>>>> I have a small project and a bash script that demonstrates this.  
>>>> If you've got a Linux/Mac machine with nodejs and docker running in swarm 
>>>> mode, I can share it with you.  Essentially it starts 1 client and 2 
>>>> server instances (the client just sends a 'ping' request every 2 seconds 
>>>> and the server just sends a response indicating which server responded). 
>>>> All runs well and shows load balancing between them. Then the script 
>>>> shuts down both instances of the server and starts them up again in a way 
>>>> that forces them to have different IP addresses.  The client is never able 
>>>> to reconnect with the 2 new instances on the different IP addresses.
>>>>
>>>> Regards,
>>>> Daniel
>>>>
>>>
