It might be a red herring but there is reported issues of conntrack races
causing DNS to timeout with Kubernetes.
https://github.com/kubernetes/kubernetes/issues/56903
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

this wouldn't be visible on the kube-dns side but from the container
issuing DNS queries.

DNS resolution would be nice to have visible in traces but I don't think
that can be exposed as a ClientInterceptor.

On Fri, Nov 23, 2018 at 10:41 AM Alexandru Keszeg <[email protected]>
wrote:

> Monitoring on kube-dns, reports a request rate of only about 2req/s. I've
> also run tcpdump on the kube-dns pods and I haven't managed to catch any
> problems. I'm not 100%, but DNS doesn't seem to be an issue. Any way to
> debug this from grpc-java? Is there any way to log how much time DNS takes?
>
> According to the ChannelZ service, the connection is closed after the idle
> timeout (configured to 60 seconds on the client, 5 minutes on the server). 
> keepAliveWithoutCalls
> is activated and keep alive packets are sent (tested with tcpdump), but the
> idle configuration and closes the channel. Is it supposed to do this? I
> would have expected keep alive packet to the connection alive...
>
>
> On Thu, Nov 22, 2018 at 3:19 PM tommy via grpc.io <
> [email protected]> wrote:
>
>> 5 seconds smells like a DNS timeout.
>> I don't know why that would affect issuing an RPC on an already
>> established connection thou. I assume you are not closing the channel
>> between calls?
>>
>> On Wednesday, November 21, 2018 at 4:54:22 PM UTC+1, Alexandru Keszeg
>> wrote:
>>>
>>> >  Are the client and server on different machines over a WAN?
>>>
>>> Different machines running on Google Kubernetes service in two
>>> availability zones.
>>>
>>> > Are you certain the server side isn’t blocking? I would jstack the
>>> server as soon as a delay is detected during the test. I would also run a
>>> constant ping from client to server during the test to make sure there are
>>> no network failures.
>>>
>>> I am not running a test, I am trying to diagnose a problem in
>>> production. Randomly, 1 request in hundreds takes a few seconds.
>>>
>>>
>>> *To rephrase my question*
>>> I'm trying to find out what happened between 35ms and 5.07s. The log
>>> shows that the server receives the message after 5 seconds. Is there any
>>> way to debug this? Can I log how many requests are waiting for a
>>> connection?
>>>
>>>
>>> Here is the trace form both sides. The logs are from
>>> ClientTracingInterceptor
>>> https://github.com/grpc-ecosystem/grpc-opentracing/blob/master/java/src/main/java/io/opentracing/contrib/ClientTracingInterceptor.java#L180
>>>
>>> Client:
>>>
>>> Logs (6)
>>> 33ms:
>>>
>>>    - event=Started call
>>>
>>> 35ms:
>>>
>>>    - event=Message sent
>>>
>>> 35ms:
>>>
>>>    - event=Finished sending messages
>>>
>>> 5.08s:
>>>
>>>    - Response headers received=
>>>    
>>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip)
>>>
>>> 5.08s:
>>>
>>>    - event=Response received
>>>
>>> 5.08s:
>>>
>>>    - event=Call closed
>>>
>>>
>>>
>>> Server:
>>>
>>> Logs (3)
>>> 5.07s:
>>>
>>>    - Message received=accountIds:
>>>    "facd3373-ffda-465b-8d32-50da88a4b3ab" accountIds:
>>>    "f7c878c6-5c76-480b-ae54-d47ad5266344" accountIds:
>>>    "ae6e4599-eb8c-4184-8ded-9317ddb2189d"
>>>
>>> 5.07s:
>>>
>>>    - event=Client finished sending messages
>>>
>>> 5.08s:
>>>
>>>    - event=Call completed
>>>
>>>
>>> There is how the server threads looked at this time(sum for 5 instances):
>>>
>>> [image: Capture.PNG]
>>>
>>>
>>>
>>>
>>> What I would expect to see is something like this, where the time is
>>> clearly spent in the server(and there is a span not shown with a "lock"
>>> operation that takes 5 seconds)
>>>
>>> Client:
>>>
>>> 21ms:
>>>
>>>    - event=Started call
>>>
>>> 22ms:
>>>
>>>    - event=Message sent
>>>
>>> 22ms:
>>>
>>>    - event=Finished sending messages
>>>
>>> 5.52s:
>>>
>>>    - Response headers received=
>>>    
>>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip)
>>>
>>> 5.52s:
>>>
>>>    - event=Response received
>>>
>>> 5.52s:
>>>
>>>    - event=Call closed
>>>
>>>
>>> Server:
>>> 27ms:
>>>
>>>    - Message received=username: "123" accountId: ""
>>>
>>> 27ms:
>>>
>>>    - event=Client finished sending messages
>>>
>>> 5.52s:
>>>
>>>    - event=Call completed
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Nov 21, 2018 at 5:25 PM Robert Engels <[email protected]>
>>> wrote:
>>>
>>>> Are the client and server on different machines over a WAN?
>>>>
>>>> Are you certain the server side isn’t blocking? I would jstack the
>>>> server as soon as a delay is detected during the test. I would also run a
>>>> constant ping from client to server during the test to make sure there are
>>>> no network failures.
>>>>
>>>> Other than that, I think you have to enable the gRPC tracing on both
>>>> sides to isolate where the hang is happening.
>>>>
>>>> On Nov 21, 2018, at 9:18 AM, [email protected] wrote:
>>>>
>>>>
>>>>        final NettyChannelBuilder channelBuilder = NettyChannelBuilder.
>>>> forAddress(getHost(), getPort())
>>>>                 .usePlaintext(grpcProperties.isUsePlainText())
>>>>                 .loadBalancerFactory(RoundRobinLoadBalancerFactory.
>>>> getInstance())
>>>>                 .intercept(getClientInterceptors());
>>>>
>>>>         addConnectionPersistenceConfig(channelBuilder);
>>>>
>>>>
>>>>         if (grpcProperties.isEnableClientFixedConcurrency()) {
>>>>             channelBuilder.executor(Executors.newFixedThreadPool(
>>>> grpcProperties.getClientThreadNumber()));
>>>>         }
>>>>
>>>>
>>>>         this.channel = channelBuilder.build();
>>>> ...
>>>>
>>>>     private void addConnectionPersistenceConfig(final
>>>> NettyChannelBuilder channelBuilder) {
>>>>         if (grpcProperties.getClientKeepAlive() != 0) {
>>>>             channelBuilder
>>>>                     .keepAliveTime(grpcProperties.getClientKeepAlive(),
>>>> SECONDS) // 5
>>>>                     .keepAliveWithoutCalls(grpcProperties.
>>>> isClientKeepAliveWithoutCalls()) //true
>>>>                     .keepAliveTimeout(grpcProperties.
>>>> getClientKeepAliveTimeout(), SECONDS); //60
>>>>         }
>>>>
>>>>         if (grpcProperties.getClientIdle() != 0) {
>>>>             channelBuilder.idleTimeout(grpcProperties.getClientIdle(),
>>>> SECONDS); //60
>>>>         }
>>>>     }
>>>>
>>>> I have added the relevant bits of code that build the client, i think
>>>> it should reuse the connection.
>>>>
>>>>
>>>>
>>>>
>>>> Another log from a client:
>>>>
>>>> 112ms:
>>>>
>>>>    - event=Started call
>>>>
>>>> 113ms:
>>>>
>>>>    - event=Message sent
>>>>
>>>> 113ms:
>>>>
>>>>    - event=Finished sending messages
>>>>
>>>> 5.16s:
>>>>
>>>>    - Response headers received=
>>>>    
>>>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip)
>>>>
>>>> 5.16s:
>>>>
>>>>    - event=Response received
>>>>
>>>> 5.16s:
>>>>
>>>>    - event=Call closed
>>>>
>>>>
>>>>
>>>> On the server side, there is no request that took more than 50ms.
>>>>
>>>>
>>>> Regarding file descriptors, both the client and server have about 100
>>>> open file descriptors.
>>>>
>>>>
>>>> On Wednesday, November 21, 2018 at 4:19:59 PM UTC+2, Robert Engels
>>>> wrote:
>>>>>
>>>>> There are also ways to abort the connection to avoid the close delay.
>>>>>
>>>>> On Nov 21, 2018, at 8:18 AM, Robert Engels <[email protected]>
>>>>> wrote:
>>>>>
>>>>> It could be a wait for tcp connection.  If you are continually
>>>>> creating new connections, the server will run out of file descriptors 
>>>>> since
>>>>> some connections will remain in a close wait state - so it has to wait for
>>>>> these to finally close in order to make a new connection.
>>>>>
>>>>> You might want to make sure your test is reusing connections.
>>>>>
>>>>> On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg <[email protected]>
>>>>> wrote:
>>>>>
>>>>> That was my first thought as well, but monitoring doesn't show any
>>>>> long GC pauses.
>>>>>
>>>>> What seems odd is that I have not seen a "pause" between two query
>>>>> traces(check the attached image in the first post), only at the "start" of
>>>>> a request.
>>>>>
>>>>> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Maybe a full GC is occurring on the server ? That’s what I would look
>>>>>> for.
>>>>>>
>>>>>> On Nov 21, 2018, at 2:50 AM, [email protected] wrote:
>>>>>>
>>>>>> Randomly, some gRPC calls which usually complete in 20 milliseconds
>>>>>> take a few seconds.
>>>>>> We have Jager in place, traces show a few seconds where the call does
>>>>>> nothing and then processing begins, which seems to imply queuing of some
>>>>>> sorts?
>>>>>>
>>>>>> On the server side, we have fixed concurrency, a thread dump shows
>>>>>> most of them idle.
>>>>>> Our environment: Kuberentes 1.9 on Google Cloud, services are exposed
>>>>>> using ClusterIP: None, clients connect using DNS load balancing
>>>>>>
>>>>>> - Is there some build-in queuing on the server side?
>>>>>> - Is there any way to track the queue depth?
>>>>>> - Any other tips on debugging this?
>>>>>>
>>>>>> <Selection_120.png>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "grpc.io" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/grpc-io/3f8075e1-e261-45c9-865f-23285b98cca9%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/grpc-io/3f8075e1-e261-45c9-865f-23285b98cca9%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>> <Selection_120.png>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ____________________________________
>>>>>
>>>>>
>>>>> <http://pitechplus.com>
>>>>>
>>>>> Alexandru Keszeg
>>>>>
>>>>> Developer
>>>>>
>>>>> +40 747124216
>>>>>
>>>>> Coratim Business Center
>>>>>
>>>>> Campul Painii nr.3-5
>>>>>
>>>>> Cluj Napoca ROMANIA
>>>>>
>>>>> This message (including any attachment(s)) may be copyright-protected
>>>>> and/or contain privileged and confidential information intended for use by
>>>>> the above-mentioned recipient only.  If you are not the intended recipient
>>>>> of this message, then please inform the sender immediately via the
>>>>> telephone number, fax number or e-mail address indicated above and 
>>>>> promptly
>>>>> delete this message from your system.  Any unauthorized copying, 
>>>>> disclosure
>>>>> to third parties or use of this message (including any attachment(s)) is
>>>>> strictly prohibited.  It is generally accepted that the security of
>>>>> electronic communications is not failsafe. Despite our best efforts, we
>>>>> cannot guarantee that electronic communications received were in fact sent
>>>>> by the purported sender and we shall not be liable for the improper or
>>>>> incomplete transmission of the information contained in this 
>>>>> communication,
>>>>> nor for any delay in its receipt or damage to your system.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "grpc.io" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/grpc-io/CACekFaEPNG6ZhXEZgyY-YwyNSE2V-aAgcsnCGcvN_nb13ovipg%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/grpc-io/CACekFaEPNG6ZhXEZgyY-YwyNSE2V-aAgcsnCGcvN_nb13ovipg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "grpc.io" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/grpc-io/FFD79299-249F-47F3-88D8-3251B378B023%40earthlink.net
>>>>> <https://groups.google.com/d/msgid/grpc-io/FFD79299-249F-47F3-88D8-3251B378B023%40earthlink.net?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "grpc.io" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/grpc-io/d3ae5c74-98db-430f-b750-7676170f407c%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/grpc-io/d3ae5c74-98db-430f-b750-7676170f407c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "grpc.io" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/grpc-io/D2lsL82beCs/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/grpc-io/092D7CAA-6C58-44C9-8194-E0AA9C75BC41%40earthlink.net
>>>> <https://groups.google.com/d/msgid/grpc-io/092D7CAA-6C58-44C9-8194-E0AA9C75BC41%40earthlink.net?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> --
>>> ____________________________________
>>>
>>>
>>> <http://pitechplus.com>
>>>
>>> Alexandru Keszeg
>>>
>>> Developer
>>>
>>> +40 747124216
>>>
>>> Coratim Business Center
>>>
>>> Campul Painii nr.3-5
>>>
>>> Cluj Napoca ROMANIA
>>>
>>> This message (including any attachment(s)) may be copyright-protected
>>> and/or contain privileged and confidential information intended for use by
>>> the above-mentioned recipient only.  If you are not the intended recipient
>>> of this message, then please inform the sender immediately via the
>>> telephone number, fax number or e-mail address indicated above and promptly
>>> delete this message from your system.  Any unauthorized copying, disclosure
>>> to third parties or use of this message (including any attachment(s)) is
>>> strictly prohibited.  It is generally accepted that the security of
>>> electronic communications is not failsafe. Despite our best efforts, we
>>> cannot guarantee that electronic communications received were in fact sent
>>> by the purported sender and we shall not be liable for the improper or
>>> incomplete transmission of the information contained in this communication,
>>> nor for any delay in its receipt or damage to your system.
>>>
>>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "grpc.io" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/grpc-io/D2lsL82beCs/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/grpc-io.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/grpc-io/c82aa2b5-7965-44be-9e19-00430b5a82f7%40googlegroups.com
>> <https://groups.google.com/d/msgid/grpc-io/c82aa2b5-7965-44be-9e19-00430b5a82f7%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> ____________________________________
>
>
> <http://pitechplus.com>
>
> Alexandru Keszeg
>
> Developer
>
> +40 747124216
>
> Coratim Business Center
>
> Campul Painii nr.3-5
>
> Cluj Napoca ROMANIA
>
> This message (including any attachment(s)) may be copyright-protected
> and/or contain privileged and confidential information intended for use by
> the above-mentioned recipient only.  If you are not the intended recipient
> of this message, then please inform the sender immediately via the
> telephone number, fax number or e-mail address indicated above and promptly
> delete this message from your system.  Any unauthorized copying, disclosure
> to third parties or use of this message (including any attachment(s)) is
> strictly prohibited.  It is generally accepted that the security of
> electronic communications is not failsafe. Despite our best efforts, we
> cannot guarantee that electronic communications received were in fact sent
> by the purported sender and we shall not be liable for the improper or
> incomplete transmission of the information contained in this communication,
> nor for any delay in its receipt or damage to your system.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/CANHF7cfLR67tGpEUb5HcZgm8DTktydOj6k_Ythxun3KqsjnTww%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to