It might be a red herring but there is reported issues of conntrack races causing DNS to timeout with Kubernetes. https://github.com/kubernetes/kubernetes/issues/56903 https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
this wouldn't be visible on the kube-dns side but from the container issuing DNS queries. DNS resolution would be nice to have visible in traces but I don't think that can be exposed as a ClientInterceptor. On Fri, Nov 23, 2018 at 10:41 AM Alexandru Keszeg <[email protected]> wrote: > Monitoring on kube-dns, reports a request rate of only about 2req/s. I've > also run tcpdump on the kube-dns pods and I haven't managed to catch any > problems. I'm not 100%, but DNS doesn't seem to be an issue. Any way to > debug this from grpc-java? Is there any way to log how much time DNS takes? > > According to the ChannelZ service, the connection is closed after the idle > timeout (configured to 60 seconds on the client, 5 minutes on the server). > keepAliveWithoutCalls > is activated and keep alive packets are sent (tested with tcpdump), but the > idle configuration and closes the channel. Is it supposed to do this? I > would have expected keep alive packet to the connection alive... > > > On Thu, Nov 22, 2018 at 3:19 PM tommy via grpc.io < > [email protected]> wrote: > >> 5 seconds smells like a DNS timeout. >> I don't know why that would affect issuing an RPC on an already >> established connection thou. I assume you are not closing the channel >> between calls? >> >> On Wednesday, November 21, 2018 at 4:54:22 PM UTC+1, Alexandru Keszeg >> wrote: >>> >>> > Are the client and server on different machines over a WAN? >>> >>> Different machines running on Google Kubernetes service in two >>> availability zones. >>> >>> > Are you certain the server side isn’t blocking? I would jstack the >>> server as soon as a delay is detected during the test. I would also run a >>> constant ping from client to server during the test to make sure there are >>> no network failures. >>> >>> I am not running a test, I am trying to diagnose a problem in >>> production. Randomly, 1 request in hundreds takes a few seconds. >>> >>> >>> *To rephrase my question* >>> I'm trying to find out what happened between 35ms and 5.07s. The log >>> shows that the server receives the message after 5 seconds. Is there any >>> way to debug this? Can I log how many requests are waiting for a >>> connection? >>> >>> >>> Here is the trace form both sides. The logs are from >>> ClientTracingInterceptor >>> https://github.com/grpc-ecosystem/grpc-opentracing/blob/master/java/src/main/java/io/opentracing/contrib/ClientTracingInterceptor.java#L180 >>> >>> Client: >>> >>> Logs (6) >>> 33ms: >>> >>> - event=Started call >>> >>> 35ms: >>> >>> - event=Message sent >>> >>> 35ms: >>> >>> - event=Finished sending messages >>> >>> 5.08s: >>> >>> - Response headers received= >>> >>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip) >>> >>> 5.08s: >>> >>> - event=Response received >>> >>> 5.08s: >>> >>> - event=Call closed >>> >>> >>> >>> Server: >>> >>> Logs (3) >>> 5.07s: >>> >>> - Message received=accountIds: >>> "facd3373-ffda-465b-8d32-50da88a4b3ab" accountIds: >>> "f7c878c6-5c76-480b-ae54-d47ad5266344" accountIds: >>> "ae6e4599-eb8c-4184-8ded-9317ddb2189d" >>> >>> 5.07s: >>> >>> - event=Client finished sending messages >>> >>> 5.08s: >>> >>> - event=Call completed >>> >>> >>> There is how the server threads looked at this time(sum for 5 instances): >>> >>> [image: Capture.PNG] >>> >>> >>> >>> >>> What I would expect to see is something like this, where the time is >>> clearly spent in the server(and there is a span not shown with a "lock" >>> operation that takes 5 seconds) >>> >>> Client: >>> >>> 21ms: >>> >>> - event=Started call >>> >>> 22ms: >>> >>> - event=Message sent >>> >>> 22ms: >>> >>> - event=Finished sending messages >>> >>> 5.52s: >>> >>> - Response headers received= >>> >>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip) >>> >>> 5.52s: >>> >>> - event=Response received >>> >>> 5.52s: >>> >>> - event=Call closed >>> >>> >>> Server: >>> 27ms: >>> >>> - Message received=username: "123" accountId: "" >>> >>> 27ms: >>> >>> - event=Client finished sending messages >>> >>> 5.52s: >>> >>> - event=Call completed >>> >>> >>> >>> >>> >>> On Wed, Nov 21, 2018 at 5:25 PM Robert Engels <[email protected]> >>> wrote: >>> >>>> Are the client and server on different machines over a WAN? >>>> >>>> Are you certain the server side isn’t blocking? I would jstack the >>>> server as soon as a delay is detected during the test. I would also run a >>>> constant ping from client to server during the test to make sure there are >>>> no network failures. >>>> >>>> Other than that, I think you have to enable the gRPC tracing on both >>>> sides to isolate where the hang is happening. >>>> >>>> On Nov 21, 2018, at 9:18 AM, [email protected] wrote: >>>> >>>> >>>> final NettyChannelBuilder channelBuilder = NettyChannelBuilder. >>>> forAddress(getHost(), getPort()) >>>> .usePlaintext(grpcProperties.isUsePlainText()) >>>> .loadBalancerFactory(RoundRobinLoadBalancerFactory. >>>> getInstance()) >>>> .intercept(getClientInterceptors()); >>>> >>>> addConnectionPersistenceConfig(channelBuilder); >>>> >>>> >>>> if (grpcProperties.isEnableClientFixedConcurrency()) { >>>> channelBuilder.executor(Executors.newFixedThreadPool( >>>> grpcProperties.getClientThreadNumber())); >>>> } >>>> >>>> >>>> this.channel = channelBuilder.build(); >>>> ... >>>> >>>> private void addConnectionPersistenceConfig(final >>>> NettyChannelBuilder channelBuilder) { >>>> if (grpcProperties.getClientKeepAlive() != 0) { >>>> channelBuilder >>>> .keepAliveTime(grpcProperties.getClientKeepAlive(), >>>> SECONDS) // 5 >>>> .keepAliveWithoutCalls(grpcProperties. >>>> isClientKeepAliveWithoutCalls()) //true >>>> .keepAliveTimeout(grpcProperties. >>>> getClientKeepAliveTimeout(), SECONDS); //60 >>>> } >>>> >>>> if (grpcProperties.getClientIdle() != 0) { >>>> channelBuilder.idleTimeout(grpcProperties.getClientIdle(), >>>> SECONDS); //60 >>>> } >>>> } >>>> >>>> I have added the relevant bits of code that build the client, i think >>>> it should reuse the connection. >>>> >>>> >>>> >>>> >>>> Another log from a client: >>>> >>>> 112ms: >>>> >>>> - event=Started call >>>> >>>> 113ms: >>>> >>>> - event=Message sent >>>> >>>> 113ms: >>>> >>>> - event=Finished sending messages >>>> >>>> 5.16s: >>>> >>>> - Response headers received= >>>> >>>> Metadata(content-type=application/grpc,grpc-encoding=identity,grpc-accept-encoding=gzip) >>>> >>>> 5.16s: >>>> >>>> - event=Response received >>>> >>>> 5.16s: >>>> >>>> - event=Call closed >>>> >>>> >>>> >>>> On the server side, there is no request that took more than 50ms. >>>> >>>> >>>> Regarding file descriptors, both the client and server have about 100 >>>> open file descriptors. >>>> >>>> >>>> On Wednesday, November 21, 2018 at 4:19:59 PM UTC+2, Robert Engels >>>> wrote: >>>>> >>>>> There are also ways to abort the connection to avoid the close delay. >>>>> >>>>> On Nov 21, 2018, at 8:18 AM, Robert Engels <[email protected]> >>>>> wrote: >>>>> >>>>> It could be a wait for tcp connection. If you are continually >>>>> creating new connections, the server will run out of file descriptors >>>>> since >>>>> some connections will remain in a close wait state - so it has to wait for >>>>> these to finally close in order to make a new connection. >>>>> >>>>> You might want to make sure your test is reusing connections. >>>>> >>>>> On Nov 21, 2018, at 8:15 AM, Alexandru Keszeg <[email protected]> >>>>> wrote: >>>>> >>>>> That was my first thought as well, but monitoring doesn't show any >>>>> long GC pauses. >>>>> >>>>> What seems odd is that I have not seen a "pause" between two query >>>>> traces(check the attached image in the first post), only at the "start" of >>>>> a request. >>>>> >>>>> On Wed, Nov 21, 2018 at 3:34 PM Robert Engels <[email protected]> >>>>> wrote: >>>>> >>>>>> Maybe a full GC is occurring on the server ? That’s what I would look >>>>>> for. >>>>>> >>>>>> On Nov 21, 2018, at 2:50 AM, [email protected] wrote: >>>>>> >>>>>> Randomly, some gRPC calls which usually complete in 20 milliseconds >>>>>> take a few seconds. >>>>>> We have Jager in place, traces show a few seconds where the call does >>>>>> nothing and then processing begins, which seems to imply queuing of some >>>>>> sorts? >>>>>> >>>>>> On the server side, we have fixed concurrency, a thread dump shows >>>>>> most of them idle. >>>>>> Our environment: Kuberentes 1.9 on Google Cloud, services are exposed >>>>>> using ClusterIP: None, clients connect using DNS load balancing >>>>>> >>>>>> - Is there some build-in queuing on the server side? >>>>>> - Is there any way to track the queue depth? >>>>>> - Any other tips on debugging this? >>>>>> >>>>>> <Selection_120.png> >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "grpc.io" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To post to this group, send email to [email protected]. >>>>>> Visit this group at https://groups.google.com/group/grpc-io. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/grpc-io/3f8075e1-e261-45c9-865f-23285b98cca9%40googlegroups.com >>>>>> <https://groups.google.com/d/msgid/grpc-io/3f8075e1-e261-45c9-865f-23285b98cca9%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>> <Selection_120.png> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> ____________________________________ >>>>> >>>>> >>>>> <http://pitechplus.com> >>>>> >>>>> Alexandru Keszeg >>>>> >>>>> Developer >>>>> >>>>> +40 747124216 >>>>> >>>>> Coratim Business Center >>>>> >>>>> Campul Painii nr.3-5 >>>>> >>>>> Cluj Napoca ROMANIA >>>>> >>>>> This message (including any attachment(s)) may be copyright-protected >>>>> and/or contain privileged and confidential information intended for use by >>>>> the above-mentioned recipient only. If you are not the intended recipient >>>>> of this message, then please inform the sender immediately via the >>>>> telephone number, fax number or e-mail address indicated above and >>>>> promptly >>>>> delete this message from your system. Any unauthorized copying, >>>>> disclosure >>>>> to third parties or use of this message (including any attachment(s)) is >>>>> strictly prohibited. It is generally accepted that the security of >>>>> electronic communications is not failsafe. Despite our best efforts, we >>>>> cannot guarantee that electronic communications received were in fact sent >>>>> by the purported sender and we shall not be liable for the improper or >>>>> incomplete transmission of the information contained in this >>>>> communication, >>>>> nor for any delay in its receipt or damage to your system. >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "grpc.io" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at https://groups.google.com/group/grpc-io. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/grpc-io/CACekFaEPNG6ZhXEZgyY-YwyNSE2V-aAgcsnCGcvN_nb13ovipg%40mail.gmail.com >>>>> <https://groups.google.com/d/msgid/grpc-io/CACekFaEPNG6ZhXEZgyY-YwyNSE2V-aAgcsnCGcvN_nb13ovipg%40mail.gmail.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "grpc.io" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at https://groups.google.com/group/grpc-io. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/grpc-io/FFD79299-249F-47F3-88D8-3251B378B023%40earthlink.net >>>>> <https://groups.google.com/d/msgid/grpc-io/FFD79299-249F-47F3-88D8-3251B378B023%40earthlink.net?utm_medium=email&utm_source=footer> >>>>> . >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "grpc.io" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at https://groups.google.com/group/grpc-io. >>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/grpc-io/d3ae5c74-98db-430f-b750-7676170f407c%40googlegroups.com >>>> <https://groups.google.com/d/msgid/grpc-io/d3ae5c74-98db-430f-b750-7676170f407c%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>>> -- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "grpc.io" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/grpc-io/D2lsL82beCs/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at https://groups.google.com/group/grpc-io. >>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/grpc-io/092D7CAA-6C58-44C9-8194-E0AA9C75BC41%40earthlink.net >>>> <https://groups.google.com/d/msgid/grpc-io/092D7CAA-6C58-44C9-8194-E0AA9C75BC41%40earthlink.net?utm_medium=email&utm_source=footer> >>>> . >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> >>> -- >>> ____________________________________ >>> >>> >>> <http://pitechplus.com> >>> >>> Alexandru Keszeg >>> >>> Developer >>> >>> +40 747124216 >>> >>> Coratim Business Center >>> >>> Campul Painii nr.3-5 >>> >>> Cluj Napoca ROMANIA >>> >>> This message (including any attachment(s)) may be copyright-protected >>> and/or contain privileged and confidential information intended for use by >>> the above-mentioned recipient only. If you are not the intended recipient >>> of this message, then please inform the sender immediately via the >>> telephone number, fax number or e-mail address indicated above and promptly >>> delete this message from your system. Any unauthorized copying, disclosure >>> to third parties or use of this message (including any attachment(s)) is >>> strictly prohibited. It is generally accepted that the security of >>> electronic communications is not failsafe. Despite our best efforts, we >>> cannot guarantee that electronic communications received were in fact sent >>> by the purported sender and we shall not be liable for the improper or >>> incomplete transmission of the information contained in this communication, >>> nor for any delay in its receipt or damage to your system. >>> >>> -- >> You received this message because you are subscribed to a topic in the >> Google Groups "grpc.io" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/grpc-io/D2lsL82beCs/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at https://groups.google.com/group/grpc-io. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/grpc-io/c82aa2b5-7965-44be-9e19-00430b5a82f7%40googlegroups.com >> <https://groups.google.com/d/msgid/grpc-io/c82aa2b5-7965-44be-9e19-00430b5a82f7%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> For more options, visit https://groups.google.com/d/optout. >> > > > -- > ____________________________________ > > > <http://pitechplus.com> > > Alexandru Keszeg > > Developer > > +40 747124216 > > Coratim Business Center > > Campul Painii nr.3-5 > > Cluj Napoca ROMANIA > > This message (including any attachment(s)) may be copyright-protected > and/or contain privileged and confidential information intended for use by > the above-mentioned recipient only. If you are not the intended recipient > of this message, then please inform the sender immediately via the > telephone number, fax number or e-mail address indicated above and promptly > delete this message from your system. Any unauthorized copying, disclosure > to third parties or use of this message (including any attachment(s)) is > strictly prohibited. It is generally accepted that the security of > electronic communications is not failsafe. Despite our best efforts, we > cannot guarantee that electronic communications received were in fact sent > by the purported sender and we shall not be liable for the improper or > incomplete transmission of the information contained in this communication, > nor for any delay in its receipt or damage to your system. > > -- You received this message because you are subscribed to the Google Groups "grpc.io" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/grpc-io. To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CANHF7cfLR67tGpEUb5HcZgm8DTktydOj6k_Ythxun3KqsjnTww%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
