I have encountered this problem twice, with a runtime exception like the one below. The first time the system was still on the Cloud Speech API Beta; it had been running for about three weeks and had served roughly ten thousand requests. The second time it was on the Cloud Speech API GA, after less than an hour and only about 100 requests (on exactly the same system). Both times the system could only be recovered by restarting the application. So I suspect this issue does not need 2 billion requests to happen. Is there a more elegant way to recover than killing and restarting the application?
2017-07-27 08:30:19,794 [DEBUG] [r-ELG-49-2] verification of certificate failed
java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    at sun.security.validator.PKIXValidator.<init>(Unknown Source)
    at sun.security.validator.Validator.getInstance(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.getValidator(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.checkTrustedInit(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(Unknown Source)
    at io.netty.handler.ssl.ReferenceCountedOpenSslClientContext$ExtendedTrustManagerVerifyCallback.verify(ReferenceCountedOpenSslClientContext.java:223)
    at io.netty.handler.ssl.ReferenceCountedOpenSslContext$AbstractCertificateVerifier.verify(ReferenceCountedOpenSslContext.java:606)
    at org.apache.tomcat.jni.SSL.readFromSSL(Native Method)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.readPlaintextData(ReferenceCountedOpenSslEngine.java:470)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:927)
    at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1033)
    at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:200)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1117)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1039)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    at java.security.cert.PKIXParameters.setTrustAnchors(Unknown Source)
    at java.security.cert.PKIXParameters.<init>(Unknown Source)
    at java.security.cert.PKIXBuilderParameters.<init>(Unknown Source)
    ... 32 more
I also see the following error:

java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: Channel closed while performing protocol negotiation

On Saturday, May 7, 2016 at 12:25:48 AM UTC+8, Eric Anderson wrote:
>
> On Fri, May 6, 2016 at 1:49 AM, Erik Gorset <[email protected]> wrote:
>
>> It looks like grpc-java is calling netty's incrementAndGetNextStreamId [0], which returns int. Does this really mean that grpc only supports 2^31 requests per channel?
>
> As Josh said, the limit is per transport, not per channel. There is already code intended to swap to a new transport, but maybe it is buggy/suboptimal.
>
>> I'm happy to create a github issue if this can be seen as a bug and not a known limitation.
>
> Please make an issue. This is a bug.
>
>> UNAVAILABLE: Stream IDs have been exhausted
>
> That status appears to be coming from here <https://github.com/grpc/grpc-java/blob/c6faf3541ba5b41d0a7441528d2d788279a73a6a/netty/src/main/java/io/grpc/netty/NettyClientHandler.java#L526>.
>
> The behavior <https://github.com/grpc/grpc-java/blob/c6faf3541ba5b41d0a7441528d2d788279a73a6a/netty/src/main/java/io/grpc/netty/NettyClientHandler.java#L337> then seems to be that that particular RPC will fail, but future RPCs should start going to a new transport. That alone is suboptimal but not too bad; a transient failure of 1 out of 2^30 RPCs should be recoverable by applications, otherwise they are probably going to have a bad time from other failures. However, it won't necessarily be only 1 RPC that fails, since it will take a small amount of time to divert traffic to a new transport, and all RPCs during that time would fail. It'd be good to address that.
>
> However, I think the larger problem is that calling close <https://github.com/grpc/grpc-java/blob/c6faf3541ba5b41d0a7441528d2d788279a73a6a/netty/src/main/java/io/grpc/netty/NettyClientHandler.java#L343> doesn't trigger things quickly enough, especially if you have long-lived streams, since it delays until all the RPCs on that transport are complete. There is no upper bound on how long a stream could live, so a Channel could be broken for quite some time.
>
>> The background for my question is that we had an outage caused by the limitation
>
> If your RPCs are short-lived and my analysis is correct, I wouldn't expect an outage, but instead a temporary failure. Is the lifetime of some of your RPCs long? If so, then I think that would help confirm my theory.
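Regarding Eric's point that a transient failure of 1 out of 2^30 RPCs should be recoverable by the application, here is a minimal client-side sketch of that idea in grpc-java. The RetryOnceOnUnavailable class and the speechStub/request names in the usage comment are just illustrative, not an existing API; the wrapper simply retries a unary call once when it fails with UNAVAILABLE, so the second attempt can land on a fresh transport.

import io.grpc.Status;
import io.grpc.StatusRuntimeException;

import java.util.function.Supplier;

// Illustrative helper, not an existing gRPC API: retry a unary call once when it
// fails with UNAVAILABLE (e.g. because the transport's stream IDs were just
// exhausted and the channel is moving traffic to a new transport).
public final class RetryOnceOnUnavailable {
    private RetryOnceOnUnavailable() {}

    public static <T> T call(Supplier<T> rpc) {
        try {
            return rpc.get();
        } catch (StatusRuntimeException e) {
            if (e.getStatus().getCode() == Status.Code.UNAVAILABLE) {
                // Second attempt; by now the channel should be using a new transport.
                return rpc.get();
            }
            throw e;
        }
    }
}

// Usage with a blocking stub (speechStub and request are whatever the caller already has):
//   RecognizeResponse response =
//       RetryOnceOnUnavailable.call(() -> speechStub.recognize(request));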

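And on the question of recovering without killing the whole application: one possible approach, assuming a plain grpc-java ManagedChannel (the ChannelHolder class and its recycle() method below are hypothetical, and speech.googleapis.com:443 is assumed as the endpoint), is to shut down the broken channel and swap in a fresh one when calls keep failing. Note that if the underlying cause of the trustAnchors error is the JVM failing to load its trust store (for example because the process has run out of file descriptors), replacing the channel alone may not be enough.

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.concurrent.TimeUnit;

// Illustrative helper, not an existing API: keep one channel around and replace it
// with a fresh one (i.e. fresh transports) when it appears to be permanently broken,
// instead of restarting the whole JVM.
public final class ChannelHolder {
    private volatile ManagedChannel channel = newChannel();

    private static ManagedChannel newChannel() {
        // speech.googleapis.com:443 is assumed here as the endpoint; adjust as needed.
        return ManagedChannelBuilder.forAddress("speech.googleapis.com", 443)
                .useTransportSecurity()
                .build();
    }

    public ManagedChannel get() {
        return channel;
    }

    // Call this after repeated failures that never recover on their own.
    public synchronized void recycle() throws InterruptedException {
        ManagedChannel broken = channel;
        channel = newChannel();                          // stubs built from get() now use the new channel
        broken.shutdown();                               // let in-flight RPCs finish
        if (!broken.awaitTermination(10, TimeUnit.SECONDS)) {
            broken.shutdownNow();                        // hard-close after a grace period
        }
    }
}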