At least in the Java world, connections are treated as a queue of handlers, 
with TLS near the network side.  Any TLS failures are not surfaced (or at 
least not easily), since they never make it up to the gRPC layer.  Gathering 
these stats on the server side would require some code changes.  I know you 
said you are using Python, but I think every language would be able to 
benefit from knowing about such failures.
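
To make the shape of the problem concrete, here is a toy Python sketch of 
what I mean.  It is not gRPC or Netty code: every class and function name 
below is made up for illustration, and the "connection" is just a dict.  It 
only shows why the upper layer never hears about a failed handshake, and 
where a counter would have to be hooked in.

```python
from collections import Counter


class HandshakeError(Exception):
    """Stand-in for a TLS handshake failure (hypothetical, not a gRPC type)."""


class TlsHandler:
    """Sits nearest the network; counts a failure before the error is dropped."""

    def __init__(self, stats):
        self.stats = stats

    def handle(self, conn):
        if not conn.get("cert_valid", False):
            # This is the hook: record the failure where it is still visible.
            self.stats["tls_handshake_failed"] += 1
            raise HandshakeError(conn.get("reason", "unknown"))
        return conn


class RpcHandler:
    """Stand-in for the gRPC layer; only sees connections that passed TLS."""

    def __init__(self):
        self.seen = []

    def handle(self, conn):
        self.seen.append(conn["peer"])
        return conn


def run_pipeline(handlers, conn):
    """Pass conn through each handler in order; stop quietly on a TLS failure."""
    try:
        for handler in handlers:
            conn = handler.handle(conn)
    except HandshakeError:
        return False  # connection is closed; later handlers are never told
    return True


stats = Counter()
rpc = RpcHandler()
pipeline = [TlsHandler(stats), rpc]

run_pipeline(pipeline, {"peer": "a", "cert_valid": True})
run_pipeline(pipeline, {"peer": "b", "cert_valid": False, "reason": "expired"})
run_pipeline(pipeline, {"peer": "c", "cert_valid": False, "reason": "unknown CA"})

print(stats["tls_handshake_failed"], rpc.seen)
```

Without the counter in the TLS handler, the only trace of the two rejected 
connections would be whatever that handler chose to log.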



On Thursday, May 24, 2018 at 7:32:08 PM UTC-7, eagle wrote:
>
> Hi all, 
>
> I want to gather statistics on how many failed TLS handshakes are 
> happening across a large gRPC deployment.  (The motivation is a long 
> story, but basically I want to be able to trigger alerts if something goes 
> wrong with the PKI that's generating the client and server certificates.) 
> Just a count of failed TLS handshakes would be sufficient, although if I 
> can get at more detailed errors, that might be helpful.  I see how to do 
> this in the Go gRPC code, but also need the same support for Python. 
>
> I'd mildly prefer to capture this information on the server, although it 
> may be adequate to capture it on the client if that's easier to do.  (I 
> only need either the server or the client, not both.) 
>
> Could someone point me in the right direction?  I can take a cut at 
> implementing this as a general feature suitable for a pull request, but 
> I'm not sure the best approach to use given the code structure. 
>
> What I've been able to determine (I think) so far: 
>
> * Actual success or failure of the TLS handshakes is determined (for both 
>   client and server) in src/core/tsi/ssl_transport_security.cc. 
>
> * For the server, these errors pass up to on_handshake_done in 
>   src/core/ext/transport/chttp2/server/chttp2_server.cc, where they're 
>   logged and discarded and the channel is closed.  This seems to be below 
>   the layer where the Python bindings have any visibility.  (In other 
>   words, so far as I could determine, no Python code ever sees the error 
>   from an attempted connection that results in a failed TLS handshake.) 
>
> * For the client, it looks like (?) these errors will trigger a notify 
>   closure in src/core/ext/transport/chttp2/client/chttp2_connector.cc in 
>   on_handshake_done, which in turn seems to be on_subchannel_connected in 
>   core/ext/filters/client_channel/subchannel.cc, but it looks like the 
>   exact contents of the error don't go any farther beyond that function. 
>
> Therefore, unless I'm mistaken, it looks like these errors are swallowed 
> inside the core code in places where I can't get visibility to them from 
> Python, hiding entirely (except for a log message) in the server and 
> turning into a generic connection state inside the client channels.  So 
> there seems to be some plumbing or a hook missing here to be able to 
> bubble these failures up to a level where I can get at them and send them 
> to monitoring code. 
>
> -- 
> Russ Allbery ([email protected])              <http://www.eyrie.org/~eagle/> 
>

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/9263ef53-08b2-472f-a92b-03df568c40b7%40googlegroups.com.