[ https://issues.apache.org/jira/browse/KAFKA-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426334#comment-17426334 ]
David Mao commented on KAFKA-13360:
-----------------------------------

Very thorough writeup, nice find!

> Wrong SSL messages when handshake fails
> ---------------------------------------
>
>                 Key: KAFKA-13360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13360
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 2.8.0
>         Environment: Two VMs, one running a single Kafka broker and the other
> running kafka-console-consumer.sh.
> The consumer validates the server certificate.
> Both VMs run in VirtualBox on the same laptop, connected over an internal LAN.
> Latency is on the order of microseconds.
> More details in the attached PDF.
>            Reporter: Rodolfo Kohn
>            Priority: Major
>         Attachments: Kafka error.pdf,
> dump_192.168.56.101_192.168.56.102_32776_9093_2021_10_06_21_09_19.pcap,
> ssl_kafka_error_logs_match_ssl_logs.txt,
> ssl_kafka_error_logs_match_ssl_logs2.txt
>
>
> When a consumer tries to connect to a Kafka broker and the SSL handshake
> fails, for example because the server sends a certificate that cannot be
> validated since its common name does not match the server/domain name, Kafka
> sends out erroneous SSL messages before sending an SSL alert. The error
> occurs in the client but can also be seen on the server.
> Given the nature of the problem, it likely happens with most, if not all,
> handshake errors.
> I debugged and analyzed the Kafka networking code in
> org.apache.kafka.common.network and wrote a detailed description of how the
> error occurs.
> I'm attaching the pcap file and a PDF with a detailed description of where
> the error is in the networking code (SslTransportLayer, Channel, Selector).
> I ran a very basic test between kafka-console-consumer and a simple
> installation of one Kafka broker with TLS.
> The test used a Kafka broker with a certificate that didn’t match the
> domain name I used to identify the server. The CA was set up correctly to
> rule out related problems, such as an unknown CA error.
> Thus, when the server sends the certificate to the client, the handshake
> fails with alert code 46 (certificate_unknown). The goal was for my tool to
> detect the issue and send an event describing a TLS handshake problem for
> both processes. However, the tool sent what I thought was the wrong event:
> a TLS exception event for an unexpected message instead of an event for a
> TLS alert for certificate unknown.
> I noticed that during the handshake, after the client receives Server Hello,
> Certificate, Server Key Exchange, and Server Hello Done, it sends out the
> same Client Hello it sent at the beginning, followed by 3 more records of all
> zeroes, in two more messages. In total it sent 16,709 bytes, including the
> 289-byte Client Hello record.
>
> This also looks like a design error in how protocol failures are handled.
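To check a capture like the attached pcap for the pattern described above (a repeated Client Hello, zero-filled bytes, and the alert), one can walk the client-to-server TCP payload record by record using the 5-byte TLS record header: content type (1 byte), protocol version (2 bytes), and payload length (2 bytes). The following is a minimal Java sketch, not part of Kafka, under stated assumptions: the class and method names (`TlsRecordScanner`, `scan`, `contentTypeName`) are hypothetical, the input is assumed to be the already-reassembled TCP payload of one direction, and only records sent in the clear are decoded, which holds for a TLS 1.2 handshake such as this one (Server Key Exchange and Server Hello Done are TLS 1.2 messages). The sample bytes in `main` are made up for illustration and are much shorter than the real 289-byte Client Hello.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Kafka class: labels TLS records in a captured byte stream.
public class TlsRecordScanner {

    // Maps a TLS record content type byte to its name; null means "not a valid TLS record".
    static String contentTypeName(int type) {
        switch (type) {
            case 20: return "change_cipher_spec";
            case 21: return "alert";
            case 22: return "handshake";
            case 23: return "application_data";
            default: return null;
        }
    }

    // Walks the stream using the 5-byte record header: type, version (major.minor), length.
    static List<String> scan(byte[] stream) {
        List<String> records = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream);
        while (buf.remaining() >= 5) {
            int offset = buf.position();
            int type = buf.get() & 0xFF;
            String name = contentTypeName(type);
            if (name == null) {
                // Framing is lost here; zero-filled bytes, for example, start with type 0.
                records.add("invalid record type " + type + " at offset " + offset
                        + ", " + (stream.length - offset) + " bytes unparsed");
                break;
            }
            int major = buf.get() & 0xFF;
            int minor = buf.get() & 0xFF;
            int length = buf.getShort() & 0xFFFF;
            if (length > buf.remaining()) {
                records.add(name + " at offset " + offset + " truncated");
                break;
            }
            StringBuilder label = new StringBuilder(name + " v" + major + "." + minor + " len=" + length);
            int body = buf.position();
            if (type == 22 && length > 0 && (stream[body] & 0xFF) == 1) {
                label.append(" (ClientHello)"); // handshake message type 1 is ClientHello
            } else if (type == 21 && length == 2) {
                // Plaintext alert: level (1 = warning, 2 = fatal), then description (46 = certificate_unknown).
                label.append(" level=").append(stream[body] & 0xFF)
                     .append(" desc=").append(stream[body + 1] & 0xFF);
            }
            buf.position(body + length); // skip the record payload
            records.add(label.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] capture = {
            22, 3, 1, 0, 4, 1, 0, 0, 0, // handshake record holding a (truncated, made-up) ClientHello
            21, 3, 3, 0, 2, 2, 46,      // fatal alert 46 (certificate_unknown)
            0, 0, 0, 0, 0, 0, 0, 0      // zero-filled bytes
        };
        // Prints:
        // handshake v3.1 len=4 (ClientHello)
        // alert v3.3 len=2 level=2 desc=46
        // invalid record type 0 at offset 16, 8 bytes unparsed
        scan(capture).forEach(System.out::println);
    }
}
```

Because zero-filled bytes start with content type 0, which is not a valid TLS record type, the sketch stops framing at the first such byte instead of reporting a run of zero-length records; a dissector such as Wireshark may group the same bytes differently.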