I recently spent a bunch of time with a customer who was having trouble
connecting to our appliance from z/OS. They were getting System SSL error
410, "SSL message format is incorrect". curl was failing too, even though it
doesn't use System SSL at all.

After much tinkering, looking at PCAPs, tracing on z/OS, etc., someone said
something about AT-TLS. "Wait, what? There's no AT-TLS involved here." "Yes
there is, we have it on all connections."

Well, there's yer problem, Vern: our product on z/OS was setting up an HTTPS
connection using GSK (System SSL), and curl was doing the same with OpenSSL.
Those requests would start their way out to the network, and then AT-TLS
would grab them and start its own negotiation. So what we saw in Wireshark
was approximately:

1. Mainframe starts handshake
2. Server (actually a gateway, but that doesn't matter) does its handshake
   thing
3. Certificates, ciphers, keys exchanged
4. Mainframe says 410 and drops connection

Since this of course worked fine for us, we were baffled until we realized
AT-TLS was involved: z/OS sent out a Client Hello, AT-TLS got in there, and
the response from the gateway was NOT the expected Server Hello!
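
For anyone wondering what "NOT the expected Server Hello" means at the byte
level: a TLS client expects the first record back to have content type 22
(handshake) carrying handshake message type 2 (ServerHello). Here's a rough
Python sketch of that check. classify_tls_record is a made-up helper for
illustration, not anything in GSK. We don't actually know what bytes GSK got
back; the plaintext HTTP error in the example is only an assumption about
what a gateway might send if handed a Client Hello where it expected an HTTP
request.

```python
# Hypothetical helper (not GSK code): classify the first bytes a TLS client
# receives. Constants are TLS record-layer values (RFC 5246 / RFC 8446).
HANDSHAKE = 0x16         # record content type 22
ALERT = 0x15             # record content type 21
APPLICATION_DATA = 0x17  # record content type 23
SERVER_HELLO = 0x02      # handshake message type 2


def classify_tls_record(data: bytes) -> str:
    """Return a rough description of what the record header looks like."""
    if len(data) < 5:
        return "too short for a TLS record header"
    content_type, major = data[0], data[1]
    # Every TLS version uses major version byte 3 in the record header.
    if major != 0x03:
        return "not a TLS record"
    if content_type == HANDSHAKE:
        # Byte 5 is the first byte of the handshake message: its type.
        if len(data) >= 6 and data[5] == SERVER_HELLO:
            return "ServerHello"
        return "other handshake message"
    if content_type == ALERT:
        return "TLS alert"
    if content_type == APPLICATION_DATA:
        return "application data"
    return "not a TLS record"


if __name__ == "__main__":
    server_hello = bytes([0x16, 0x03, 0x03, 0x00, 0x2A, 0x02])
    http_reply = b"HTTP/1.1 400 Bad Request\r\n"  # assumed, for illustration
    print(classify_tls_record(server_hello))  # ServerHello
    print(classify_tls_record(http_reply))    # not a TLS record
```

A client doing this kind of check has no way to tell "garbage" from "a
perfectly good reply meant for a different layer", which is roughly why the
error message points nowhere near AT-TLS.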

In retrospect, the fact that curl was also failing MIGHT have been a clue,
but at the time we took it as evidence that the problem was outside of z/OS.
Instead, it appears the sequence was:

product<=>GSK<=>PAGENT<=>AT-TLS<=>TCP/IP<=>network<=>gateway

and

curl<=>OpenSSL<=>PAGENT<=>AT-TLS<=>TCP/IP<=>network<=>gateway
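
Here's a toy model of what that stacking does to the bytes on the wire, as a
Python sketch. wrap_in_outer_record is a hypothetical helper: it skips
encryption entirely and only adds the 5-byte record header (assuming TLS 1.2
framing) that an outer layer like AT-TLS would put around the inner data.
The point is that once the outer session is up, the inner Client Hello from
GSK or OpenSSL leaves the box as content type 23 (application data), not as
a second visible handshake, which would fit with only one handshake showing
up in the PCAPs.

```python
import struct

APPLICATION_DATA = 0x17  # TLS record content type 23
MAX_RECORD = 2 ** 14     # maximum TLS plaintext fragment length


def wrap_in_outer_record(inner: bytes) -> bytes:
    """Toy outer TLS layer: prepend a record header to the inner bytes.

    Assumptions: TLS 1.2 framing (version bytes 3,3) and no encryption.
    A real outer layer such as AT-TLS would encrypt `inner`.
    """
    if len(inner) > MAX_RECORD:
        raise ValueError("toy model handles a single record only")
    # Header: content type (1 byte), version (2 bytes), length (2 bytes).
    header = struct.pack("!BBBH", APPLICATION_DATA, 0x03, 0x03, len(inner))
    return header + inner


if __name__ == "__main__":
    # First bytes of an inner Client Hello: handshake record, message type 1.
    inner_hello = bytes([0x16, 0x03, 0x01, 0x00, 0x01, 0x01])
    on_the_wire = wrap_in_outer_record(inner_hello)
    print(hex(on_the_wire[0]))  # 0x17: seen as application data, not a handshake
```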

AT-TLS is cool, but not when you didn't ask for it. I had assumed that it
was integrated into GSK and/or TCP/IP such that this scenario would be
impossible. If it were, then presumably a gsk_environment_init() call would
keep AT-TLS from kicking in, or cause a meaningful error. Not blaming IBM:
this is a user error, and I made an assumption that, while plausible, just
isn't correct.

The 410 "SSL message format is incorrect" was baffling; even IBM Level 2 was
stymied, since they didn't know about the stacked protocols, and apparently
whatever tracing they got from the customer didn't show it. That again makes
sense, since only one layer of TLS ever actually got established. I wonder
whether a sharp eye might have spotted two gsk_environment_init() calls for
one connection, but I can hardly blame them for missing it, since that isn't
anywhere near where the error was reported!

So why am I telling you this? In case it helps someone else save some
tsuris. Googling

410 "SSL message format is incorrect"

only gets 13 hits; add "AT-TLS" and that drops to 5, none of which are about
this "stacking" issue. So this is not a commonly encountered problem.

...phsiii


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN