John Watson created FLINK-38904:
-----------------------------------
Summary: MySQL CDC binlog reader hangs due to TLS 1.3 KeyUpdate deadlock (potentially JDK-8241239)
Key: FLINK-38904
URL: https://issues.apache.org/jira/browse/FLINK-38904
Project: Flink
Issue Type: Bug
Components: Flink CDC
Environment: * JDK 11.0.18 (TLS 1.3 enabled by default)
* MySQL on AWS RDS with SSL required
* ~137 GB of data processed
Reporter: John Watson
The MySQL CDC binlog reader deadlocks after ~137 GB of data has been transferred over a TLS 1.3 connection. This appears to be caused by JDK bug [JDK-8241239|https://bugs.openjdk.org/browse/JDK-8241239], in which a TLS 1.3 KeyUpdate can deadlock inside {{SSLSocketImpl}}.
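With AES-GCM, the JDK initiates a TLS 1.3 KeyUpdate once about {{2^37}} bytes (~137 GB) have been processed under the current key. This threshold comes from the default {{jdk.tls.keyLimits}} security property, which enforces AES-GCM key-usage limits. The deadlock then occurs as follows: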
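* While decoding an incoming record, the reader thread finds that the key-usage limit has been reached; {{SSLSocketImpl.tryKeyUpdate()}} must then write a KeyUpdate message and switch the write ciphers
* The reader thread holds the SSL lock (a {{ReentrantLock}}) while blocked in the native {{socketWrite0()}} call
* The keepalive thread detects a timeout and attempts to close the connection
* {{SSLSocketImpl.closeNotify()}} requires the same SSL lock
* Deadlock: the reader holds the lock while waiting on network I/O, and the keepalive thread waits indefinitely for the lock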
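The lock pattern in the list above can be modelled in isolation. The sketch below is a hypothetical model, not JDK or binlog-client code: a {{ReentrantLock}} stands in for the {{SSLSocketImpl}} lock, and a {{CountDownLatch}} that is released only later stands in for a {{socketWrite0()}} call that does not return. The names {{KeyUpdateLockDemo}}, {{tryCloseNotify}} and {{runScenario}} are illustrative. The real close path calls {{ReentrantLock.lock()}} and waits indefinitely (see the keepalive thread in the thread dump); the sketch uses {{tryLock}} with a timeout instead, so the contention becomes observable rather than hanging the caller.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the reader/keepalive lock pattern; not JDK or binlog-client code. */
public class KeyUpdateLockDemo {

    /** Models the keepalive close path with a bounded wait instead of lock(). */
    static boolean tryCloseNotify(ReentrantLock sslLock, long timeoutMs) {
        try {
            if (!sslLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // lock still held by the blocked "reader"
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true; // the real close path would send close_notify here
        } finally {
            sslLock.unlock();
        }
    }

    /** Returns {closed while reader blocked, closed after the reader's write completed}. */
    static boolean[] runScenario(long timeoutMs) {
        ReentrantLock sslLock = new ReentrantLock();           // stands in for the SSLSocketImpl lock
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch writeCompletes = new CountDownLatch(1); // counted down = "socketWrite0() returns"

        Thread reader = new Thread(() -> {
            sslLock.lock();
            try {
                lockTaken.countDown();
                writeCompletes.await(); // blocked "write" while still holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sslLock.unlock();
            }
        }, "reader");
        reader.start();

        try {
            lockTaken.await();
            boolean whileBlocked = tryCloseNotify(sslLock, timeoutMs);
            writeCompletes.countDown(); // the peer drains the socket, so the "write" returns
            reader.join();
            boolean afterWrite = tryCloseNotify(sslLock, timeoutMs);
            return new boolean[] {whileBlocked, afterWrite};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running scenario", e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = runScenario(200);
        System.out.println("close while reader blocked: " + r[0]); // false
        System.out.println("close after write completes: " + r[1]); // true
    }
}
{code}

The bounded close attempt fails while the modelled write is stuck and succeeds once it completes. A bounded wait only makes the stall visible to the caller; it does not unblock the write that is holding the lock.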
Thread Dump:
{code:java}
Thread: blc-...:3306 (id=113)
State: RUNNABLE (blocked in native socketWrite0)
Holds: ReentrantLock@753cff5d
Stack:
java.net.SocketOutputStream.socketWrite0(Native Method)
sun.security.ssl.SSLSocketOutputRecord.flush()
sun.security.ssl.OutputRecord.changeWriteCiphers()
sun.security.ssl.KeyUpdate$KeyUpdateProducer.produce()
sun.security.ssl.SSLSocketImpl.tryKeyUpdate()
sun.security.ssl.SSLSocketImpl.decode()
sun.security.ssl.SSLSocketImpl.readApplicationRecord()
...
Thread: blc-keepalive-...:3306 (id=115)
State: WAITING
Waiting on: ReentrantLock@753cff5d
Lock owner: Thread 113
Stack:
java.util.concurrent.locks.ReentrantLock.lock()
sun.security.ssl.SSLSocketImpl.closeNotify()
sun.security.ssl.TransportContext.closeNotify()
sun.security.ssl.SSLSocketImpl.shutdownOutput()
com.github.shyiko.mysql.binlog.network.protocol.PacketChannel.close()
com.github.shyiko.mysql.binlog.BinaryLogClient.disconnectChannel()
com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect()
... {code}
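In this dump the lock owner (thread 113) is RUNNABLE in native I/O rather than waiting on another lock, so there is no lock cycle. A minimal sketch using the standard {{java.lang.management}} API (the class and method names are illustrative) lists each thread that is waiting on an owned lock together with the owner's state. It only inspects the JVM it runs in, so it would have to run inside the affected process. Because {{ThreadMXBean.findDeadlockedThreads()}} only reports cycles, it would not be expected to flag a hang like this one.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Illustrative diagnostic: "waiter -> lock -> owner (owner state)" for the current JVM. */
public class LockWaitReport {

    static List<String> lockWaits() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // true, true: include locked monitors and ownable synchronizers such as ReentrantLock
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            if (ti.getLockOwnerId() == -1) {
                continue; // not waiting on a lock that another thread owns
            }
            ThreadInfo owner = mx.getThreadInfo(ti.getLockOwnerId());
            String ownerState = owner == null ? "terminated" : owner.getThreadState().toString();
            lines.add(ti.getThreadName() + " -> " + ti.getLockName()
                    + " -> " + ti.getLockOwnerName() + " (" + ownerState + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Only cycles are reported; an owner stuck in I/O is not part of a cycle.
        long[] cycles = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        System.out.println("lock cycles: " + (cycles == null ? "none" : String.valueOf(cycles.length)));
        lockWaits().forEach(System.out::println);
    }
}
{code}

For a hang like the one in this report, one would expect a line showing the keepalive thread waiting on a {{ReentrantLock}} whose owner is the binlog reader in state {{RUNNABLE}}.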
+Steps to Reproduce+
# Configure MySQL CDC with SSL enabled ({{requireSSL=true}}) against AWS Aurora
# Use JDK 11 (TLS 1.3 is enabled by default)
# Process a high-volume CDC workload (> 137 GB)
# Observe that the binlog reader thread deadlocks
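Before running the reproduction, it can help to confirm that the JVM actually enables TLS 1.3 and which key limits it applies. The sketch below ({{TlsKeyLimitCheck}} is an illustrative name) prints whether the default {{SSLContext}} enables {{TLSv1.3}}, prints the {{jdk.tls.keyLimits}} security property, and shows that {{2^37}} bytes is about 137.4 GB. It assumes the AES/GCM KeyUpdate limit is {{2^37}} bytes; the property itself is printed as-is (or {{null}} if undefined) rather than assumed.

{code:java}
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

/** Illustrative environment check for the reproduction; not part of Flink CDC. */
public class TlsKeyLimitCheck {

    /** 2^37 bytes: assumed AES/GCM KeyUpdate threshold from the default jdk.tls.keyLimits. */
    static final long AES_GCM_KEY_UPDATE_BYTES = 1L << 37;

    /** True if the default SSLContext enables TLSv1.3. */
    static boolean tls13EnabledByDefault() throws NoSuchAlgorithmException {
        String[] protocols = SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        return Arrays.asList(protocols).contains("TLSv1.3");
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("TLSv1.3 by default = " + tls13EnabledByDefault());
        // Null if this JDK's java.security file does not define the property.
        System.out.println("jdk.tls.keyLimits  = " + Security.getProperty("jdk.tls.keyLimits"));
        System.out.printf("2^37 bytes         = %,d (~%.1f GB)%n",
                AES_GCM_KEY_UPDATE_BYTES, AES_GCM_KEY_UPDATE_BYTES / 1e9);
    }
}
{code}

The printed values depend on the JDK build and on any overrides in {{java.security}} or system properties, so they are worth comparing against the environment listed in this report.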
+Expected Behavior+
Binlog reader continues processing indefinitely without deadlocking.
+Actual Behavior+
The binlog reader deadlocks after ~137 GB of data transfer, when a TLS 1.3 KeyUpdate is triggered. The reader thread holds the SSL lock while blocked in {{socketWrite0()}}, and the keepalive thread blocks forever waiting for the same lock to send {{close_notify}}.