[ 
https://issues.apache.org/jira/browse/FLINK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052363#comment-18052363
 ] 

John Watson commented on FLINK-38904:
-------------------------------------

This issue can be worked around using the fix suggested in FLINK-38934. 

> MySQL CDC binlog reader hangs due to TLS 1.3 KeyUpdate deadlock (potentially 
> JDK-8241239)
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-38904
>                 URL: https://issues.apache.org/jira/browse/FLINK-38904
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>         Environment: * JDK 11.0.18 (TLS 1.3 enabled by default)
>  * MySQL on AWS RDS with SSL required
>  * ~137GB data processed
>            Reporter: John Watson
>            Priority: Major
>
>  The MySQL CDC binlog reader deadlocks after processing ~137GB of data when 
> using TLS 1.3. This appears to be caused by JDK bug 
> [JDK-8241239|https://bugs.openjdk.org/browse/JDK-8241239] where TLS 1.3's 
> KeyUpdate mechanism triggers a deadlock in SSLSocketImpl.
>  
> TLS 1.3 sends KeyUpdate messages after ~137GB (2^37 bytes) of data transfer 
> (the JDK's default AES-GCM key usage limit). The deadlock occurs as follows:
>  * Reader thread receives KeyUpdate, must respond by writing new keys
>  * Reader thread holds SSL lock, blocks in native {{socketWrite0()}}
>  * Keepalive thread detects timeout, attempts to close connection
>  * {{SSLSocketImpl.closeNotify()}} requires the same SSL lock
>  * Deadlock: Reader holds lock waiting on network I/O; Keepalive waiting for 
> lock
> Thread Dump:
> {code:java}
>   Thread: blc-...:3306 (id=113)
>     State: RUNNABLE (blocked in native socketWrite0)
>     Holds: ReentrantLock@753cff5d
>     Stack:
>       java.net.SocketOutputStream.socketWrite0(Native Method)
>       sun.security.ssl.SSLSocketOutputRecord.flush()
>       sun.security.ssl.OutputRecord.changeWriteCiphers()
>       sun.security.ssl.KeyUpdate$KeyUpdateProducer.produce()
>       sun.security.ssl.SSLSocketImpl.tryKeyUpdate()
>       sun.security.ssl.SSLSocketImpl.decode()
>       sun.security.ssl.SSLSocketImpl.readApplicationRecord()
>       ...
>   Thread: blc-keepalive-...:3306 (id=115)
>     State: WAITING
>     Waiting on: ReentrantLock@753cff5d                          
>     Lock owner: Thread 113                                      
>     Stack:
>       java.util.concurrent.locks.ReentrantLock.lock()
>       sun.security.ssl.SSLSocketImpl.closeNotify()              
>       sun.security.ssl.TransportContext.closeNotify()
>       sun.security.ssl.SSLSocketImpl.shutdownOutput()
>       com.github.shyiko.mysql.binlog.network.protocol.PacketChannel.close()
>       com.github.shyiko.mysql.binlog.BinaryLogClient.disconnectChannel()
>       com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect()
>       ...  {code}
>  
> +Steps to Reproduce+
>  * Configure MySQL CDC with SSL enabled ({{requireSSL=true}}) against AWS Aurora
>  * Use JDK 11 (TLS 1.3 enabled by default)
>  * Process high-volume CDC workload (>137GB)
>  * Observe binlog reader thread deadlock
> +Expected Behavior+
> Binlog reader continues processing indefinitely without deadlocking.
> +Actual Behavior+
> Binlog reader deadlocks after ~137GB data transfer when TLS 1.3 KeyUpdate is 
> triggered. The reader thread holds the SSL lock while blocked in 
> {{socketWrite0()}}, and the keepalive thread blocks forever waiting for the 
> same lock to send {{close_notify}}.
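The lock interaction described in the quoted report can be reproduced in miniature without any TLS at all. The following is only a sketch under stated assumptions: the class and method names are hypothetical, a plain `ReentrantLock` stands in for the lock inside `SSLSocketImpl`, and a `CountDownLatch` that is never released during the probe stands in for the blocking native `socketWrite0()` call. It is not the JDK's or the binlog client's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: one thread holds a lock across a blocking call,
// so a second thread that needs the same lock to close the connection stalls.
public class LockHeldDuringBlockingWrite {

    /** Returns true if the "keepalive" side obtained the lock within the timeout. */
    static boolean keepaliveCanAcquire(long timeoutMillis) throws InterruptedException {
        ReentrantLock sslLock = new ReentrantLock();             // stand-in for the SSL socket lock
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch peerDrainsSocket = new CountDownLatch(1); // stand-in for the write completing

        Thread reader = new Thread(() -> {
            sslLock.lock();
            try {
                readerHoldsLock.countDown();
                // Stand-in for the blocking write of the KeyUpdate response.
                peerDrainsSocket.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sslLock.unlock();
            }
        }, "reader");
        reader.setDaemon(true);
        reader.start();

        readerHoldsLock.await();

        // Keepalive side: a plain lock() would wait indefinitely here.
        // tryLock with a timeout makes the stall observable instead.
        boolean acquired = sslLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            sslLock.unlock();
        }

        // Let the simulated write finish so the reader thread can exit.
        peerDrainsSocket.countDown();
        reader.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("keepalive acquired lock: " + keepaliveCanAcquire(200));
    }
}
```

In this sketch the keepalive side gives up after the timeout and reports `false`. In the reported stack, `closeNotify()` goes through an untimed `ReentrantLock.lock()`, so the equivalent wait lasts until the blocked write returns.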

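The ~137GB figure in the report lines up with 2^37 bytes. As far as I know, that is the AES-GCM KeyUpdate limit the JDK ships by default in the `jdk.tls.keyLimits` security property, but treat the default value as an assumption and check it on the affected JVM. A minimal sketch (the class name is hypothetical) that does the arithmetic and prints whatever the running JVM has configured:

```java
import java.security.Security;
import java.util.Locale;

// Hypothetical helper: relates the ~137GB observation to a 2^37-byte key limit
// and prints the key limit configured on the running JVM.
public class KeyUpdateThreshold {

    /** 2^37 bytes, assumed here to be the JDK's default AES-GCM KeyUpdate limit. */
    static long assumedLimitBytes() {
        return 1L << 37;
    }

    public static void main(String[] args) {
        long limit = assumedLimitBytes();
        System.out.println("bytes: " + limit);
        System.out.println(String.format(Locale.ROOT, "decimal GB: %.2f", limit / 1e9));
        System.out.println("GiB: " + (limit >> 30));

        // The value depends on the JDK build and its java.security file;
        // it may be null on JDKs that do not define the property.
        System.out.println("jdk.tls.keyLimits = " + Security.getProperty("jdk.tls.keyLimits"));
    }
}
```

2^37 bytes is 137,438,953,472 bytes, about 137.44 GB in decimal units or exactly 128 GiB, which is consistent with a hang appearing only after a long-lived connection has moved that much data.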

