qianye1001 commented on issue #10395: URL: https://github.com/apache/rocketmq/issues/10395#issuecomment-4560468682
## 🤖 Automated Fix Proposal I've analyzed this issue against the latest upstream code and generated a fix specification. ### Summary **Root Cause**: When TLS certificates are dynamically reloaded via `FileWatchService`, the `loadSslContext()` methods in `NettyRemotingServer` and `ProxyAndTlsProtocolNegotiator` directly overwrite the `sslContext` field without releasing the old instance. Since `ReferenceCountedOpenSslContext` allocates native off-heap memory (SSL_CTX, X509 chain, EVP_PKEY), this native memory is leaked on every certificate rotation cycle. **Fix Strategy**: 1. Save the old `SslContext` reference before building the replacement 2. After successful assignment of the new context, call `ReferenceCountUtil.release(oldSslContext)` to free native memory 3. Use "build new, then release old" ordering to ensure `sslContext` is never null or prematurely released **Files to Modify**: - `remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java` - `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java` **Safety**: - `ReferenceCountUtil.release()` is safe for JDK provider `SslContext` (no-op for non-ReferenceCounted) - In-flight connections using the old context are unaffected (refCount managed by Netty channel pipeline) - Added `volatile` to `sslContext` field in `ProxyAndTlsProtocolNegotiator` for thread safety <details> <summary>Full Specification (click to expand)</summary> ```markdown # Fix Specification: Issue #10395 - Dynamic Certificate Reload Native Memory Leak ## 1. Root Cause Analysis When TLS certificates are dynamically reloaded, `loadSslContext()` in both `NettyRemotingServer` and `ProxyAndTlsProtocolNegotiator` directly overwrites the `sslContext` field with a newly built `SslContext` instance. The previous `SslContext` is never released. Netty's `SslContext` (particularly when backed by `SslProvider.OPENSSL` / `SslProvider.OPENSSL_REFCNT`) implements `ReferenceCounted`. Each instance holds native OpenSSL structures (SSL_CTX, X509, EVP_PKEY, etc.) allocated outside the JVM heap. When the old reference is discarded without calling `release()`, these native allocations are leaked permanently — the GC cannot reclaim them because they live in native memory managed by reference counting. Over time, repeated certificate reloads (e.g., every few minutes via a file-watcher or scheduled task) cause unbounded native memory growth, eventually leading to OOM kills or container evictions. **Affected locations:** | File | Method | Line | |------|--------|------| | `remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java` | `loadSslContext()` | 186 | | `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java` | `loadSslContext()` | 118, 131 | **Field declarations:** - `NettyRemotingAbstract.java:121` — `protected volatile SslContext sslContext;` - `ProxyAndTlsProtocolNegotiator.java:80` — `private static SslContext sslContext;` ## 2. Fix Strategy ### Step 1: Save Old Reference Before Replacement Before building the new `SslContext`, capture the current value of `sslContext` into a local variable. ### Step 2: Build and Assign New SslContext Build the new `SslContext` and assign it to the field (existing behavior, unchanged). ### Step 3: Release Old SslContext After the new context is successfully assigned, check if the old reference is non-null and implements `ReferenceCounted`, th ``` </details> ### Next Steps - Reply `/approve` to proceed with the PR review process - Reply `/revise <feedback>` to request changes to the fix approach - Reply `/reject` to close this proposal **Note**: A PR (#10396) was submitted prematurely without community approval. If this fix direction is acceptable, please `/approve` here. If not, we can close that PR and revise the approach based on your feedback. *This proposal will expire in 72 hours if no response is received.* -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
