Hi 'Milind Changire',
The leak is becoming clearer to me now. The unsolved memory leak arises
because in glusterfs version 3.12.15 (in my env) the SSL context is a shared
one. When we call SSL_accept, SSL allocates read/write buffers for the SSL
object; however, when SSL_free is called in socket_reset or in the fini
function of socket.c, the buffers are returned to the SSL context's free list
instead of being completely freed.
So the following patch is able to fix the memory leak issue completely (created
for the gluster master branch).
--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
gf_log(this->name, GF_LOG_DEBUG,
"SSL verification succeeded (client: %s) (server: %s)",
this->peerinfo.identifier, this->myinfo.identifier);
+ X509_free(peer);
return gf_strdup(peer_CN);
/* Error paths. */
@@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
memset(&priv->incoming, 0, sizeof(priv->incoming));
event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+ if(priv->use_ssl&& priv->ssl_ssl)
+ {
+ gf_log(this->name, GF_LOG_TRACE,
+ "clear and reset for socket(%d), free ssl ",
+ priv->sock);
+ if(priv->ssl_ctx)
+ {
+ SSL_CTX_free(priv->ssl_ctx);
+ priv->ssl_ctx = NULL;
+ }
+ SSL_shutdown(priv->ssl_ssl);
+ SSL_clear(priv->ssl_ssl);
+ SSL_free(priv->ssl_ssl);
+ priv->ssl_ssl = NULL;
+ }
priv->sock = -1;
priv->idx = -1;
priv->connected = -1;
@@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
pthread_mutex_destroy(&priv->out_lock);
pthread_mutex_destroy(&priv->cond_lock);
pthread_cond_destroy(&priv->cond);
+ if(priv->use_ssl&& priv->ssl_ssl)
+ {
+ gf_log(this->name, GF_LOG_TRACE,
+ "clear and reset for socket(%d), free ssl ",
+ priv->sock);
+ if(priv->ssl_ctx)
+ {
+ SSL_CTX_free(priv->ssl_ctx);
+ priv->ssl_ctx = NULL;
+ }
+ SSL_shutdown(priv->ssl_ssl);
+ SSL_clear(priv->ssl_ssl);
+ SSL_free(priv->ssl_ssl);
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 2:12 PM
To: 'Amar Tumballi Suryanarayan' <[email protected]>
Cc: 'Milind Changire' <[email protected]>; '[email protected]'
<[email protected]>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl
Hi,
From our tests, valgrind and libleak both blame ssl3_accept.
/////////////////////from valgrind attached to glusterfsd/////////////////////
==16673== 198,720 bytes in 12 blocks are definitely lost in loss record 1,114
of 1,123
==16673== at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673== by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673== by 0xA855E0C: ssl3_setup_write_buffer (in
/usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673== by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673== by 0xA618420: socket_complete_connection (socket.c:2554)
==16673== by 0xA618788: socket_event_handler (socket.c:2613)
==16673== by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673== by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673== by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
==16673== 200,544 bytes in 12 blocks are definitely lost in loss record 1,115
of 1,123
==16673== at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673== by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673== by 0xA855D12: ssl3_setup_read_buffer (in
/usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673== by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673== by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673== by 0xA618420: socket_complete_connection (socket.c:2554)
==16673== by 0xA618788: socket_event_handler (socket.c:2613)
==16673== by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673== by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673== by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
valgrind --leak-check=f
/////////////////////with libleak attached to glusterfsd/////////////////////
callstack[2419] expires. count=1 size=224/224 alloc=362 free=350
/home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
/lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
/lib64/libcrypto.so.10(EVP_DigestInit_ex+0x2a9) [0x7f145ed95749]
/lib64/libssl.so.10(ssl3_digest_cached_records+0x11d) [0x7f145abb6ced]
/lib64/libssl.so.10(ssl3_accept+0xc8f) [0x7f145abadc4f]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e)
[0x7f145ae00f3a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d)
[0x7f145ae0816d]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a)
[0x7f145ae0868a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2)
[0x7f145ae089f2]
/lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
/lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
/lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
/lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
callstack[2432] expires. count=1 size=104/104 alloc=362 free=0
/home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
/lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
/lib64/libcrypto.so.10(BN_MONT_CTX_new+0x17) [0x7f145ed48627]
/lib64/libcrypto.so.10(BN_MONT_CTX_set_locked+0x6d) [0x7f145ed489fd]
/lib64/libcrypto.so.10(+0xff4d9) [0x7f145ed6a4d9]
/lib64/libcrypto.so.10(int_rsa_verify+0x1cd) [0x7f145ed6d41d]
/lib64/libcrypto.so.10(RSA_verify+0x32) [0x7f145ed6d972]
/lib64/libcrypto.so.10(+0x107ff5) [0x7f145ed72ff5]
/lib64/libcrypto.so.10(EVP_VerifyFinal+0x211) [0x7f145ed9dd51]
/lib64/libssl.so.10(ssl3_get_cert_verify+0x5bb) [0x7f145abac06b]
/lib64/libssl.so.10(ssl3_accept+0x988) [0x7f145abad948]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e)
[0x7f145ae00f3a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d)
[0x7f145ae0816d]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a)
[0x7f145ae0868a]
/usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2)
[0x7f145ae089f2]
/lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
/lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
/lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
/lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
One interesting thing is that the memory grows to about 300 MB and then stops
increasing!
I am wondering whether this is caused by the OpenSSL library, but when I
searched the OpenSSL community I found no such issue reported before.
Is glusterfs using SSL_accept correctly?
cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 10:34 AM
To: 'Amar Tumballi Suryanarayan' <[email protected]>
Cc: Milind Changire <[email protected]>; [email protected]
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl
Hi,
Sorry, I have been busy with other issues these days. Could you help me submit
my patch for review? It is based on the glusterfs 3.12.15 code. But even with
this patch, the memory leak still exists. According to the memory leak tools it
appears to be related to SSL_accept; I am not sure whether it is caused by the
OpenSSL library or by improper use of the SSL interfaces.
--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {
memset(&priv->incoming, 0, sizeof(priv->incoming));
event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+ if(priv->use_ssl&& priv->ssl_ssl)
+ {
+ gf_log(this->name, GF_LOG_INFO,
+ "clear and reset for socket(%d), free ssl ",
+ priv->sock);
+ SSL_shutdown(priv->ssl_ssl);
+ SSL_clear(priv->ssl_ssl);
+ SSL_free(priv->ssl_ssl);
+ priv->ssl_ssl = NULL;
+ }
priv->sock = -1;
priv->idx = -1;
priv->connected = -1;
@@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {
pthread_mutex_destroy(&priv->out_lock);
pthread_mutex_destroy(&priv->cond_lock);
pthread_cond_destroy(&priv->cond);
+ if(priv->use_ssl&& priv->ssl_ssl)
+ {
+ gf_log(this->name, GF_LOG_INFO,
+ "clear and reset for socket(%d), free ssl ",
+ priv->sock);
+ SSL_shutdown(priv->ssl_ssl);
+ SSL_clear(priv->ssl_ssl);
+ SSL_free(priv->ssl_ssl);
+ priv->ssl_ssl = NULL;
+ }
if (priv->ssl_private_key) {
GF_FREE(priv->ssl_private_key);
}
From: Amar Tumballi Suryanarayan <[email protected]>
Sent: Wednesday, May 01, 2019 8:43 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>
Cc: Milind Changire <[email protected]>; [email protected]
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl
Hi Cynthia Zhou,
Can you post the patch which fixes the issue of the missing free? We will
continue to investigate the leak further, but would really appreciate getting
the patch that has already been worked on landed in upstream master.
-Amar
On Mon, Apr 22, 2019 at 1:38 PM Zhou, Cynthia (NSB - CN/Hangzhou)
<[email protected]> wrote:
OK, I am clear now.
I've added SSL_free to the socket reset and socket finish (fini) functions.
Although the glusterfsd memory leak is now not that large, it is still leaking,
and I cannot find anything else in the source code.
Could you help check whether this issue exists in your env? If not, I may try
to merge your patch.
Steps:
1> while true; do gluster v heal <vol-name> info; done
2> check the memory usage of the glusterfsd process for <vol-name>; it is
obviously increasing.
cynthia
From: Milind Changire <[email protected]>
Sent: Monday, April 22, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>
Cc: Atin Mukherjee <[email protected]>; [email protected]
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl
According to BIO_new_socket() man page ...
If the close flag is set then the socket is shut down and closed when the BIO
is freed.
For Gluster to have more control over the socket shutdown, the BIO_NOCLOSE flag
is set. Otherwise, SSL takes control of socket shutdown whenever BIO is freed.
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
--
Amar Tumballi (amarts)
_______________________________________________
Community Meeting Calendar:
APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017
NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655