[ 
https://issues.apache.org/jira/browse/PROTON-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373111#comment-16373111
 ] 

Alan Conway commented on PROTON-1770:
-------------------------------------

Fixed for epoll and libuv, see commits for PROTON-1766.

An equivalent fix is needed for Windows. Summary of changes:

 No longer use pn_connection_attachments to find the back-pointer from 
pn_connection_t to the proactor's data structure.
 
Instead use pn_connection_driver_ptr to get a pointer to the driver and 
offsetof() to find the start of the complete proactor data structure that it is 
embedded in.
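
To illustrate the offsetof() step, here is a minimal sketch. The types 
driver_t and pconnection_t and the helper pconnection_from_driver are 
hypothetical stand-ins, not the real Proton structures; the only point is how 
a pointer to an embedded member is turned back into a pointer to the 
enclosing struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection driver (not the Proton type). */
typedef struct driver_t {
  int id;
} driver_t;

/* Hypothetical stand-in for the proactor's per-connection state.
 * The driver is embedded by value, not referenced through a pointer. */
typedef struct pconnection_t {
  int wake_count;   /* proactor-private state placed before the driver */
  driver_t driver;  /* embedded member */
} pconnection_t;

/* Recover the enclosing pconnection_t from a pointer to its embedded
 * driver by subtracting the member's offset. Returns NULL for NULL. */
static pconnection_t *pconnection_from_driver(driver_t *d) {
  if (!d) return NULL;
  return (pconnection_t *)((char *)d - offsetof(pconnection_t, driver));
}
```

This is only valid when the driver pointer really does point into a 
pconnection_t; a driver allocated on its own would give a bogus result.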
 
Since pn_connection_driver_ptr is only called by the proactor, all uses can be 
locked. Epoll and libuv currently use a global mutex, but an atomic load/store 
with acquire/release memory barriers would do. I did not do that immediately 
for lack of portable C atomics, but dispatch has some we can steal in future. 
I think Windows already has these.
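
For reference, the atomic alternative could look roughly like the sketch 
below, assuming a C11 compiler with <stdatomic.h>. All names here 
(connection_t, connection_set_driver, connection_get_driver, connection_wake) 
are made up for illustration and are not Proton APIs. It shows only the 
release store that publishes or clears the pointer and the acquire load that 
observes it; it does not address the lifetime of the object pointed to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection driver. */
typedef struct driver_t {
  int id;
} driver_t;

/* Hypothetical connection holding an atomic back-pointer to its driver.
 * NULL means the connection is not currently associated with a proactor. */
typedef struct connection_t {
  _Atomic(driver_t *) driver;
} connection_t;

static void connection_init(connection_t *c) {
  atomic_init(&c->driver, NULL);
}

/* Publish (or clear, with d == NULL) the back-pointer. The release store
 * makes earlier writes to *d visible to a thread that acquires it. */
static void connection_set_driver(connection_t *c, driver_t *d) {
  atomic_store_explicit(&c->driver, d, memory_order_release);
}

/* Observe the back-pointer with acquire ordering. */
static driver_t *connection_get_driver(connection_t *c) {
  return atomic_load_explicit(&c->driver, memory_order_acquire);
}

/* Returns 1 if a driver was found (a real wake would signal it here),
 * 0 if the connection is no longer associated with one. */
static int connection_wake(connection_t *c) {
  driver_t *d = connection_get_driver(c);
  if (!d) return 0;
  return 1;
}
```

The NULL check is what keeps a wake on a disassociated connection from 
dereferencing a stale pointer.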
 
For epoll I made one other change: we no longer use pn_refcounts to manage the 
pconnection lifecycle. The original reason for refcounting was to leave the 
pconnection in place even after the connection was no longer owned by the 
proactor, so pn_connection_wake would not crash. That's no longer required 
since this change fixes the pn_connection_wake race even if we disassociate a 
connection from its proactor state.

> CLONE - [cpp] win_iocp fix for seg fault in reconnect test
> ----------------------------------------------------------
>
>                 Key: PROTON-1770
>                 URL: https://issues.apache.org/jira/browse/PROTON-1770
>             Project: Qpid Proton
>          Issue Type: Improvement
>          Components: cpp-binding, proton-c
>    Affects Versions: proton-c-0.20.0
>            Reporter: Alan Conway
>            Assignee: Cliff Jansen
>            Priority: Major
>             Fix For: proton-c-0.21.0
>
>
> See [https://issues.jboss.org/browse/ENTMQCL-600] for details and reproducer 
> code, summary:
>  
> Using the to-be-attached reproducer and broker configuration:
> Running amqsender
> ./amqsender <broker1> <broker2> <username> <password> <address> <frequency in 
> microseconds>
> e.g.
> ./amqsender testbox111:5672 testbox111:5673 anon anon Q1 1
> You can reproduce the coredump with just one broker
> 1. keep slave down
> 2. start master broker
> 3. run amqsender with a very low frequency
> 4. kill master broker
> This should reproduce the coredump.
> The reproducer has events implemented for on_transport_close yet we see:
> {code}
> .
> .
> .
> [0x7fffec0169b0]:(PN_TRANSPORT, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_TRANSPORT, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_CONNECTION_WAKE, pn_connection<0x7fffec000b90>)
> AMQSender::on_connection_wake pn_connection<0x7fffec000b90>
> [0x7fffec0169b0]:(PN_TRANSPORT_TAIL_CLOSED, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_TRANSPORT_ERROR, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_TRANSPORT_HEAD_CLOSED, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_TRANSPORT_CLOSED, pn_transport<0x7fffec0169b0>)
> [0x7fffec0169b0]:(PN_CONNECTION_INIT, pn_connection<0x7fffec000b90>)
> Thread 1 "amqsender" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff72bcdd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> Missing separate debuginfos, use: dnf debuginfo-install 
> cyrus-sasl-gssapi-2.1.26-26.2.fc24.x86_64 
> cyrus-sasl-lib-2.1.26-26.2.fc24.x86_64 cyrus-sasl-md5-2.1.26-26.2.fc24.x86_64 
> cyrus-sasl-plain-2.1.26-26.2.fc24.x86_64 
> cyrus-sasl-scram-2.1.26-26.2.fc24.x86_64 keyutils-libs-1.5.9-8.fc24.x86_64 
> krb5-libs-1.14.4-7.fc25.x86_64 libcom_err-1.43.3-1.fc25.x86_64 
> libcrypt-nss-2.24-4.fc25.x86_64 libdb-5.3.28-16.fc25.x86_64 
> libgcc-6.3.1-1.fc25.x86_64 libselinux-2.5-13.fc25.x86_64 
> libstdc++-6.3.1-1.fc25.x86_64 nss-softokn-freebl-3.28.3-1.1.fc25.x86_64 
> openssl-libs-1.0.2k-1.fc25.x86_64 pcre-8.40-5.fc25.x86_64 
> zlib-1.2.8-10.fc24.x86_64
> (gdb) bt
> #0  0x00007ffff72bcdd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1  0x00007ffff76dc4fa in lock (m=0x1a0) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/src/proactor/epoll.c:112
> #2  0x00007ffff76dcc09 in wake (ctx=0x7fffec2b8ac0) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/src/proactor/epoll.c:436
> #3  0x00007ffff76def0e in pn_connection_wake (c=0x7fffec000b90) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/src/proactor/epoll.c:1302
> #4  0x00007ffff7b81b82 in proton::container::impl::connection_work_queue::add 
> (this=0x7fffec001d30, f=...) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/bindings/cpp/src/proactor_container_impl.cpp:118
> #5  0x00007ffff7bacde5 in proton::work_queue::add (this=0x7fffec001cd8, 
> f=...) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/bindings/cpp/src/work_queue.cpp:43
> #6  0x000000000040536f in AMQSender::send (this=this@entry=0x7fffffffd960, 
> strMsg="7578") at ../attachments/AMQSender.cpp:42
> #7  0x000000000040328f in main (argc=<optimized out>, argv=0x7fffffffdbd8) at 
> ../attachments/amqsend.cpp:20
> (gdb) frame 2
> #2  0x00007ffff76dcc09 in wake (ctx=0x7fffec2b8ac0) at 
> /home/rkieley/LocalProjects/src/rh/rh-qpid-proton/proton-c/src/proactor/epoll.c:436
> 436           lock(&p->eventfd_mutex);
> (gdb) print p
> $3 = (pn_proactor_t *) 0x0
> (gdb)
> {code}


