Hi Honza,
Thanks for the reply! pthread_create() returns ENOMEM. It happened only
during my resource stress test.
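
To make the idea concrete, here is a minimal, self-contained sketch of the
check you describe. It is not the corosync code: struct demo_conn,
demo_conn_start(), demo_conn_stop() and failing_create() are hypothetical
names made up for illustration, and the "disconnected" field only stands in
for whatever ipc_disconnect would really do. It also assumes a separate
thread_started flag instead of comparing conn_info->thread with 0, since
POSIX treats pthread_t as an opaque type.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for conn_info; only the fields this sketch needs. */
struct demo_conn {
    pthread_t thread;
    int thread_started;  /* 1 only after the create call succeeded */
    int refcount;
    int disconnected;    /* placeholder for tearing down the client connection */
};

/* Same shape as pthread_create(), so a failing variant can be injected. */
typedef int (*create_fn)(pthread_t *, const pthread_attr_t *,
                         void *(*)(void *), void *);

static void *demo_worker(void *arg)
{
    return arg;          /* a real worker would serve IPC requests here */
}

/* Simulates the stress-test condition: thread creation reports ENOMEM. */
static int failing_create(pthread_t *t, const pthread_attr_t *a,
                          void *(*fn)(void *), void *arg)
{
    (void)t; (void)a; (void)fn; (void)arg;
    return ENOMEM;
}

/* Returns 0 on success, otherwise the error code from the create call. */
static int demo_conn_start(struct demo_conn *c, create_fn create)
{
    int res;

    assert(c != NULL);
    c->refcount++;                 /* reference held on behalf of the worker */
    res = create(&c->thread, NULL, demo_worker, c);
    if (res != 0) {                /* ENOMEM, EAGAIN, ... */
        c->refcount--;             /* no worker exists to drop this reference */
        c->thread_started = 0;
        c->disconnected = 1;       /* fail the client instead of letting it block */
        return res;
    }
    c->thread_started = 1;
    return 0;
}

/* Joins the worker only if one was really created. */
static int demo_conn_stop(struct demo_conn *c)
{
    void *retval;
    int res = 0;

    assert(c != NULL);
    if (c->thread_started) {
        res = pthread_join(c->thread, &retval);
        c->thread_started = 0;
        c->refcount--;             /* release the worker's reference */
    }
    return res;
}
```

Calling demo_conn_start(&c, failing_create) takes the error path without
creating a thread, and demo_conn_stop(&c) then returns without touching
pthread_join(). Passing pthread_create itself takes the normal path.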
On Jun 5, 2013 11:16 PM, "Jan Friesse" <[email protected]> wrote:
> Jason,
> thanks again for your very detailed analysis. What is pthread_create
> returning? EAGAIN? If so, I would probably take the approach of "instead
> of solving the problem, just don't allow the problem to appear". In other
> words, I would check res = pthread_create, and if it's EAGAIN, just clear
> the conn_info variables like ref_count, state, private_data, ... and call
> ipc_disconnect. What do you think about that?
>
> Honza
>
> jason wrote:
> > Hi All,
> >
> > My environment (corosync-1.4.5) hit a segmentation fault at the
> > following place.
> >
> > (gdb) bt
> > #0 0x004f9012 in pthread_join () from /lib/libpthread.so.0
> > #1 0x00ba6956 in conn_info_destroy (fd=15, revent=17, context=0x8dd78a0)
> > at coroipcs.c:503
> > #2 coroipcs_handler_dispatch (fd=15, revent=17, context=0x8dd78a0)
> > at coroipcs.c:1617
> > #3 0x0804c63b in corosync_poll_handler_dispatch (
> > handle=150346236434579456, fd=15, revent=17, context=0x8dd78a0)
> > at main.c:1105
> > #4 0x00d7e994 in poll_run (handle=150346236434579456) at coropoll.c:513
> > #5 0x0804d697 in main (argc=2, argv=0xbfd7ad54, envp=0xbfd7ad60)
> > at main.c:1874
> > (gdb) f 1
> > #1 0x00ba6956 in conn_info_destroy (fd=15, revent=17, context=0x8dd78a0)
> > at coroipcs.c:503
> > 503 res = pthread_join (conn_info->thread, &retval);
> > (gdb) p conn_info->thread
> > $1 = 0
> >
> > gdb shows that pthread_join() tried to join an IPC consumer thread that
> > does not exist. The cause I found is that coroipcs_handler_dispatch()
> > failed to create the thread and did not check the return value of
> > pthread_create(), which failed because the system was out of memory.
> > When this happens, the IPC client sees the connection as created
> > successfully, but all subsequent IPC requests block and never return.
> > So I pressed CTRL+C to quit the client application, which closed the
> > IPC connection on the client side. At that point the server side
> > called pthread_join() and hit the segmentation fault.
> >
> > The fix for the segmentation fault is simply to check whether
> > conn_info->thread is zero in conn_info_destroy(). If it is, we should
> > skip the call to pthread_join() and decrement the IPC refcount (which
> > was incremented in coroipcs_handler_dispatch()).
> >
> > So I changed the conn_info_destroy() code to the following:
> >
> > if (conn_info->state == CONN_STATE_THREAD_REQUEST_EXIT) {
> >         if (0 != conn_info->thread) {
> >                 res = pthread_join (conn_info->thread, &retval);
> >         } else {
> >                 coroipcs_refcount_dec (conn_info);
> >         }
> >         conn_info->state = CONN_STATE_THREAD_DESTROYED;
> >         return (0);
> > }
> >
> >
> >
> > But this change does nothing for the blocked IPC client, because once
> > the code above returns 0 to coropoll.c, coroipcs_handler_dispatch()
> > gets no chance to be called again.
> >
> > Any ideas?
> >
> >
> >
> >
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > http://lists.corosync.org/mailman/listinfo/discuss
> >
>
>