Do you have ASSERTions turned on. This could help identify a problem earlier.
I have seen other similar kinds of problems (with assertions turned on), but have not spent any time trying to resolve them yet. Assertions such as: eCos panic: cyg_fp_free() will close fd eCos panic: sbdrop eCos panic: soaccept: !NOFDREF eCos panic: sofree: not queued eCos panic: soclose: NOFDREF eCos panic: Locking mutex I already own eCos panic: sbflush: locked I did a search of the eCos archives for these panics, and came accross this (http://sourceware.org/ml/ecos-discuss/2000-04/msg00153.html) I could find no evidence that this issue was ever addressed. Jay -----Original Message----- From: Ross Wightman [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 16, 2005 12:45 AM To: ecos-discuss@ecos.sourceware.org Subject: [ECOS] Possible sockets/fd race condition Myself and another engineer I am working with have recently encountered what appears to be some sort of sockets (BSD Stack) or fd related race condition. We have two threads relevant to the problem. One of them is sending out small (~64byte) UDP multicast packets at a configurable rate of 1-100 per second. The other thread is running a basic single threaded http server that serves one connection at a time (but serves many small requests quite rapidly). Running by themselves either thread can send out hundreds of thousands of UDP packets or serve millions of http requests with no problems. However, as soon as the two threads are both running they last for anywhere from a few hundred to a few thousand served requests before the UDP thread dies a horrible death trying to close its socket. The exact crash occurs when tcp_usr_detach in tcp_usrreq.c tries to dereference a NULL pointer (which should be pointing to a tcp control block). This is very bizarre because we are supposed to be closing a UDP socket. If we look at the socket itself a few stack frames down it is a non active TCP socket with the SS_NOFDREF and SS_INCOMP flags set and no TCP control block. It is certainly not an active UDP socket as it should be. Every time the crash has occurred, the http server is waiting in unlock_inner from an splx call via various tcp calls. The Socket/File Descriptor # for the UDP socket and the TCP socket at the time of the crash is 4 and 5 respectively. We have yet to pinpoint when exactly things start to go wrong. We have however looked for possible race conditions and we noticed that the socreate function does not have any splnet protection like most of the other functions at that level. So we added that and after half a day running the problem has not occurred again. However it's not clear if we fixed the problem in a correct manner or smashed it with a blunt hammer and masked the real cause. Any thoughts would be much appreciated. More details follow below. Regards, Ross This is the pseudo code sequence of the UDP thread. { socket s; loop { wait on flag() s=socket(...); setsocketopt(s, ...); ioctl(s,.... ) //for address change check sendto(s, ....) close(s) } } The http server thread is running a typical TCP serving loop and shares nothing with the UDP thread. The only interaction between the two threads is via any locking or data sharing that occurs in the tcp/ip stack. The relevant portion of the UDP thread stack is as follows: #0 0x001f01b0 in tcp_disconnect (tp=0x0) at /cygdrive/d/ecos/ecos-cvs /packages/net/bsd_tcpip/current/src/sys/netinet/tcp_usrreq.c:1070 #1 0x001eedb4 in tcp_usr_detach (so=0x2cc010) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/netinet/tcp _usrreq.c:181 #2 0x001c33ec in cyg_soclose (so=0x2cc010) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/kern/uipc_s ocket.c:321 #3 0x0016f244 in bsd_close (fp=0x2b99ec) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/kern/sockio .c:730 #4 0x0019c17c in fp_ucount_dec (fp=0x2b99ec) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/fd.cxx:160 #5 0x0019c284 in fd_close (fd=5) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/fd.cxx:189 #6 0x0019c55c in cyg_fd_free (fd=5) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/fd.cxx:260 #7 0x0019f8ac in close (fd=5) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/io.cxx:216 #8 0x000524b0 in Packet::PacketSend() (this=0x32d5a0) at src/App/Packet.cpp:141 The print out of socket (so) is: This should be a UDP socket but instead it's an inactive TCP socket. (gdb) print/x *(struct socket*)(0x2cc010) $2 = {so_zone = 0x2cc180, so_type = 0x1, so_options = 0x20, so_linger = 0x0, so_state = 0x801, so_pcb = 0x2cb478, so_proto = 0x2a51a0, so_head = 0x2cc0c0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, tqe_prev = 0x2cc0d8}, so_qlen = 0x0, so_incqlen = 0x0, so_qlimit = 0x0, so_timeo = 0x0, so_error = 0x0, so_sigio = 0x0, so_oobmark = 0x0, so_aiojobq = {tqh_first = 0x0, tqh_last = 0x2cc054}, so_rcv = {sb_cc = 0x0, sb_hiwat = 0x4470, sb_mbcnt = 0x0, sb_mbmax = 0x22380, sb_lowat = 0x1, sb_mb = 0x0, sb_sel = {si_info = 0x0, si_waitFlag = 0x0}, sb_flags = 0x0, sb_timeo = 0x0}, so_snd = {sb_cc = 0x40, sb_hiwat = 0x4470, sb_mbcnt = 0x80, sb_mbmax = 0x22380, sb_lowat = 0x800, sb_mb = 0x2ccf80, sb_sel = {si_info = 0x0, si_waitFlag = 0x0}, sb_flags = 0x0, sb_timeo = 0x0}, so_upcall = 0x0, so_upcallarg = 0x0, so_cred = 0x0, cyg_so_gencnt = 0x8646, so_emuldata = 0x0, so_accf = 0x0} The print out of inpcb (so_pcb) is: (gdb) print/x *(struct inpcb*)(0x2cb478) $3 = {inp_hash = {le_next = 0x0, le_prev = 0x0}, inp_fport = 0x0, inp_lport = 0x0, inp_list = {le_next = 0x0, le_prev = 0x2a6e18}, inp_flow = 0x0, inp_dependfaddr = {inp46_foreign = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0x0}}, inp6_foreign = {__u6_addr = { __u6_addr8 = {0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 0x0}}}}, inp_dependladdr = {inp46_local = {ia46_pad32 = {0x0, 0x0, 0x0}, ia46_addr4 = {s_addr = 0x0}}, inp6_local = {__u6_addr = {__u6_addr8 = { 0x0 <repeats 16 times>}, __u6_addr16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __u6_addr32 = {0x0, 0x0, 0x0, 0x0}}}}, inp_ppcb = 0x0, inp_pcbinfo = 0x308310, inp_socket = 0x2cc010, inp_flags = 0x0, inp_dependroute = {inp4_route = {ro_rt = 0x0, ro_u = { rou_dst_ss = {ss_len = 0x0, ss_family = 0x0, __ss_pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __ss_align = 0x0, __ss_pad2 = { 0x0 <repeats 112 times>}}, ro_dst_sa = {sa_len = 0x0, sa_family = 0x0, sa_data = {0x0 <repeats 30 times>}}}}, inp6_route = {ro_rt = 0x0, ro_u = {rou_dst_ss = {ss_len = 0x0, ss_family = 0x0, __ss_pad1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __ss_align = 0x0, __ss_pad2 = {0x0 <repeats 112 times>}}, ro_dst_sa = {sa_len = 0x0, sa_family = 0x0, sa_data = { 0x0 <repeats 30 times>}}}}}, inp_sp = 0x0, inp_vflag = 0x1, inp_ip_ttl = 0x40, inp_ip_p = 0x0, inp_depend4 = {inp4_ip_tos = 0x0, inp4_options = 0x0, inp4_moptions = 0x0}, inp_depend6 = { inp6_options = 0x0, inp6_inputopts = 0x0, inp6_outputopts = 0x0, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0x0, inp6_ifindex = 0x0, inp6_hops = 0x0}, inp_portlist = {le_next = 0x0, le_prev = 0x0}, inp_phd = 0x0, inp_gencnt = 0x8d9d} The relevant portion of the HTTP serving thread is: #0 Cyg_Scheduler::unlock_inner(unsigned) (new_lock=0) at /cygdrive/d/ecos/ecos-cvs/packages/kernel/current/src/sched/sched.cxx:220 #1 0x00266858 in Cyg_Scheduler::unlock() () at /cygdrive/d/work/firmware/micro/ecos/conf/avppc1/ecos/install/include/cyg/ke rnel/sched.inl:100 #2 0x001868cc in Cyg_Mutex::unlock() (this=0x2ff580) at /cygdrive/d/ecos/ecos-cvs/packages/kernel/current/src/sync/mutex.cxx:445 #3 0x0017e470 in cyg_mutex_unlock (mutex=0x2ff580) at /cygdrive/d/ecos/ecos-cvs/packages/kernel/current/src/common/kapi.cxx:1081 #4 0x001a6048 in cyg_splx (old_state=0) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/ecos/synch.c:20 9 #5 0x001c43d8 in cyg_sosend (so=0x2cbeb0, addr=0x0, uio=0x2b8310, top=0x2cce80, control=0x0, flags=0, p=0x0) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/kern/uipc_s ocket.c:622 #6 0x0016f9dc in bsd_sendit (fp=0x2b99cc, mp=0x2b83b0, flags=0, retsize=0x2b83a0) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/kern/sockio .c:1039 #7 0x0016eb64 in bsd_sendmsg (fp=0x2b99cc, m=0x2b83b0, flags=0, retsize=0x2b83a0) at /cygdrive/d/ecos/ecos-cvs/packages/net/bsd_tcpip/current/src/sys/kern/sockio .c:572 #8 0x001a2450 in sendto (s=4, buf=0x32d7fc, len=543, flags=0, to=0x0, tolen=0) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/socket.cxx:612 #9 0x001a251c in send (s=4, buf=0x32d7fc, len=543, flags=0) at /cygdrive/d/ecos/ecos-cvs/packages/io/fileio/current/src/socket.cxx:629 #10 0x00143694 in fsend (pData=0x32d5e8, len=543) at src/Interface/serv.cpp:518 -- Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss -- Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss