On Mon, Aug 27, 2007 at 02:48:42AM -0400, Rick Davis wrote:
> I have a device using the MPC859T processor that has a small web server
> running using the standard eCos web server. I have a status page that
> auto-refreshes every 15 seconds and I am pinging the unit every second (Yes,
> I have a customer that is actually doing this). I don't really know what
> other network activity is occurring at the customer's site but my test lab
> has Windows network chatter going on. After about 12 or so hours the web
> stops responding and the unit can no longer be pinged. The FEC Ethernet
> driver is receiving packets and is calling the eth_drv_dsr but the deliver
> function is never called.
>
> I have been tracking this down for some time and have noticed the
> following...
>
> 1. The alarm thread in timeout.c is getting blocked when calling
> splx_internal() just before the call to eth_drv_run_deliveries().
> 2. The current value of spl_state in sync.c is 4 (SPL_NET)
>
> Any ideas why the network would not release the splx_mutex?
> Any suggestion on how to further track this down?
> I don't have a GDB interface on my platform. :(

What vintage of eCos are you using? If you go back far enough into the
mists of time, there was at least one bug fix for alarms. But that was a
long time ago.

Do you have asserts enabled? They might give some clues.

You could also enable CYGIMPL_TRACE_SPLX and call show_sched_events()
when you hit the deadlock. That should tell you which function is holding
the mutex. You might want to add __builtin_return_address(0) to the log
structure, so you can see one more level up the call stack. Otherwise I
think you will just get spi_slpnet, which is not much use.

        Andrew
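
To illustrate the return-address suggestion above, here is a minimal sketch in plain C. It is not the eCos code: the structure, the ring buffer and the names spl_trace_entry, spl_trace_record(), spl_trace_dump() and demo_splnet() are all hypothetical, and the real log structure in sync.c will look different. The only parts taken from this thread are the GCC builtin __builtin_return_address(0) and the value 4 for SPL_NET.

```c
#include <stdio.h>

/* Hypothetical trace record; NOT the actual eCos log structure. */
struct spl_trace_entry {
    const char *func;    /* spl function that was entered            */
    int         level;   /* spl level that was requested             */
    void       *caller;  /* address inside whoever called that func  */
};

#define SPL_TRACE_SIZE 16
#define DEMO_SPL_NET   4   /* value reported for SPL_NET in this thread */

static struct spl_trace_entry spl_trace[SPL_TRACE_SIZE];
static unsigned spl_trace_next;   /* total number of records written */

/* Store one record in the ring buffer, overwriting the oldest one. */
static void spl_trace_record(const char *func, int level, void *caller)
{
    struct spl_trace_entry *e = &spl_trace[spl_trace_next % SPL_TRACE_SIZE];

    e->func   = func;
    e->level  = level;
    e->caller = caller;
    spl_trace_next++;
}

/*
 * Hypothetical stand-in for an spl entry point. noinline keeps this a
 * real call, so the return address points into the calling function
 * rather than somewhere further up the stack.
 */
__attribute__((noinline))
static int demo_splnet(void)
{
    spl_trace_record(__func__, DEMO_SPL_NET, __builtin_return_address(0));
    return 0;
}

/* Print the stored records, oldest first. */
static void spl_trace_dump(void)
{
    unsigned count = spl_trace_next < SPL_TRACE_SIZE
                         ? spl_trace_next : SPL_TRACE_SIZE;
    unsigned start = spl_trace_next - count;
    unsigned i;

    for (i = 0; i < count; i++) {
        const struct spl_trace_entry *e =
            &spl_trace[(start + i) % SPL_TRACE_SIZE];
        printf("%s: level=%d caller=%p\n", e->func, e->level, e->caller);
    }
}
```

Without a debugger, the printed caller address can be matched against the linker map file, or resolved on the host with a tool such as addr2line run against the ELF image, to find which function took the spl level and did not release it.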