Hi all, I haven't written in a while, but I've come up against something very
unusual.

We recently switched over to MICO 2.3.13 with some home-grown multi-threaded
queue handlers. Everything was working grand until one service hung. We killed
it and everything worked fine again for a week, then the same thing happened
again. After that, some other services started experiencing the same problem.

As a quick fix, we put in a timer set to 30 minutes: if no CORBA functions are
called in that time, the service kills itself. This works fine because we use
the IMR, so when a client asks MICO where the service is, the MICO daemon
automatically starts the service up again. This fixed one of the services and
we haven't had any problems with it since.
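
For anyone wanting to try the same workaround, here is a rough sketch of the idle timer idea. It is in Python purely for brevity (our real services are C++), and the class name `IdleWatchdog`, its methods, and the injectable clock are made up for illustration; none of this is MICO API.

```python
import time


class IdleWatchdog:
    """Tracks how long it has been since the last CORBA call was served.

    Hypothetical helper for illustration only; not part of MICO.
    """

    def __init__(self, timeout_s=1800.0, clock=time.monotonic):
        self._timeout_s = timeout_s   # 1800 s = the 0.5 hours mentioned above
        self._clock = clock           # injectable so the logic can be tested
        self._last_call = clock()

    def touch(self):
        """Call this at the start of every servant method."""
        self._last_call = self._clock()

    def expired(self):
        """True once the service has been idle for the full timeout."""
        return self._clock() - self._last_call >= self._timeout_s
```

A background thread would poll `expired()` every so often and exit the process when it returns true, relying on the IMR to restart the service on the next client request.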

However, the original service still gets stuck. After much debugging and
rooting around in the MICO code, we found an unusual situation. The service
receives the CORBA header telling it how long the incoming message is (the
header is 12 bytes, and the incoming message is approximately 8300 bytes).
The socket is read numerous times to get the entire 8300 bytes, but after
about 5200 bytes the reading stops. This wouldn't be so bad, but every thread
in the service also stops receiving messages; none of them respond to any new
messages coming in. The home-grown queue thread is still running and watching
for new messages, but none arrive.
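
To make the read pattern concrete, here is a small sketch of "read the 12-byte header, then loop until the whole body has arrived", with a per-read timeout so a stalled peer raises an error instead of blocking forever. It is not MICO's code. The function names and the 30-second default are my own assumptions; the header layout (magic "GIOP", byte-order flag in byte 6, message size in bytes 8 to 11) is from the GIOP spec.

```python
import socket
import struct

GIOP_HEADER_LEN = 12  # magic(4) + version(2) + flags(1) + type(1) + size(4)


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    got = 0
    while got < n:
        chunk = sock.recv(n - got)
        if not chunk:
            raise ConnectionError(f"peer closed after {got} of {n} bytes")
        chunks.append(chunk)
        got += len(chunk)
    return b"".join(chunks)


def read_giop_message(sock, timeout_s=30.0):
    """Read one GIOP header plus body; hypothetical helper, not MICO API.

    The timeout applies to each recv() call, not to the message as a whole,
    and it stays set on the socket afterwards.
    """
    sock.settimeout(timeout_s)
    header = recv_exact(sock, GIOP_HEADER_LEN)
    if header[:4] != b"GIOP":
        raise ValueError("not a GIOP message")
    little_endian = header[6] & 1  # low bit of the flags byte
    (size,) = struct.unpack("<I" if little_endian else ">I", header[8:12])
    return header, recv_exact(sock, size)
```

With something like this, a message that stalls partway through (as ours does at about 5200 of 8300 bytes) would surface as a `socket.timeout` in one thread rather than a silent hang.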

The strange part is that if I use "lsof" and match its file descriptor against
the one I see in the debugger, the socket connection shows as "ESTABLISHED".
But if I go to the other machine (the one sending the message), I don't see any
socket connection at all: no FIN_WAIT, no CLOSED, nothing! So the service will
never receive the last 3100 bytes.
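
One thing that might help with a connection that is ESTABLISHED on one side and gone on the other is TCP keepalive, which lets the kernel probe an idle peer and eventually fail the socket. Below is a sketch, again in Python for brevity. The function name and default values are my own, the tuning options are only set where the platform defines them (they are not portable, and I have not checked HP-UX), and I am assuming the service can get at the underlying socket, which I have not confirmed for MICO.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive; hypothetical helper with illustrative defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below exist on Linux and some other systems only,
    # so each one is applied only if this platform exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Without the tuning options, the system-wide keepalive defaults apply, and those are often on the order of hours, so this alone may not make a stuck read fail quickly.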

I've tried to reproduce this in a test environment and I've hammered the
service with no problems. The service runs on HP-UX 11i and the client normally
runs on an SGI. I have done all my testing with the client on HP-UX 11i and
Linux, but never on SGI (that's coming next). Does anybody know of inherent
socket issues with SGI, or between HP-UX 11i and SGI?

I can give specific debug info and lines of code if anyone is interested. Any
help is appreciated!

Thanks,
Mark
_______________________________________________
Mico-devel mailing list
Mico-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mico-devel