While I haven't fixed it yet, this is my current theory on how to resolve this. I'd appreciate comments.

Symptom:

- emc starts a toolchange
- the toolchange is aborted, e.g. by the Escape key in Axis
- emc freezes for up to five seconds, and log messages like these appear:

    emc/task/iotaskintf.cc 155: Command to IO level (EMC_TOOL_ABORT:+1103,+12, +0,) timed out waiting for last command done.
    emc/task/iotaskintf.cc 158: emcIoStatus->echo_serial_number=10, emcIoCommandSerialNumber=10, emcIoStatus->status=2
    emc/task/iotaskintf.cc 163: Last command sent to IO level was (EMC_TOOL_LOAD:+1105,+12, +10,)

What I think *should* be happening:

- emc starts a toolchange, sends an EMC_TOOL_LOAD to iocontrol, and waits for it to be acknowledged before proceeding
- iocontrol receives that and starts its tool-change/tool-changed pin protocol with the external toolchanger script
- iocontrol should also listen on the toolCmd channel from emc for any other messages
- when the toolchange is aborted, e.g. by Escape in Axis or otherwise, emc should immediately queue an EMC_TOOL_ABORT to iocontrol
- iocontrol should periodically peek into the queue even if a toolchange is pending. If it sees an EMC_TOOL_ABORT it should clean up (e.g. deassert the tool-change pin) and probably acknowledge both messages so emc continues.
- note that this assumes the queue between emc and iocontrol is a bona fide queue, i.e. it can hold more than one message (the pending TOOL_LOAD and the TOOL_ABORT).

What I *think* is happening:

- emc starts a toolchange and sends an EMC_TOOL_LOAD to iocontrol
- iocontrol receives that and starts its tool-change/tool-changed pin protocol with the external toolchanger script
- when the toolchange is aborted, e.g. by Escape in Axis or otherwise, emc should queue an EMC_TOOL_ABORT to iocontrol. However, emc stares at the serial number of the last EMC_TOOL_LOAD command, waiting for it to be acknowledged before it goes on to send the EMC_TOOL_ABORT
- that never happens, so it's a classic deadlock, which is "resolved" by a timeout, resulting in the above messages.
- it looks like in this state, iocontrol never really gets the EMC_TOOL_ABORT.

What's likely to be wrong:

For emc to be able to queue an EMC_TOOL_ABORT while the EMC_TOOL_LOAD still sits in the queue waiting to be acknowledged by iocontrol, the queue size must be > 1. However, it seems to me the 'queue' between emc and iocontrol has queue size 1 (that is, it is just a shared memory buffer with mutex protection). See the line in emc.nml describing the toolCmd buffer:

    # These are for the IO controller, EMCIO
    B toolCmd SHMEM localhost 1024 0 0 4 16 1004 TCP=5005 xdr

http://www.isd.mel.nist.gov/projects/rcslib/NMLcfg.html states:

"...To enable queuing of messages in the buffer, add the word "queue" to the buffer line. The size of the buffer determines how many messages can be simultaneously queued."

So to have a real queue, this should probably read:

    B toolCmd SHMEM localhost 1024 0 0 4 16 1004 TCP=5005 xdr queue    <--- enable queueing

I should probably check whether 1024 is large enough to hold at least 2 messages of worst-case size.

Second, iotaskintf.cc:sendCommand() needs to be changed as follows: if the new command is an EMC_TOOL_ABORT and the previous unacknowledged command was an EMC_TOOL_LOAD or EMC_TOOL_PREPARE, do not wait for the old command to be acknowledged but immediately queue the EMC_TOOL_ABORT.

Third, iocontrol needs to be changed as follows: it needs to peek() into the queue while there's a pending toolchange, checking whether an EMC_TOOL_ABORT message is sitting there. If so, clean up, acknowledge both messages and revert to idle.

Thanks to Alex Joni for coaching me so far.
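
PS: to make the second and third changes a bit more concrete, here is a rough, self-contained toy model of the logic I have in mind. It is only a sketch and not actual emc code: Cmd, Msg, IoControl, canSendWithoutWaiting() and poll() are names I made up for illustration, a std::deque stands in for the toolCmd buffer with queueing enabled, and I am assuming that echoing the abort's serial number is enough to acknowledge both the load and the abort.

```cpp
#include <cassert>
#include <deque>

// Toy stand-ins for the NML message types (hypothetical, not the emc classes).
enum class Cmd { ToolPrepare, ToolLoad, ToolAbort, Other };

struct Msg {
    Cmd type;
    int serial;  // plays the role of the command serial number
};

// Task side: may `next` be queued while `unacked` still waits for its ack?
// Only an abort overtaking a pending prepare/load skips the wait.
inline bool canSendWithoutWaiting(Cmd unacked, Cmd next) {
    return next == Cmd::ToolAbort &&
           (unacked == Cmd::ToolLoad || unacked == Cmd::ToolPrepare);
}

// iocontrol side, reduced to the state that matters for this problem.
struct IoControl {
    std::deque<Msg> queue;       // models toolCmd with queueing enabled
    bool toolChangePin = false;  // models the tool-change pin
    bool changing = false;       // true while a toolchange is pending
    int echoSerial = 0;          // last serial acknowledged back to task

    // One pass of the main loop.
    void poll() {
        if (changing) {
            // Look at the head of the queue even though a change is pending.
            if (!queue.empty() && queue.front().type == Cmd::ToolAbort) {
                Msg abort = queue.front();
                queue.pop_front();
                toolChangePin = false;      // clean up
                changing = false;           // revert to idle
                echoSerial = abort.serial;  // assumed to ack load and abort
            }
            return;  // otherwise keep waiting for tool-changed
        }
        if (queue.empty()) return;
        Msg m = queue.front();
        queue.pop_front();
        if (m.type == Cmd::ToolLoad) {
            toolChangePin = true;  // start the pin handshake, ack comes later
            changing = true;
        } else {
            echoSerial = m.serial;  // simplification: ack everything else at once
        }
    }
};
```

In this model, queueing a ToolLoad with serial 10, polling once, then queueing a ToolAbort with serial 11 and polling again leaves the pin deasserted, the toolchange no longer pending, and echoSerial at 11, which is what task would be waiting for. With a queue size of 1 the abort could never be written in the first place, which is the deadlock described above.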

-Michael
