[Emc-developers] serious bug in v2.5_branch aborted o-word procedure handling

Michael Haberler Thu, 15 Sep 2011 06:07:29 -0700

v2.5_branch cannot properly recover from aborted O-word procedures in external 
ngc files.


Symptom: aborted O-word subroutine leaves wrong file as current file, next run 
command executes the wrong file.

---> please scroll to bottom to the decision item.

Reproduce by:

- create two ngc files like so:

--- main.ngc ---
o<foo> call
m2
-----

---- foo.ngc:
o<foo> sub
#123 = [1/0]
o<foo> endsub
m2
----


- Load main.ngc in Axis. This immediately runs the program for preview.
- Axis Interpreter aborts during preview run with Division by zero message
- clear the error + click the Run icon
- task runs program in task interpreter
- task interpreter hits Division by zero, aborts
- notice the filename in the Axis top level bar - changed to foo.ngc (!)
- Now hit 'run' again
- Axis will tell task to run 'foo.ngc', which doesn't do anything since it's just 
the sub definition
- Axis top level bar filename remains at foo.ngc .
- To get back to the starting state, one needs to open main.ngc again 

The sequence of events is quite involved:
- Axis loads main.ngc and executes it immediately for preview, hits Division by 
zero
- Axis interpreter call_level remains at 1, _setup.filename now 'foo.ngc' BUT 
taskfile (via emcstat()) still 'main.ngc'
- Axis toplevel bar shows 'main.ngc' because this reports the filename obtained 
via emcstat

- Task interpreter has at this point NOT executed main.ngc, therefore task's 
perception of filename is 'main.ngc' as reflected in the axis toplevel bar
- now hit 'R' to run the program
- Task interpreter executes main.ngc, calls foo.ngc
- foo.ngc executes and foo.ngc is copied into task.file, which goes into emcstat
- interpreter returns INTERP_ERROR while foo.ngc is open, and does NOT reset 
the filename.
  (_setup.call_level stays at 1 (!) and _setup.filename is still 'foo.ngc')
- Axis picks up foo.ngc through emcstat() and displays it in the toplevel bar
- On the subsequent Axis Run command, Task is told to run 'foo.ngc', which of 
course doesn't do anything
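
To make the stale state easier to see, here is a minimal toy model of the sequence above. It is only a sketch: ToyInterp, run_failing_call() and reported_file_after_abort() are hypothetical names invented for illustration, not the rs274ngc classes. 'filename' stands in for _setup.filename and the size of 'callers' for _setup.call_level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not the real interpreter code.
enum Status { INTERP_OK, INTERP_ERROR };

struct ToyInterp {
    std::string filename;             // stands in for _setup.filename
    std::vector<std::string> callers; // saved filenames, one per active call

    int call_level() const { return static_cast<int>(callers.size()); }

    // Opening a program starts from a clean call stack.
    void open(const std::string &file) { callers.clear(); filename = file; }

    // Enter a subroutine that lives in an external file.
    void call_sub(const std::string &sub_file) {
        callers.push_back(filename);
        filename = sub_file;
    }

    // A normal return restores the caller's file.
    void return_sub() {
        assert(!callers.empty());
        filename = callers.back();
        callers.pop_back();
    }

    // Models "o<foo> call" where the body fails (e.g. division by zero).
    // As in the report, the error path returns without unwinding.
    Status run_failing_call(const std::string &sub_file) {
        call_sub(sub_file);
        return INTERP_ERROR; // call_level stays at 1, filename stays sub_file
    }
};

// The filename a status reader would see after the aborted run.
inline std::string reported_file_after_abort() {
    ToyInterp interp;
    interp.open("main.ngc");
    interp.run_failing_call("foo.ngc");
    return interp.filename;
}
```

In this model reported_file_after_abort() yields the subroutine file rather than main.ngc, which mirrors what Axis picks up and then asks task to run.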

 
The fix is obvious IMO: every time the interpreter fails, the call stack needs 
to be unwound and the original filename restored; most of that code is in 
Interp::reset(). Unfortunately the current approach to when reset() is actually 
called is a bit inconsistent (a.k.a. 'grown over time'):

1. Failure of read()
1.1 MDI, auto: errors are reset by the next task run, which causes an 
open()/close()/reset() sequence, not by the interpreter itself (!)

2. Failure of execute()
2.1 MDI: error reset by directly calling reset: line 259 
http://git.linuxcnc.org/gitweb?p=emc2.git;a=blob;f=src/emc/rs274ngc/rs274ngc_pre.cc;h=8f186c513a22c8f0e41be8f84005c06dfae2a454;hb=df93a83e2b64c46dd540b29a84657247ed200d16#l1047
2.2 auto: error reset by task on next run, as in 1.1

Note that states 1.1 and 2.2 are interpreter zombie states - the interpreter has a botched 
callstack and wrong filename, and it cannot run, read() or execute() without a 
previous reset(). That is surely a source of subtly different behaviour between MDI 
and auto modes, which is a constant nuisance. Plus, the most obscure 
USE_LAZY_CLOSE feature also gets in the way as it may prevent reset() from 
being called.

Hence there are two approaches to the problem:

(1) Interpreter internal error reset: every time read(), or execute() fail, the 
call stack is unwound.
(2) Error reset by using program: every time read() or execute() return 
INTERP_ERROR, the *using program* will call the unwind method.

(1) looks more robust to me - basically the invariant is 'interpreter errored - 
call stack is unwound'
(2) would leave the interpreter in the aborted state, enabling restart with 
some clever code. It also means touching *all the calling code*, which is in 
gcodemodule.cc and in sai/driver.c, and maybe some I didn't think of yet.
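
The difference between the two approaches can be sketched with a toy model. Everything here is hypothetical and for illustration only: ToyInterp, unwind(), execute_raw(), execute_self_unwinding() and run_with_caller_unwind() are invented names, not the actual Interp API, and the "failure" is hard-wired.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not the real interpreter code.
enum Status { INTERP_OK, INTERP_ERROR };

struct ToyInterp {
    std::string filename;             // stands in for _setup.filename
    std::vector<std::string> callers; // saved filenames, one per active call

    void open(const std::string &file) { callers.clear(); filename = file; }

    // Drop every active call and restore the top-level file.
    void unwind() {
        if (!callers.empty()) {
            filename = callers.front(); // the file the run started from
            callers.clear();
        }
    }

    // Models a failing "o<sub> call" with no cleanup at all.
    Status execute_raw(const std::string &sub_file) {
        callers.push_back(filename);
        filename = sub_file;
        return INTERP_ERROR;
    }

    // Approach (1): the interpreter unwinds itself before reporting failure,
    // so 'errored' always implies 'call stack is unwound'.
    Status execute_self_unwinding(const std::string &sub_file) {
        Status s = execute_raw(sub_file);
        if (s == INTERP_ERROR) unwind();
        return s;
    }
};

// Approach (2): every caller has to remember to unwind after an error.
inline Status run_with_caller_unwind(ToyInterp &interp,
                                     const std::string &sub_file) {
    Status s = interp.execute_raw(sub_file);
    if (s == INTERP_ERROR) interp.unwind();
    return s;
}
```

With (1) the cleanup lives in one place; with (2) each using program needs its own copy of the check in run_with_caller_unwind(), and a caller that forgets it is left with the stale filename.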

I'll proceed and implement a fix along (1). If you have good reasons against 
it, tell me now.

I am also interested in hearing a really good justification for the 
USE_LAZY_CLOSE 'feature'. Otherwise I think that should be removed. 
'performance' by avoiding a single close/open doesn't count - we're not running 
Z80s anymore.


-Michael



