On Thu, Nov 17, 2011 at 07:07:23PM +0100, Bernd Paysan wrote:
> This here should be better, this checks if the cfa points into the code for
> primitives or is a code word. It also gives special treatment to aliases
> (when the alias-mask bit is cleared), and checks if the name consists of
> printable characters only.
>
> : new-head? ( addr -- f )
>     \G heuristic check whether addr is a name token; may deliver false
>     \G positives; addr must be a valid address
>     dup dup aligned <>
>     if
>         drop false exit \ heads are aligned
>     then
>     dup cell+ @ alias-mask and 0= >r
>     name>string dup $20 $1 within if
>         rdrop 2drop false exit \ realistically the name is short
>     then
>     cfaligned 2dup bounds ?do \ should be a printable string
>         i c@ bl < if
>             2drop unloop rdrop false exit
>         then
>     loop
>     + r> if \ check for valid aliases
>         @ dup forthstart here within
>         over ['] noop ['] lit-execute 1+ within or
>         over dup aligned = and
>         0= if
>             drop false exit
>         then
>     then \ check for cfa - must be code field or primitive
>     dup @ tuck 2 cells - = swap
>     docol: ['] lit-execute @ 1+ within or ;

This works pretty well, and only produces three false positives that
hybrid-head? did not also produce. I have again applied the idea of
checking the link field to this new head?, resulting in:

: hybrid-head? ( addr -- f )
    \G heuristic check whether addr is a name token; may deliver false
    \G positives; addr must be a valid address; returns 1 for
    \G particularly unsafe positives
    \ we follow the link fields and check for plausibility; two
    \ iterations should catch most false addresses: on the first
    \ iteration, we may get an xt, on the second a code address (or
    \ some code), which is typically not in the dictionary.
    \ we added a third iteration for working with code and ;code words.
    3 0 do
        dup new-head? 0= if
            drop false unloop exit
        endif
        dup @ dup 0= if
            2drop 1 unloop exit
        else
            dup rot forthstart within if
                drop false unloop exit
            then
        then
    loop
    drop true ;

This eliminates the three known false positives. Now we see:

[~/gforth:76535] gforth xxx.fs -e "hashpop ? testhybrid . cr bye"
2989 3014

So there are 25 words that are recognized but are not in the hash
table. Maybe we have that many words in plain linked lists, or maybe
there are still false positives; I did not find any suspicious words
among the recognized ones, though.

Anyway, I have added this method to our HEAD? check.

> This should be a pretty strict test, and at least it won't do any harm to do a
> .name on it, even if it still is a false positive.

The problem is that the invalid memory access occurred in name>int
(IIRC), which might still trip over a false positive, unless your CFA
test excludes this possibility.

- anton

Here's the test program:

: new-head? ( addr -- f )
    \G heuristic check whether addr is a name token; may deliver false
    \G positives; addr must be a valid address
    dup dup aligned <>
    if
        drop false exit \ heads are aligned
    then
    dup cell+ @ alias-mask and 0= >r
    name>string dup $20 $1 within if
        rdrop 2drop false exit \ realistically the name is short
    then
    cfaligned 2dup bounds ?do \ should be a printable string
        i c@ bl < if
            2drop unloop rdrop false exit
        then
    loop
    + r> if \ check for valid aliases
        @ dup forthstart here within
        over ['] noop ['] lit-execute 1+ within or
        over dup aligned = and
        0= if
            drop false exit
        then
    then \ check for cfa - must be code field or primitive
    dup @ tuck 2 cells - = swap
    docol: ['] lit-execute @ 1+ within or ;

: hybrid-head? ( addr -- f )
    \G heuristic check whether addr is a name token; may deliver false
    \G positives; addr must be a valid address; returns 1 for
    \G particularly unsafe positives
    \ we follow the link fields and check for plausibility; two
    \ iterations should catch most false addresses: on the first
    \ iteration, we may get an xt, on the second a code address (or
    \ some code), which is typically not in the dictionary.
    \ we added a third iteration for working with code and ;code words.
    3 0 do
        dup new-head? 0= if
            drop false unloop exit
        endif
        dup @ dup 0= if
            2drop 1 unloop exit
        else
            dup rot forthstart within if
                drop false unloop exit
            then
        then
    loop
    drop true ;

: my-.name ( nt -- )
    name>string dup ." len=" . 16 min dump ;

: test
    here forthstart do
        i new-head? 0<> i hybrid-head? 0<> over <> if
            cr if ." new" else ." hybrid" then
            ." accepts as head: " i hex. i my-.name
        else
            drop
        then
    loop ;

: testhybrid.
    0 here forthstart do
        i hybrid-head? dup if
            cr 5 .r space i hex. i .name
        else
            drop
        then
    loop ;

: testhybrid
    0 here forthstart do
        i hybrid-head? 0<> -
    loop ;
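
A side note on the name-length check, since "$20 $1 within" reads oddly at first glance: when the lower bound is above the upper bound, WITHIN wraps around, so the flag is true exactly for values outside 1..31, i.e. a length of 0 or of 32 and more. A minimal sketch of just that test, runnable on its own; the word bad-length? is a name made up for this illustration and is not part of the program above:

```forth
\ bad-length? is a hypothetical helper, only for illustration.
\ WITHIN ( u lo hi -- f ) with lo > hi is true when u is outside [hi, lo).
: bad-length? ( u -- f )
    $20 $1 within ;

0  bad-length? .   \ prints -1 : an empty name is rejected
1  bad-length? .   \ prints 0  : shortest plausible name
31 bad-length? .   \ prints 0  : longest plausible name
32 bad-length? .   \ prints -1 : too long to be realistic
```

So new-head? treats a candidate as implausible unless its name is 1 to 31 characters long, which matches the "realistically the name is short" comment.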