On Mon, Nov 14, 2011 at 01:18:56AM +0100, Bernd Paysan wrote: > Am Sonntag, 13. November 2011, 18:41:54 schrieb Josh Grams: > > >Hm, looking at head?, it really looks wrong. Try replacing it with this: > > >: head? ( addr -- f ) > > > > > >\G heuristic check whether addr is a name token; may deliver false > > >\G positives; addr must be a valid address > > > > > > dup dup aligned <> > > > if > > > > > > drop false exit \ heads are aligned > > > > > > then > > > name>string dup $1F > if > > > > > > 2drop false exit \ realistically the name is short > > > > > > then > > > + cfaligned @ here forthstart within ; \ and the cfa is outside > > > > That does look more straightforward. But it doesn't eliminate the > > problem I was having. I guess I never gave a good minimal test case. > > This should reproduce it for reliably: > > > > here 32 cells erase wordlist .voc > > > > Hmm...a zero-length name should always be an invalid header, so I assume > > replacing `$1F >` with `$20 1 WITHIN` should be OK? > > Yes, that should do it. Thanks for the reliable test.
I have now tested where the two versions produce different results (which is either a false positive or a false negative for one of the versions). Program below. On my system running this as follows produces the following results: [~/gforth:76417] gforth xxx.fs -e "test bye" |grep "new accepts as head"|wc -l 781 [~/gforth:76418] gforth xxx.fs -e "test bye" |grep "old accepts as head"|wc -l 4546 So both versions differ by a lot, and given that the old version does not produce false negatives, the new version obviously produces at least 781 false positives. That's a lot, given that there are only 2987 names in the hash table. Also, the new test recognizes 3634 names in the dictionary; the difference from the hash table population is only 647. The difference from the 781 number above can be explained by 134 false negatives from the new version, and/or by false negatives from the old version. Anyway, these numbers are much too high for my taste. Maybe we should write a reliable test instead of a heuristic one, by adding some more data. E.g., if we had a list of all wordlists, we could just check if the addr matches any of the words in any wordlist. If that's too slow for some applications, we can add some caching. Or we might check all the words in the hash table (but not all wordlists use the hash table). What do you think? - anton Here's the program: \ test head? implementations : new-head? ( addr -- f ) \G heuristic check whether addr is a name token; may deliver false \G positives; addr must be a valid address dup dup aligned <> if drop false exit \ heads are aligned then name>string dup $20 $1 within if 2drop false exit \ realistically the name is short then + cfaligned @ here forthstart within ; \ and the cfa is outside : old-head? ( addr -- f ) \G heuristic check whether addr is a name token; may deliver false \G positives; addr must be a valid address; returns 1 for \G particularly unsafe positives \ we follow the link fields and check for plausibility; two \ iterations should catch most false addresses: on the first \ iteration, we may get an xt, on the second a code address (or \ some code), which is typically not in the dictionary. \ we added a third iteration for working with code and ;code words. 3 0 do dup dup aligned <> if \ protect @ against unaligned accesses drop false unloop exit then dup @ dup if ( addr addr1 ) dup rot forthstart within if \ addr1 is outside forthstart..addr, not a head drop false unloop exit then ( addr1 ) else \ 0 in the link field, no further checks 2drop 1 unloop exit \ this is very unsure, so return 1 then loop \ in dubio pro: drop true ; : my-.name ( nt -- ) name>string dup ." len=" . 16 min dump ; : test here forthstart do i new-head? i old-head? over <> if cr if ." new" else ." old" then ." accepts as head: " i hex. i my-.name else drop then loop ; : testnew 0 here forthstart do i new-head? - loop ;