Re: [gforth] ORDER bug

Anton Ertl Thu, 17 Nov 2011 08:11:38 -0800

On Mon, Nov 14, 2011 at 01:18:56AM +0100, Bernd Paysan wrote:
> Am Sonntag, 13. November 2011, 18:41:54 schrieb Josh Grams:
> > >Hm, looking at head?, it really looks wrong.  Try replacing it with this:
> > >: head? ( addr -- f )
> > >
> > >\G heuristic check whether addr is a name token; may deliver false
> > >\G positives; addr must be a valid address
> > >
> > >    dup dup aligned <>
> > >    if
> > >   
> > >   drop false exit \ heads are aligned
> > >   
> > >    then
> > >    name>string dup $1F > if
> > >   
> > >   2drop false exit \ realistically the name is short
> > >   
> > >    then
> > >    + cfaligned @ here forthstart within ; \ and the cfa is outside
> > 
> > That does look more straightforward.  But it doesn't eliminate the
> > problem I was having.   I guess I never gave a good minimal test case.
> > This should reproduce it for reliably:
> > 
> >     here 32 cells erase  wordlist .voc
> > 
> > Hmm...a zero-length name should always be an invalid header, so I assume
> > replacing `$1F >` with `$20 1 WITHIN` should be OK?
> 
> Yes, that should do it.  Thanks for the reliable test.


I have now tested where the two versions produce different results
(which is either a false positive or a false negative for one of the
versions).  Program below.  On my system running this as follows
produces the following results:

[~/gforth:76417] gforth xxx.fs -e "test bye" |grep "new accepts as head"|wc -l
781
[~/gforth:76418] gforth xxx.fs -e "test bye" |grep "old accepts as head"|wc -l
4546

So both versions differ by a lot, and given that the old version does
not produce false negatives, the new version obviously produces at
least 781 false positives.  That's a lot, given that there are only
2987 names in the hash table.

Also, the new test recognizes 3634 names in the dictionary; the
difference from the hash table population is only 647.  The difference
from the 781 number above can be explained by 134 false negatives from
the new version, and/or by false negatives from the old version.

Anyway, these numbers are much too high for my taste.  Maybe we should
write a reliable test instead of a heuristic one, by adding some more data.

E.g., if we had a list of all wordlists, we could just check if the
addr matches any of the words in any wordlist.  If that's too slow for
some applications, we can add some caching.

Or we might check all the words in the hash table (but not all
wordlists use the hash table).

What do you think?

- anton

Here's the program:
\ test head? implementations

: new-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address
    dup dup aligned <>
    if
        drop false exit \ heads are aligned
    then
    name>string dup $20 $1 within if
        2drop false exit \ realistically the name is short
    then
    + cfaligned @ here forthstart within ; \ and the cfa is outside

: old-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address; returns 1 for
\G particularly unsafe positives
    \ we follow the link fields and check for plausibility; two
    \ iterations should catch most false addresses: on the first
    \ iteration, we may get an xt, on the second a code address (or
    \ some code), which is typically not in the dictionary.
    \ we added a third iteration for working with code and ;code words.
    3 0 do
        dup dup aligned <> if \ protect @ against unaligned accesses
            drop false unloop exit
        then
        dup @ dup
        if ( addr addr1 )
            dup rot forthstart within
            if \ addr1 is outside forthstart..addr, not a head
                drop false unloop exit
            then ( addr1 )
        else \ 0 in the link field, no further checks
            2drop 1 unloop exit \ this is very unsure, so return 1
        then
    loop
    \ in dubio pro:
    drop true ;

: my-.name ( nt -- )
    name>string dup ." len=" .
    16 min dump ;

: test
    here forthstart do
        i new-head? i old-head? over <> if
            cr if
                ." new"
            else
                ." old"
            then
            ."  accepts as head: " i hex. i my-.name
        else
            drop
        then
    loop ;

: testnew
    0 here forthstart do
        i new-head? -
    loop ;

Re: [gforth] ORDER bug

Reply via email to