Matt,

On Mon, Mar 18, 2013 at 6:03 PM, Matt Mahoney <[email protected]> wrote:
> On Mon, Mar 18, 2013 at 4:29 PM, Steve Richfield
>
> As to your hash function, I don't see why this should be any faster
> than integer arithmetic.
> ...
> And no, you do not get overflow errors with integer arithmetic. The
> result is just truncated, which for many hash functions is actually
> what you want.

After some research into the above, I will state the situation with integer hashing more precisely, and await any clarification from you:

- Intel hardware does NOT throw exceptions on integer overflow, but provides for optional outboard tests to check for overflow after the fact.
- Other "big iron", like IBM mainframes, DOES throw exceptions on integer overflow, because it handles money, which is MUCH more valuable than bits.
- Some languages, like Visual Basic, check for integer overflows by default, but this checking can optionally be disabled for ALL operations.
- Some languages, like Java, perform NO integer overflow checks at all; results silently wrap.
- Some languages, like C, allow overflow on unsigned integers (the result wraps), while signed integer overflow is undefined behavior.
- People writing in assembly can do anything they want, but must then pay the overhead of getting to and from their assembly-coded routines, e.g. via DLL linkage. In the bad old days of individually compiling routines and linking them together, it was a trivial matter to specially compile one routine, or to write a particular routine in a different language. Now you must do something special, like putting a routine into a DLL, to get special treatment for a particular routine.

Integer wraparound is one of the primary sources of the myriad updates we all get from Microsoft to plug security holes. Add some assholes who try to crash things (I get these all the time being fed into DrEliza.com) and life can get pretty difficult. I wouldn't think of attempting to write and maintain AI code in a language that isn't HIGHLY checked, especially in all matters pertaining to subscripting.
Unfortunately, having integer hashing code ANYWHERE in the program means that, regardless of the platform chosen, subscript computations must go unchecked EVERYWHERE in the program. Java covers this apparent weakness by letting integer computations go unchecked but checking every subscript before use, to make sure the code isn't stepping on anything besides the array being addressed. This avoids clobbering memory, but it does NOT guarantee that there was no wraparound in the computation that arrived at a valid, though possibly incorrect, subscript.

So, this leaves the following choices:

1. A developer chooses an Intel processor and a language that doesn't check for integer overflows, or turns integer overflow checking off in a language like Visual Basic where it can be disabled. In the process they leave themselves wide open to all the text on the Internet hitting the most complex AI code ever written and wrapping an integer around SOMEWHERE in the code to cause problems; or
2. They use a slower integer method that survives overflow checking, with an eye to later replacing it with faster code once the program works well enough to safely disable such checking. Of course, floating-point methods would eliminate this step, so why bother? Or
3. They use floating-point methods that work in the presence of FULL error checking. Or
4. Maybe future compiler writers will provide a way to disable error checking for specific statements, which would make integer hashing possible without sacrificing any other error checking.

So, as I see it (and commented in my previous posting), you are technically correct, but integer hashing isn't worth its non-speed costs, like the reduced reliability of all the OTHER code. I have written thousands of pages of ugly AI code, and I wouldn't dream of turning off ANY available error checking.
I prefer Visual Basic only for its superb error checking, which can be selectively disabled once the program has been fully debugged; other languages are MUCH more powerful.

Further, there is a special problem in debugging NLP that makes chasing subtle problems a MAJOR challenge to be avoided at all costs: "heisenbugs", where the program is actually working correctly, e.g. correctly picking up on something "between the lines" that isn't at all obvious to human readers. It is EVER so easy to end up chasing a bug that simply isn't there, and of course it is hardest of all to find something that isn't there. The majority of my time during the tail end of debugging DrEliza.com was spent chasing heisenbugs, and I suspect that with >100 times as many rules and the entire Internet to analyze, heisenbugs would completely swamp the debugging of all other problems combined. Now sprinkle in some wraparound and other such subtle problems, and you would NEVER get the thing fully debugged. The presence of heisenbugs tends to amplify the cost of finding the real bugs by an order of magnitude or so, because until you figure them out, they are indistinguishable from crazy things like wraparound. At some point you notice that, e.g., the last dozen or so bugs you chased down were all heisenbugs, so you declare the program "working", residual bugs and all. Ugly, but what other choice is there?

Steve

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
