Matt,

On Mon, Mar 18, 2013 at 6:03 PM, Matt Mahoney <[email protected]> wrote:
> On Mon, Mar 18, 2013 at 4:29 PM, Steve Richfield
>
> As to your hash function, I don't see why this should be any faster
> than integer arithmetic.
> ...
> And no, you do not get overflow errors with integer arithmetic. The
> result is just truncated, which for many hash functions is actually
> what you want.

After some research into the above, I will state the situation with integer hashing more precisely, and await any clarification from you:

- Intel hardware does NOT throw exceptions on integer overflow, but provides for optional outboard tests to check for overflow after the fact.
- Other "big iron", like IBM mainframes, DOES throw exceptions on integer overflow, because it handles money, which is MUCH more valuable than bits.
- Some languages, like Visual Basic, check for integer overflows by default, but this checking can optionally be disabled for ALL operations.
- Some languages, like Java, perform NO integer overflow checks at all; results silently wrap.
- Some languages, like C, allow overflow on unsigned integers (the result wraps), while signed integer overflow is undefined behavior.
- People writing in assembly can do anything they want, but must then pay the overhead of getting to and from their assembly-coded routines, e.g. via DLL linkage. In the bad old days of individually compiling routines and linking them together, it was a trivial matter to specially compile one routine, or to write a particular routine in a different language. Now you must do something special, like putting a routine into a DLL, to get special treatment for a particular routine.

Integer wraparound is one of the primary sources of the myriad updates we all get from Microsoft to plug security holes. Add some assholes who try to crash things (I get these all the time being fed into DrEliza.com) and life can get pretty difficult. I wouldn't think of attempting to write and maintain AI code in a language that isn't HIGHLY checked, especially in all matters pertaining to subscripting.
Unfortunately, having integer hashing code ANYWHERE in the program means that, regardless of the platform chosen, subscript computations must go unchecked EVERYWHERE in the program. Java covers this apparent weakness by letting integer computations go unchecked but checking every subscript before use, to make sure the code isn't stepping on anything besides the array being addressed. This avoids clobbering memory, but it does NOT guarantee that there was no wraparound in the computation that arrived at a valid, though possibly incorrect, subscript.

So, this leaves the following choices:

1. A developer chooses an Intel processor and a language that doesn't check for integer overflows, or turns integer overflow checking off in a language like Visual Basic where it can be disabled. In the process they leave themselves wide open to all the text on the Internet hitting the most complex AI code ever written and wrapping an integer around SOMEWHERE in the code to cause problems; or
2. They use a slower integer method that survives overflow checking, with an eye to later replacing it with faster code once the program works well enough to safely disable such checking. Of course, floating-point methods would eliminate this step, so why bother? Or
3. They use floating-point methods that work in the presence of FULL error checking. Or
4. Maybe future compiler writers will provide a way to disable error checking for specific statements, which would make integer hashing possible without sacrificing any other error checking.

So, as I see it (and commented in my previous posting), you are technically correct, but integer hashing isn't worth its non-speed costs, like the reduced reliability of all the OTHER code. I have written thousands of pages of ugly AI code, and I wouldn't dream of turning off ANY available error checking.
I prefer Visual Basic only for its superb error checking, which can be selectively disabled once the program has been fully debugged; other languages are MUCH more powerful.

Further, there is a special problem in debugging NLP that makes chasing subtle problems a MAJOR challenge to be avoided at all costs: "heisenbugs", where the program is actually working correctly, e.g. correctly picking up on something "between the lines" that isn't at all obvious to human readers. It is EVER so easy to end up chasing a bug that simply isn't there, and of course it is hardest of all to find something that isn't there. The majority of my time during the tail end of debugging DrEliza.com was spent chasing heisenbugs, and I suspect that with >100 times as many rules and the entire Internet to analyze, heisenbugs would completely swamp the debugging of all other problems combined. Now sprinkle in some wraparound and other such subtle problems, and you would NEVER get the thing fully debugged. The presence of heisenbugs tends to amplify the cost of finding the real bugs by an order of magnitude or so, because until you figure them out, they are indistinguishable from crazy things like wraparound. At some point you notice that, e.g., the last dozen or so bugs you chased down were all heisenbugs, so you declare the program "working", residual bugs and all. Ugly, but what other choice is there?

Steve

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
