> "Find" commands accept wild cards and evaluate using collation algorithms
> (case-insensitive comparison plus some other locale-specific rules) -- is it
> really fair to compare the two against object keys?

I'm not sure what "fair" means here, but it's definitely not an
apples-to-apples comparison. Find in array and Find in sorted array are
variants on the same thing, so they're easy to compare. Object keys? No
idea. As far as I can tell, 4D declined to offer any information about how
they work that the company will stand behind publicly. That's okay, I'm used
to black box testing and I like it...but it's time-consuming.

The point of these comparisons isn't to figure out if one approach is
"better" than another so much as how they work *in the real world.* Put
another way, the goal is to come up with some rules of thumb about what to
use when. Binary search kicks ass, and I know why. Object key lookups kick
ass, and I don't know why. My take-aways:

* Use objects when they're easier or more appropriate for the problem
at hand.

* Use sorted arrays when they're easier or more appropriate for the
problem at hand.

* Don't shift from arrays to objects based on a notion that they're
"faster."

* Consider objects instead of arrays if you don't have, or can't be sure of,
a sorted array order, because object key lookups are way faster than
sequential array traversal (Find in array).

* Don't worry about speed at all unless you've got a solid reason to.
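To make the three lookup strategies concrete, here's a minimal sketch in Python (not 4D -- the names and data are made up for illustration): a sequential scan stands in for Find in array, a binary search for Find in sorted array, and a dict lookup for an object key lookup. All three find the same element; they just do wildly different amounts of work to get there.

```python
import bisect

# Illustrative data: a sorted list of keys and a dict mapping each key to
# its position (the dict plays the role of a 4D object here).
keys = sorted(f"k{n:05d}" for n in range(10_000))
table = {k: i for i, k in enumerate(keys)}

target = keys[7_500]

def sequential_find(xs, t):
    """Sequential scan, a la Find in array: O(n) comparisons on average."""
    for i, x in enumerate(xs):
        if x == t:
            return i
    return -1

def binary_find(xs, t):
    """Binary search, a la Find in sorted array: O(log n) comparisons."""
    i = bisect.bisect_left(xs, t)
    return i if i < len(xs) and xs[i] == t else -1

def hash_find(tbl, t):
    """Hash lookup, a la an object key: O(1) expected."""
    return tbl.get(t, -1)

assert sequential_find(keys, target) == 7_500
assert binary_find(keys, target) == 7_500
assert hash_find(table, target) == 7_500
```

The point isn't the Python timings, which tell you nothing about 4D's internals; it's that the three approaches have different growth curves, which is what the rules of thumb above are really about.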

Thinking back on my tests, a few points for anyone who wants to tweak them:

-- If you want sequential searches to look better, just search for the
first items. Search time should be directly related to the position of the
target in the array. I avoided this trap on purpose.
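That trap is easy to demonstrate. Here's a tiny Python sketch (again, illustrative, not 4D) that counts how many items a sequential search examines -- the count is exactly the target's position plus one, so cherry-picking early targets makes the sequential approach look artificially good.

```python
def count_comparisons(xs, target):
    """Sequential search that returns how many items were examined."""
    for i, x in enumerate(xs):
        if x == target:
            return i + 1
    return len(xs)  # not found: every item was examined

data = list(range(1_000))

assert count_comparisons(data, 0) == 1      # first item: one comparison
assert count_comparisons(data, 499) == 500  # middle: half the array
assert count_comparisons(data, 999) == 1_000  # last item: a full scan
```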

-- I used very small text values for lookups and keys! Long strings might
behave differently, I don't know. I would actually find that an interesting
result, if anyone feels like checking.

-- The object keys are inserted into the test object in sorted order. This
should not make any difference if there's a hash underneath, but we don't
actually know that. Although it does seem likely. From the few results I've
gotten, I'd wildly guess that there's:

-- An excellent hash function where "excellent" means "low collision, high
dispersal and fast."

-- A secondary structure off the hash bins that is itself smart. So, not a
linked list (the CS 101 approach), but a second good hash or a tree of some
kind. Or something else.

-- A pretty large range of hash slots to reduce secondary lookup times.

-- Probably some smart scheme for changing the hash table size dynamically
under stress. That's an expensive maneuver (or normally is; I can think of
ways to make it not too expensive).
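For anyone who hasn't seen why a dynamic resize is expensive: here's the CS 101 version in Python -- a chained hash table that doubles its slot count when the load factor passes a threshold. Every resize rehashes every entry. This is purely illustrative; 4D's actual object internals are a black box, and per the guesses above they're almost certainly smarter than this.

```python
class ToyHash:
    """CS-101 chained hash table with load-factor-triggered resizing.
    Illustrative only -- not a claim about 4D's object implementation."""

    def __init__(self):
        self.slots = [[] for _ in range(8)]
        self.count = 0

    def _bucket(self, key, slots):
        # Map the key's hash onto one of the available slots.
        return slots[hash(key) % len(slots)]

    def put(self, key, value):
        bucket = self._bucket(key, self.slots)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # update existing key in place
                return
        bucket.append([key, value])
        self.count += 1
        # Resize when load factor exceeds 0.75. Note that EVERY entry
        # gets rehashed into the new slot array -- this is the expensive
        # maneuver mentioned above.
        if self.count / len(self.slots) > 0.75:
            new_slots = [[] for _ in range(len(self.slots) * 2)]
            for bucket in self.slots:
                for k, v in bucket:
                    self._bucket(k, new_slots).append([k, v])
            self.slots = new_slots

    def get(self, key, default=None):
        for k, v in self._bucket(key, self.slots):
            if k == key:
                return v
        return default
```

With chaining like this, long chains in the secondary structure are what kill lookup speed -- which is exactly why a production implementation would want a better secondary structure, more slots, and a cheap resize strategy, i.e. the three guesses above.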

Just speculating, I'm probably wrong in every detail here. Doesn't matter.
It's a black box.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************
