Thomas,

Thomas Hruska wrote:

>An associative hash (a.k.a. associative array) implements either a hash 
>table or a B-tree - usually the latter in C++ because stl hash_map 
>specifically implements a hash table.  
>  
>
This is however not quite what you said in your original post.  In your 
original post you said:

"An associative hash is a data structure that implements a B-tree and 
thus has O(log n) insert and search times." 

The two terms "associative array" and "associative hash" are not entirely 
synonymous or interchangeable: "associative hash" says something about 
the implementation and implies the use of hashing, whereas 
"associative array" names an abstract data type mapping a set of keys 
to a set of values.  Thus, when you said "associative hash" (and did not 
use the more general ADT term "associative array"), and then also 
stated that it was (by implication) **always** implemented as a B-tree 
with O(log n) search time, it looked like a mistake to me (or at least 
an incomplete answer).  Hence my response.

Thus, I felt that if you meant the general ADT term "associative 
array" in your answer, then you should at least have mentioned 
both possible implementations (B-trees or hash tables) with both time 
complexities, to avoid giving others a wrong impression.  That's all I 
was trying to point out; I apologise if the correction came across as 
overly critical, or as a criticism of you personally. 

>http://en.wikipedia.org/wiki/Associative_array
>
>So I'm using correct terminology.
>
>  
>
Just for completeness, I'll also quote the opening paragraphs of the 
Wikipedia entry on "Hash table" (__emphasis__ mine):

"In computer science, a hash table, or a __hash map__, is a data 
structure that associates keys with values.  The primary operation it 
supports efficiently is a lookup: given a key (e.g. a person's name), 
find the corresponding value (e.g. that person's telephone number). It 
works by transforming the key using a hash function into a hash, a 
number that the hash table uses to locate the desired value.
[Figure caption: A small phone book as a hash table.]

Hash tables are often used to implement __associative arrays__, sets and 
caches. Like arrays, hash tables __provide constant-time O(1) lookup on 
average__, regardless of the number of items in the table. However, the 
rare __worst-case__ lookup time can be as bad as __O(n)__. Compared to 
__other__ associative array data structures, hash tables are most useful 
when large numbers of records of data are to be stored.

Hash tables store data in pseudo-random locations, so accessing the data 
in a sorted manner is a very time consuming operation. __Other data 
structures__ such as self-balancing __binary search trees__ generally 
operate more slowly (since their lookup time is __O(log n)__) and are 
rather more complex to implement than hash tables but maintain a sorted 
data structure at all times."

So, to be clear: hashing has a __general__ time complexity of O(1), 
which degenerates in the worst case to O(n).  Other structures like 
B-trees have a __worst case__ complexity of O(log n), but have other 
advantages, like being traversable in sorted order.  This was all I was 
trying to make sure everyone else was aware of, and it did not come across 
from your original answer.

>Your suggestion to load the entire database into memory is not a good 
>one.  According to the OP, it is roughly 100MB...when you take into 
>consideration data structures and data alignment, loading the whole 
>thing would probably consume at least 300MB RAM.
>
>  
>
The original poster originally said 30MB, and only later had a proper 
look and reported back that it was in fact 100MB.  The thought had 
occurred to me with the 30MB size in mind.  In any case, I never 
actually __strongly recommended__ that he go down this road at 
all; I merely mentioned it as "a thought that had gone through my mind" 
given the original 30MB size of the database, in order to share it with 
the rest of the group.  Further, I also said

"...with the amount of growth per week, such a solution may not be 
scalable so is __possibly not such a good idea from that point of view 
either...__"  

So, I had already pointed out in my own post that it might not be such a 
good idea, and hadn't even considered other possible implementation 
details like data alignment.  I'm sorry if I left the impression that I 
was actually recommending he go down this road. 

>You should get yourself a copy of my book on Safe C++ Design Principles. 
>  Your assumptions that accessing a hash table is always O(1) are false. 
>  When a hash collision occurs, most hash tables switch to a O(n) search 
>algorithm (linked list).  Also, as stated in the book, O(1) algorithms 
>usually come with a side-effect of consuming other resources...CPU, 
>memory, etc.  In this particular case, calculating a good hash key 
>typically takes many CPU cycles.
>  
>
You seem to have misread what I said.  I said:

"B-Trees have O(log n) lookup time as you say, but hashing/hash 
tables/hash maps don't use B-Trees and in fact has O(1) lookup time.  It 
takes just as long __(in general case)__ to look up an item from 100 as 
it does from 100 000 000."

Note the "general case" comment.  __Of course__ the theoretical O(1) 
time complexity of hash tables deteriorates beyond the general 
case in some situations, due to practical considerations (collisions 
and whatever else).  I know that just as well as you; indeed, the 
Wikipedia page I mentioned also clearly states it.  So, to be clear, I 
never said or tried to say that accessing hash tables is __always__, 
pedantically and strictly, O(1).  I merely stated that hash tables 
generally have a time complexity of O(1) (which I'm sure we can agree 
on) and that "__in the general case__ it takes just as long to look up an 
item from 100 as from 100 000 000", which is approximately true.  

In closing, I'd like to apologise if my original reply was perhaps a bit 
abrupt and caused you offense. I was simply trying to add on to/correct 
what appeared to me to be incorrect (or incomplete) information.

Sincerely,

Walter Prins




-----------------------------------------------------
Home page: http://groups.yahoo.com/group/delphi-en/
To unsubscribe: [EMAIL PROTECTED] 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/delphi-en/

<*> To unsubscribe from this group, send an email to:
    [EMAIL PROTECTED]

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 

