Hi Bernd,
I have fiddled with machine translation a bit more and had fun watching
some German idioms (like killing two flies with one hit) turn into
complete nonsense. What I understood is that by default every bigFORTH
vocabulary is a hash table. That is great in terms of having the hash
data structure available at hand, and I'd like to use it. To make it
fully useful I need a way to remove a single word from a vocabulary,
unlike *forget*, which cuts off the tail. Is there already such a word?
Below are some more of my translations. I have a comment about your
testing remark at the end of the section: you say the words were
normally distributed - it should be uniformly, I guess.
The text:
\section{Hashed Vocabulary}
% The first paragraph is missing - online translator was unhelpful.
Besides disk access, the most time-consuming part is searching for a
word in a vocabulary. FORTH represents its vocabularies as linear
lists, so the search time grows linearly with the length of the
vocabulary, and often the word is not found at all (which happens in a
number of cases). Given the large vocabularies that FORTH indeed has,
a search takes noticeable time: often on the order of 1000 strings
must be compared.
This can be significantly accelerated by changing the structure of the
vocabularies. A tree, e.g., reduces the search time to logarithmic,
but managing a suitable tree (an AVL tree, e.g.) is quite
complex. A hash table, however, offers a linear speedup with
substantially simpler administration.
A key is assigned to each word. The keys should be distributed as
evenly as possible, because words are linked into different slots of
an array according to their key. When a word is looked up, its key is
computed first; then only the list of all words with the same key
needs to be scanned. Since the table in bigFORTH has 128 entries, that
is on average only $1/128$ of the vocabulary's words.
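The structure described above could be sketched like this in Python (a minimal illustration, not bigFORTH's actual code; the class, method names, and the key function are my own, chosen only to show the bucket idea):

```python
TABLE_SIZE = 128  # as in bigFORTH's 128-entry table


def key(name):
    # Illustrative key: sum of the character codes plus the length,
    # reduced modulo the table size.
    return (sum(name.encode()) + len(name)) % TABLE_SIZE


class Vocabulary:
    """An array of buckets; each bucket is a linear list of words
    that share the same key."""

    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    def define(self, name, value):
        # New definitions go to the front of their bucket's list.
        self.buckets[key(name)].insert(0, (name, value))

    def find(self, name):
        # Only the bucket with the matching key is scanned --
        # on average 1/128 of the vocabulary.
        for n, v in self.buckets[key(name)]:
            if n == name:
                return v
        return None
```

With linear-list vocabularies every failed lookup walks the whole list; here a failed lookup only walks one bucket.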
bigFORTH uses a simple method to compute the key: all letters of the
word and the COUNT byte are added together; then the remainder of the
division of this sum by the table length is taken as the key. The
algorithm is simple and nevertheless distributes the words
efficiently. (In order to examine the performance I wrote a
statistics--tool. Test result: the words are practically normally
distributed.)
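As I read it, the key computation amounts to the following (a sketch, not bigFORTH's source; the function name and `TABLE_SIZE` are mine, and I assume the COUNT byte is simply the name's length):

```python
TABLE_SIZE = 128


def hash_key(name):
    # Add all letters of the word plus the count byte (the length),
    # then keep the remainder of dividing by the table length.
    return (sum(name.encode()) + len(name)) % TABLE_SIZE
```

The result is always in the range 0..127, i.e. a valid index into the 128-entry table.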
The hash tool works entirely under the hood, so its words are rather
uninteresting. The module is called HASH and exports only its name and
all words in its vocabulary.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]