Re: A couple of questions about arrays and slices

Jonathan M Davis via Digitalmars-d-learn Sat, 24 Jun 2023 05:07:31 -0700

On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
> > Jonathan, is it possible that I wanted one thing and got
> > another? My description in the earlier post was of the _aim_ of
> > the program. What I ended up with might be something else? I
> > wanted an array of uints whose values are the results/outputs
> > of the mapping function. Since it is keyed by strings I assumed
> > that the runtime generates some kind of hash for fast lookup
> > when I ask it to retrieve an entry by the string (key)
> > associated with it. I assumed that in some sense the hashing
> > was sort of separate with some degree of independence from the
> > underlying array, if that makes sense. The lookup is just
> > assumed to be fast but how it is done we don’t really care. I
> > just wanted to expand the array as I did successfully elsewhere
> > with reserve, as I built this structure by successive additions
> > of data. I have a number of strings and the map is meant to
> > output the ordinal number in which I first saw them,
> > zero-based. Then I want to come back and randomly look up one
> > ordinal given a string preferably with a very fast lookup. The
> > number of entries can not practically be more than 30, and even
> > that would be highly unusual, maybe ten is the practical limit
> > in my particular case, so it’s hardly MySQL.
>
> I just realised something, your point about altering the table
> and having to rehash, is well taken. I hadn’t considered that.
> The reason for my foolishness in failing to realise that I’m
> asking the impractical is my pattern of usage. I add all the
> entries into the mapping table and have no interest in any
> lookups until it is fully built. Then a second function starts to
> do lookups while the data remains unchanging and that usage
> pattern can be guaranteed. I could even idup it if that would
> help, as copying < 32 uints wouldn’t take forever. A typical
> value would be a mere 5 or less. I only picked 32 to be
> completely safely ott.


Well, if the key were a struct or a class, the hashing function would be
opHash. For built-in types, the runtime has hashing functions that it uses.
Either way, with AAs, you really don't worry about managing the memory,
because it's completely outside of your control. You just put the elements
in there using their associated keys, and if you want to try to speed it up
after you've populated it, you use rehash so that the runtime can try to
move the elements around within the container so that lookup speeds will be
closer to optimal.

As such, for the most part, when dealing with AAs and worrying about
efficiency, the question really becomes whether AAs are the correct solution
rather than much of anything having to do with how you manage their memory.

With so few elements, it's also possible that using
std.algorithm.searching.find would be faster - e.g. having a dynamic array
of strings where the matching int is at the same index in a dynamic array of
ints - or you could use std.typecons.Tuple!(string, int)[] with something
like arr.find!(a => a[0] == key)() to find the tuple with the int you want.

Simply comparing a small number of strings like that might be faster than
what goes on with hashing the string and then finding the corresponding
element within the AA - or it might not be. You'd have to test that to know.
The AA would definitely be faster with a large number of elements, but with
a small number of elements, the algorithmic complexity doesn't really
matter, and the extra overhad with the AA lookups could actually mean that
the search through the dynamic array is faster even though it's O(n). But
you can only know which is faster by testing it out with the actual data
that you're dealing with.

Regardless, you need to remember that associative arrays are not arrays in
the C sense. Rather, they're hash tables, so they function very differently
from dynamic arrays, and the rehash function is the closest that you're
going to get to affecting how the elements are laid out internally or how
much memory the AA is using.

- Jonathan M Davis

Re: A couple of questions about arrays and slices

Reply via email to