Re: Growable arrays?

David Kastrup Sat, 09 Jun 2012 10:36:16 -0700

Krister Svanlund <krister.svanl...@gmail.com> writes:

> On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <d...@gnu.org> wrote:
>
>
>     One principal distinguishing feature, like with a Scheme
>     hashtable, is the ability to grow on-demand.
>     
>     Scheme/Guile vectors are fixed size.
>
>     It is a bit of a nuisance that one can grow a hashtable
>     efficiently and on-demand, but not so an array.
>     
>     After all, there already _is_ such a mechanism underlying hash
>     tables so it seems somewhat peculiar not to have it available for
>     vectors as well.
>
> I don't know how much you know about data structures,


I do list the various implementations and details.

> and I must confess I'm not very educated on Guile's or Lua's
> implementations.

And I do list the details here.  Since I do it in free prose, chances
are that I am not just quoting material I have not understood.

> Based on what you are writing I would assume that the scheme
> hashtables aren't growable in the same way as a vector has to be
> growable.

I don't see anything supporting this assumption in what I wrote.  Nor in
Guile's documentation.


    5.6.12 Hash Tables
    ------------------

    Hash tables are dictionaries which offer similar functionality as
    association lists: They provide a mapping from keys to values.  The
    difference is that association lists need time linear in the size of
    elements when searching for entries, whereas hash tables can normally
    search in constant time.  The drawback is that hash tables require a
    little bit more memory, and that you can not use the normal list
    procedures (*note Lists::) for working with them.

       Guile provides two types of hashtables.  One is an abstract data type
    that can only be manipulated with the functions in this section.  The
    other type is concrete: it uses a normal vector with alists as
    elements.  The advantage of the abstract hash tables is that they will
    be automatically resized when they become too full or too empty.

[...]


    6.4.25.1 Creating hash tables
    .............................

     -- Scheme Procedure: make-hash-table [equal-proc hash-proc #:weak
              weakness start-size]
         Create and answer a new hash table with EQUAL-PROC as the equality
         function and HASH-PROC as the hashing function.

    [...]

         As a legacy of the time when Guile couldn't grow hash tables,
         START-SIZE is an optional integer argument that specifies the
         approximate starting size for the hash table, which will be
         rounded to an algorithmically-sounder number.


> The number of elements in a hashtable isn't limited by its "size".
> They are often implemented as each position (where the hashtable's size
> is the number of positions) being a linked list giving the hashtable
> (in theory) limitless actual size.

However, if the number of hash buckets is not grown along with the
number of entries, hashtable access is O(n) in cost rather than O(1)
since after the initial split into hash buckets, the cost is that of
linear search.  This is the difference in behavior between hashtables in
Guile 1.4 (?) with fixed size, and hashtables in 1.6+ with variable
size.
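
To make the O(n) versus O(1) point concrete, here is a minimal Python
sketch.  It is not Guile's implementation; the class `ChainedTable` and
the helper `average_probes` are names made up for this illustration.  It
builds a chained hash table and counts how many entries a lookup has to
examine, once with a fixed number of buckets and once with the bucket
count doubled whenever the table holds more entries than buckets.

```python
class ChainedTable:
    """Hash table with separate chaining; `grow` toggles bucket doubling."""

    def __init__(self, buckets=8, grow=True):
        self.buckets = [[] for _ in range(buckets)]
        self.count = 0
        self.grow = grow

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))
        self.count += 1
        # When growing is enabled, double the bucket count as soon as
        # there are more entries than buckets.
        if self.grow and self.count > len(self.buckets):
            self._resize(2 * len(self.buckets))

    def _resize(self, new_size):
        old = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for chain in old:
            for k, v in chain:
                self.buckets[hash(k) % new_size].append((k, v))

    def probes(self, key):
        """Number of entries examined by a lookup of `key`."""
        examined = 0
        for k, _ in self._chain(key):
            examined += 1
            if k == key:
                break
        return examined


def average_probes(n, grow):
    """Insert the keys 0..n-1 and return the mean lookup cost in probes."""
    table = ChainedTable(buckets=8, grow=grow)
    for i in range(n):
        table.put(i, i)
    return sum(table.probes(i) for i in range(n)) / n
```

With integer keys, `average_probes(4096, grow=False)` comes out in the
hundreds and scales with the number of entries, while
`average_probes(4096, grow=True)` stays close to 1.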

> Growing a vector/array involves having to allocate new contiguous
> memory and copying all the elements there, so for example in C++ (I
> think) the std::vector is increased by half its current size each time
> meaning that the more expensive the copying gets the more elements you
> can insert into the vector before it has to resize.

Sure: since the growth happens with exponential backoff (the capacity
grows by a constant factor each time), the amortized cost for n entries
is O(n), that is, O(1) per entry.
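
A minimal Python sketch of such a growable vector, under stated
assumptions: the class name `GrowableVector`, its `push`/`ref` methods
and the `copies` counter are invented for this illustration and are not
part of Guile or of any existing library.  The backing list is only ever
allocated at a fixed length, standing in for a fixed-size vector, and
the counter records how many elements get copied during resizes.

```python
class GrowableVector:
    """Growable array on top of fixed-size storage, growing geometrically."""

    def __init__(self, capacity=1, factor=2.0):
        self.slots = [None] * capacity  # stands in for a fixed-size vector
        self.length = 0                 # number of slots actually in use
        self.factor = factor            # growth factor applied when full
        self.copies = 0                 # total elements copied by resizes

    def push(self, value):
        if self.length == len(self.slots):
            # Full: allocate a larger block and copy everything across.
            old_capacity = len(self.slots)
            new_capacity = max(old_capacity + 1,
                               int(old_capacity * self.factor))
            new_slots = [None] * new_capacity
            for i in range(self.length):
                new_slots[i] = self.slots[i]
            self.copies += self.length
            self.slots = new_slots
        self.slots[self.length] = value
        self.length += 1

    def ref(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return self.slots[index]
```

With the default factor of 2 and a starting capacity of 1, the resizes
copy 1, 2, 4, ... elements, so the total number of copies for n pushes
stays below 2n.  That is where the O(n) total comes from.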

> I would assume it wouldn't be that difficult to implement a pretty
> efficient growable vector for scheme.

Since that already is what is used internally in hashtables it can't be
difficult...  The advantage of growing a hashtable is that you don't
have waste: if you double the size of a hashtable, it means that you
split each bucket in two, and potentially any bucket after the split can
contain new data.  In contrast, after a similar vector resize, half of
the buckets are _guaranteed_ not to contain data.  You can reduce the
waste by using less than exponential backoff, but then the amortized
cost is no longer O(n).
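
The trade-off between waste and copying cost can be put into numbers
with a small Python simulation.  This is a sketch under assumptions of
my own choosing: the function names `simulate`, `doubling` and
`fixed_increment` as well as the step size of 64 are illustrative.  It
counts the element copies caused by resizes and the unused slots left
after n appends, for doubling and for growth by a fixed increment.

```python
def simulate(n, next_capacity):
    """Return (element copies, unused slots) after n appends."""
    capacity, length, copies = 1, 0, 0
    for _ in range(n):
        if length == capacity:
            copies += length                    # a resize copies every element
            capacity = next_capacity(capacity)
        length += 1
    return copies, capacity - length


def doubling(capacity):
    """Geometric growth: twice the previous capacity."""
    return 2 * capacity


def fixed_increment(capacity, step=64):
    """Less than exponential growth: a constant number of extra slots."""
    return capacity + step


n = 2 ** 16 + 1  # one append past a power of two, i.e. right after a doubling
doubling_copies, doubling_unused = simulate(n, doubling)
increment_copies, increment_unused = simulate(n, fixed_increment)
```

Right after a doubling, almost half of the slots are unused, but the
number of copies stays below 2n.  With the fixed increment the unused
part never reaches 64 slots, but the number of copies grows
quadratically with n.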

Anyway: your answer was based on the assumption that I did not do my
homework before asking, and that two people not reading documentation
might guess better than one person not reading documentation.

I hope I have now provided adequate coverage concerning this hypothesis
so that it should be possible to focus on the smaller set of remaining
ones.

-- 
David Kastrup
