Krister Svanlund <krister.svanl...@gmail.com> writes: > On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <d...@gnu.org> wrote: > > > One principal distinguishing feature, like with a Scheme > hashtable, is the ability to grow on-demand. > > Scheme/Guile vectors are fixed size. > > It is a bit of a nuisance that one can grow a hashtable > efficiently and on-demand, but not so an array. > > After all, there already _is_ such a mechanism underlying hash > tables so it seems somewhat peculiar not to have it available for > vectors as well. > > I don't know how much you know about data structures,
I do list the various implementations and details. > and I must confess I'm not very educated on Guile or Luas > implementations. And I do list the details here. Since I do it in free prose, chances are that I am not just quoting material I have not understood. > Based on what you are writing I would assume that the scheme > hashtables aren't growable in the same way as a vector has to be > growable. I don't see anything supporting this assumption in what I wrote. Nor in Guile's documentation. 5.6.12 Hash Tables ------------------ Hash tables are dictionaries which offer similar functionality as association lists: They provide a mapping from keys to values. The difference is that association lists need time linear in the size of elements when searching for entries, whereas hash tables can normally search in constant time. The drawback is that hash tables require a little bit more memory, and that you can not use the normal list procedures (*note Lists::) for working with them. Guile provides two types of hashtables. One is an abstract data type that can only be manipulated with the functions in this section. The other type is concrete: it uses a normal vector with alists as elements. The advantage of the abstract hash tables is that they will be automatically resized when they become too full or too empty. [...] 6.4.25.1 Creating hash tables ............................. -- Scheme Procedure: make-hash-table [equal-proc hash-proc #:weak weakness start-size] Create and answer a new hash table with EQUAL-PROC as the equality function and HASH-PROC as the hashing function. [...] As a legacy of the time when Guile couldn't grow hash tables, START-SIZE is an optional integer argument that specifies the approximate starting size for the hash table, which will be rounded to an algorithmically-sounder number. > The number of elements in a hashtable isn't limited by it's "size". > They are often implemented as each position (where the hashtables size > is the number of positions) being a linked list giving the hashtable > (in theory) limitless actual size. However, if the number of hash buckets is not grown along with the number of entries, hashtable access is O(n) in cost rather than O(1) since after the initial split into hash buckets, the cost is that of linear search. This is the difference in behavior between hashtables in Guile 1.4 (?) with fixed size, and hashtables in 1.6+ with variable size. > Growing a vector/array involves having to allocate new continuous > memory and copying all the elements there, so for example in C++ (i > think) the std:vector is increased by half it's current size each time > meaning that the more expensive the copying gets the more elements you > can insert into the vector before it has to resize. Sure: since the growth happens with exponential backoff, the amortized cost for n entries is O(n). > I would assume it wouldn't be that difficult to implement a pretty > efficient growable vector for scheme. Since that already is what is used internally in hashtables it can't be difficult... The advantage of growing a hashtable is that you don't have waste: if you double the size of a hashtable, it means that you split each bucket in two, and potentially any bucket after the split can contain new data. In contrast, after a similar vector resize, half of the buckets are _guaranteed_ not to contain data. You can reduce the waste by using less than exponential backoff, but then the amortized cost is no longer O(n). Anyway: your answer was based on the assumption that I did not do my homework before asking, and that two people not reading documentation might guess better than one person not reading documentation. I hope I have now provided adequate coverage concerning this hypothesis so that it should be possible to focus on the smaller set of remaining ones. -- David Kastrup