On 2009-09-30 13:31 PM, Christopher Barker wrote:
> Maybe this should be on the numpy list, but for now...
>
> Robert Kern wrote:
>> array.array() solves these problems if your type is supported. It follows the
>> same preallocation strategy as lists. One may create a numpy array from it
>> quickly using np.frombuffer().
>
> right -- I tend to forget that, but I really like the idea of supporting
> all numpy dtypes -- in particular, this could be nice for genfromtxt()
>
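For reference, a minimal sketch of the array.array route described above. The typecode 'd' and the sample values are only illustrative assumptions; np.frombuffer() wraps the existing buffer rather than copying it.

```python
from array import array

import numpy as np

# array.array grows with the same kind of over-allocation as list,
# but stores raw C values ('d' = C double) instead of Python objects.
buf = array('d')
for i in range(5):
    buf.append(i * 0.5)
buf.extend([10.0, 20.0])

# Wrap the finished buffer as an ndarray without copying the data.
arr = np.frombuffer(buf, dtype=np.float64)
print(arr)
```

As far as I know, current Python versions refuse to grow an array.array while another object is exporting its buffer, so the ndarray is best created once all appending is finished (or copied with arr.copy()).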
>>> What I have in mind is very simple. It would be:
>>> - Only 1-d
>>> - Support append() and extend() methods
>>> - Support any valid numpy dtype
>>> - which could even get you pseudo n-d arrays...
>>> - maybe it would act like an array in other ways, I'm not so sure..
>>>
>>>
>>> It could be written in pure python, using np.resize()(or concatenate,
>>> or,...), and certainly written efficiently in Cython or even (no!) C.
>>
>> An Appender object that maintains an array that .resize()s (the ndarray method,
>> not the function! They are quite different.)
>
> I know they are different, but I'm not exactly sure of the details.
np.resize() is really just a convenience function that creates a new array
object with the requested size and with the data copied over. ndarray.resize()
actually realloc()s the memory underneath the array object and changes the data
pointer to point to the newly allocated memory. The old memory is freed and
becomes unavailable.
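A small sketch of the difference, using made-up sample data. Passing refcheck=False here is my own choice so the snippet runs the same way in any interpreter session; it skips the reference check and is only safe when no other object shares the array's memory.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# The function returns a NEW array and leaves `a` untouched.
# When growing, it fills the extra space with repeated copies of the data.
b = np.resize(a, (6,))

# The method resizes the array's own memory IN PLACE.
# When growing, the new elements are zero-filled.
c = np.array([1, 2, 3, 4])
c.resize((6,), refcheck=False)

print(a, b, c)
```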
> However, I've noticed that ndarray.resize() often fails for me with ipython:
>
> In [33]: a = np.array((1,2,3,4))
>
> In [34]: a
> Out[34]: array([1, 2, 3, 4])
>
> In [35]: a.resize((10,))
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
>
> /Users/cbarker/Junk/test.py in <module>()
> ----> 1
> 2
> 3
> 4
> 5
>
> ValueError: cannot resize an array that has been referenced or is referencing
> another array in this way. Use the resize function
>
>
> I think that's because ipython keeps a reference to whatever the last
> returned value is (from the print statement). In any case, it has led me
> to believe that it can be fragile!
That's why it needs to be encapsulated in an Appender object.
>> according to the list/array.array
>> preallocation strategy would be quite useful.
>
> That's what I had in mind, though I wonder:
> - IIUC, a common approach is to allocate twice as much
> space each time a re-allocation is required. This means you won't have to
> re-allocate that often, and you'll never have allocated more than twice
> as much as you need. However, when you are bumping into memory limits,
> twice as much could be too much.
Take a look at Modules/arraymodule.c for the details. It is not 2x.
/* This over-allocates proportional to the array size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 34, 46, 56, 67, 79, ...
* Note, the pattern starts out the same as for lists but then
* grows at a smaller rate so that larger arrays only overallocate
* by about 1/16th -- this is done because arrays are presumed to be more
* memory critical.
*/
_new_size = (newsize >> 4) + (self->ob_size < 8 ? 3 : 7) + newsize;
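To see what that formula does in practice, here is a rough Python transcription that simulates one-at-a-time appends. The function names are hypothetical, and I am assuming a reallocation happens only when the requested size exceeds the current allocation.

```python
def next_alloc(newsize, current_size):
    """Python transcription of the C expression quoted above."""
    return (newsize >> 4) + (3 if current_size < 8 else 7) + newsize


def growth_pattern(n_appends):
    """Allocation sizes seen while appending one item at a time."""
    allocated = 0
    size = 0
    pattern = [0]
    for _ in range(n_appends):
        newsize = size + 1
        if newsize > allocated:
            # Out of room: over-allocate by roughly 1/16th plus a constant.
            allocated = next_alloc(newsize, size)
            pattern.append(allocated)
        size = newsize
    return pattern


print(growth_pattern(70))
```

Run this way, the formula gives 0, 4, 8, 16, 25, 34, 44, 54, 65, 77, which agrees with the pattern listed in the comment up to 34 and is slightly lower after that. Either way the over-allocation for large arrays works out to about 1/16th.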
>> I do recommend that you keep the
>> array "private" until you are done with it. This helps prevent views being made
>> of the array so you can keep using the .resize() method.
>
> that may take care of the above issue, then. This points to a "has a"
> relationship, rather than a subclassing of ndarray, which would probably
> be necessary anyway.
Yes, absolutely.
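A rough pure-Python sketch of what such a "has a" wrapper could look like. The class name, method names, and the to_array() helper are all hypothetical, and the growth rule is simply borrowed from the arraymodule.c formula. I pass refcheck=False so the sketch does not depend on interpreter reference counts; that is only safe because the array stays private and no views of it are handed out.

```python
import numpy as np


class Appender:
    """Hypothetical growable 1-d buffer that owns a private ndarray."""

    def __init__(self, dtype=float):
        self._data = np.empty(0, dtype=dtype)  # private: never expose views
        self._size = 0                         # number of valid elements

    def _ensure_capacity(self, needed):
        if needed <= self._data.shape[0]:
            return
        # Mild over-allocation, borrowed from array.array's growth rule.
        new_cap = (needed >> 4) + (3 if self._size < 8 else 7) + needed
        # In-place realloc; refcheck=False is an assumption that nothing
        # else references the private array.
        self._data.resize((new_cap,), refcheck=False)

    def append(self, value):
        self._ensure_capacity(self._size + 1)
        self._data[self._size] = value
        self._size += 1

    def extend(self, values):
        values = np.asarray(values, dtype=self._data.dtype)
        n = values.shape[0]
        self._ensure_capacity(self._size + n)
        self._data[self._size:self._size + n] = values
        self._size += n

    def __len__(self):
        return self._size

    def to_array(self):
        # Return a copy so the private buffer can keep being resized.
        return self._data[:self._size].copy()


acc = Appender(dtype=np.int64)
for i in range(100):
    acc.append(i)
acc.extend([100, 101])
result = acc.to_array()
```

Returning a copy from to_array() costs one extra allocation at the end, but it keeps the internal array free of outside references.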
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev