The way I would approach this would be something like:

a = {} # creates empty Any array
sizehint(a, n)
for i = 1:n
    push!(a, i)
end

sizehint, as far as I know, is the more official way to say, "my array is
going to need at least this much space, so don't bother re-allocating along
the way, go ahead and do it right now". Then you're free to push! and
splice! and no extra allocating should have to happen.


On Wed, Jun 4, 2014 at 8:01 AM, Magnus Lie Hetland <[email protected]> wrote:

> I have an array that will typically grow and shrink quite a bit – but I'd
> rather not have stuff move around or perform any costly memory operations
> because of this. My current solution is to simply store the current size as
> a Uint, and then manually check whether I can append to it by simply
> assigning and incrementing, or if I need to push! and increment my size.
> (I could of course resize! by more than one, but I trust push! and its
> innards to deal with dynamic growth in a sensible manner.)
>
> I guess one question is: Do I need to do this? I'd rather avoid the
> duplicated effort, and simply use push! every time – and perhaps use
> resize! to clear it out. Although it's not clear from the array.jl
> <https://github.com/JuliaLang/julia/blob/master/base/array.jl> code (and
> I haven't read the C code for jl_array_del_end), I'd suspect this would
> shrink the underlying available memory, requiring new allocations when
> growing again.
>
> Any input on this? Both on how to use the various APIs to best effect
> here, and on whether there is any point to this?
>
> I've done some very simple benchmarks, and it seems I'm just adding
> overhead. For n = 1000000, let's use the following as the baseline:
>
> a = Array(Any, n)
> resize!(a, 0)
> for i = 1:n
>     push!(a, i)
> end
>
> (I'm using Any because that's what I'll be using in my code; the integers
> are just arbitrary filler, here.)
>
> If I just use assignments, for some reason, it seems things are slower.
> For the following, I seem to be getting about a 30% increase in running
> time:
>
> a = Array(Any, n)
> for i = 1:n
>     a[i] = i
> end
>
> This sort of surprised me – but maybe there is extra bounds checking going
> on, whereas push! knows the size? (That still doesn't really explain it.
> Maybe my benchmarking is wrong?-) Maybe it's multidimensional stuff that
> isn't special-cased enough here (as opposed to in push!)? Or maybe I'm
> just messing up something basic.
>
> Anyway … The next step is more predictable; when I add a length check (to
> decide whether or not to append), things are even slower – about 60% worse
> than the baseline:
>
> a = Array(Any, n)
> for i = 1:n
>     if i < length(a)
>         a[i] = i
>     end
> end
>
> So my question is a bit open-ended or multifaceted, I guess. Any input,
> advice or clarification would be welcome.
>

Reply via email to