Actually, no, I don't do that. I only resize the array once every
1000 lines (configurable). Besides, the time is not spent there.

As I mentioned, I ran it under Callgrind, and the time spent allocating
arrays is actually minimal. What does take time is the 2.2 *billion* cell
allocations and the 50 *million* calls to Value::clone(). Most of these
calls clone a value that is immediately discarded afterwards.

The solution is to avoid cloning values that are not stored (that's the
core of the "temp" idea). Right now the temp system is only used in some
very specific cases, but once it can be applied to the Value::clone()
calls, we'll see the big performance boost.

Regards,
Elias


On 25 April 2014 13:53, David B. Lamkins <da...@lamkins.net> wrote:

> Given a quick read, I get the impression that you're still incrementally
> extending the length of the result. Growing the result one element at a
> time makes the build O(n^2) overall. There's a lot of catenation in your
> code; that will almost certainly involve copying.
>
> Try this instead:
>
> 1. Get the size of the file to be read.
> 2. Preallocate a vector large enough to hold the entire file.
> 3. Read the file (I'm assuming that lib_file_io won't let you read it
> all in one go) by chunks and use indexed assignment to copy each chunk
> into its position in the preallocated vector.
>
>
>
> On Fri, 2014-04-25 at 00:21 +0800, Elias Mårtenson wrote:
> > In writing a function that uses lib_file_io to load the content of an
> > entire file into an array of strings, I came across a bad performance
> > problem that I am having trouble narrowing down.
> >
> >
> > Here is my current version of the
> > function:
> https://github.com/lokedhs/apl-tools/blob/e3e81816f3ccb4d8c56acc8e4012d53f05de96d6/io.apl#L8
> >
> >
> > The first version did not do blocked reads and resized the array after
> > each row was read. That was terribly slow, so I preallocate a block of
> > 1000 elements, and resize every 1000 lines, giving the version you can
> > see linked above.
> >
> >
> > I was testing with a text file containing almost 14000 rows, and on my
> > laptop it takes many minutes to load the file. One would expect
> > loading such a small file to take no noticeable time at all.
> >
> >
> > One interesting aspect of this is that it takes longer and longer to
> > load each row as the loading proceeds. I have no explanation for why
> > that is. It's not the resizing that takes time: I measured the time
> > taken to load a block of rows excluding the array resize.
> >
> >
> > Any ideas?
> >
> >
> > Regards,
> > Elias
>
>
