On Fri, Jun 16, 2006 at 10:14:07PM +0200, Otto Moerbeek wrote: > On Fri, 16 Jun 2006, Jacob Yocom-Piatt wrote: > > > i've got some C code that is reading from a 800 MB CSV file and allocates > > memory > > for an array to store the data in. the method used is to read the CSV file > > line-by-line and realloc additional space with each line read. having timed > > this > > and found the realloc speed to be low when the array is large, i am aiming > > to > > make this faster but am not sure about the best way to proceed. > > > > the current code uses realloc in the manner suggested by the manpage: > > > > newsize = size + 1; > > time(&t1); // start timing realloc > > if ((newap = (int *)realloc(ap, newsize*sizeof(int))) == NULL) { > > free(ap); > > ap = NULL; > > size = 0; > > return (NULL); > > } > > time(&t2); // stop timing realloc; start timing fscanf > > > > as the size of ap grows, so does the time it takes to realloc the space. > > > > an alternative to this procedure would be to scan through the CSV file to > > determine how many array entries i would need, realloc it all at once, then > > go > > back through the CSV file again to read the data into the array. i'm not > > confident this is the only way to do this and would appreciate any > > suggestions > > for speeding up this procedure. > > You'll have to think how this works: realloc has to copy the data to > the newly allocated region if it is not able to extend the current > region. > > If you do that for each time you'll increase the allocation size and > your size increment is small, you'll do a lot of work. The way you are > doing it now has quadratic time complexity. > > So make your increment larger and start with a larger size. Maybe you > can estimate the initial size based on your file size. If you'll end > up allocating a too large area, just use realloc the decrease the size > after you're done. > > Also thing about other data structures: you might be better of with a > linked list or tree here, depending on what you are doing with your > data. > > -Otto
Also, if the purpose of this is to load the file in memory, why not just mmap() it in the first place ?