Justin Wyllie wrote:
...
$file_handle->read($s, $length); #$s is about 1/2 Mb
@data = unpack($format , $s);
##at this point memory usage jumps by 8 Mbs (measured using GTop->size() )
while (@data) {
push @data2, [shift @data, shift @data, shift @data] ; # this isn't exact
# but it looks like each element of @data2 becomes a reference to a 3 element
# array - i.e the binary data was stored in triplets
}
#this loop causes another jump of 4 Mbs
return \@data2;
Maybe a naive question, but is $file_handle always pointing to the same
file?
Then also, that whole logic above seems rather inefficient, both in
memory used and in overhead.
- each read() reads about 500 KB, so you use 500 KB right there.
- these 500 KB are then "parsed" by the unpack() (presumably in chunks
of a predictable size) into presumably many elements of @data. That
causes @data to be large. (Say each element is a 64-bit integer, encoded
as 8 bytes; 500 KB / 8 = roughly 64,000 elements in @data, and each
element is a full Perl scalar that takes considerably more memory than
the 8 bytes it was decoded from.)
- then at each while iteration, @data is shifted 3 times, to extract 3
consecutive elements, creating a new 3-element anonymous array. The
elements shifted out of @data are discarded. I would presume that Perl
is smarter than actually moving all remaining elements of @data each
time, but there is certainly some significant background work as a
result of each shift of @data.
- a reference to the 3-element array is then pushed onto @data2.
- then finally @data is discarded (or, at least, disregarded until the
next call). But the memory it used is generally not returned to the OS;
Perl keeps it for reuse within the same process.
So if you were, for instance, to reduce the size of each read(), you
would reduce the number of elements of @data produced by each unpack(),
thus keeping @data smaller, at the cost of more read() calls.
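
To illustrate the smaller-read idea, here is a minimal sketch. It is
not your actual code: the 12-byte record size (three 32-bit integers),
the chunk size and the process_in_chunks() helper are all assumptions
made up for the example, and it uses the builtin read() on an in-memory
file handle so that it runs on its own. The chunk size is kept a
multiple of the record size so that no record straddles two reads.

```perl
use strict;
use warnings;

# Assumptions for illustration only: 12-byte records (3 x 32-bit integers)
# and a chunk size of 1024 records per read.
my $record_size = 12;
my $chunk_size  = $record_size * 1024;   # 12288 bytes per read()

# Hypothetical helper: read the handle in small chunks and hand each
# chunk to a callback. Returns the number of non-empty reads performed.
sub process_in_chunks {
    my ($fh, $callback) = @_;
    my $reads = 0;
    while (1) {
        my $got = read($fh, my $buf, $chunk_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;               # end of file
        $callback->($buf);
        $reads++;
    }
    return $reads;
}

# Demo on an in-memory "file" of 30 KB instead of a real file.
my $data = "\0" x (30 * 1024);
open my $fh, '<', \$data or die "open: $!";
binmode $fh;

my $total = 0;
my $reads = process_in_chunks($fh, sub { $total += length $_[0] });
print "$reads reads, $total bytes\n";    # prints "3 reads, 30720 bytes"
```

The per-call peak is then bounded by the chunk size rather than by the
500 KB buffer, at the price of more calls.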
Then again, $s is a byte buffer. With the unpack(), you are "chunking"
it by re-applying the format $format over and over, building @data in
the process. But @data only serves to build these 3-element arrays,
references to which you then push onto @data2. So why not unpack() the
buffer one 3-element chunk at a time, directly into a new 3-element
array, and do away with @data (and the shift()s) altogether?
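
Something along these lines, as a rough sketch only. I don't know your
real $format, so the 'V3' format (three unsigned 32-bit little-endian
integers per record) and the unpack_triplets() name are assumptions
for the sake of the example; substitute whatever one triplet really
looks like in your data.

```perl
use strict;
use warnings;

# Assumption: one record = three unsigned 32-bit little-endian integers.
my $record_format = 'V3';
my $record_size   = length pack($record_format, 0, 0, 0);   # 12 bytes

# Hypothetical helper: unpack the buffer one record at a time, straight
# into a new 3-element anonymous array. No intermediate flat @data and
# no shift() calls.
sub unpack_triplets {
    my ($buffer) = @_;
    my @triplets;
    # Ignore any trailing partial record.
    my $usable = length($buffer) - (length($buffer) % $record_size);
    for (my $offset = 0; $offset < $usable; $offset += $record_size) {
        push @triplets,
            [ unpack($record_format, substr($buffer, $offset, $record_size)) ];
    }
    return \@triplets;
}

# Demo: a buffer holding the integers 1..9, i.e. three records.
my $s    = pack('V*', 1 .. 9);
my $rows = unpack_triplets($s);
print join(',', @$_), "\n" for @$rows;
# prints:
# 1,2,3
# 4,5,6
# 7,8,9
```

That way only one set of 3-element arrays is ever built, instead of a
large flat list that is immediately taken apart again.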