Justin Wyllie wrote:
...


$file_handle->read($s, $length); #$s is  about 1/2 Mb
@data = unpack($format , $s);
##at this point memory usage jumps by 8 Mbs (measured using GTop->size() )

while (@data) {
push @data2, [shift @data, shift @data, shift @data] ; # this isn't exact but it looks like each element of @data2 becomes a reference to a 3 element array - i.e the binary data was stored in triplets
}
#this loop causes another jump of 4 Mbs

return \@data2;

Maybe a naive question, but is $file_handle always pointing to the same file?

Then also, that whole logic above seems rather inefficient, both in memory used and in overhead.
- each read() reads about 500K. So you use 500K right there.
- then these 500 KB are "parsed" (by the unpack(), presumably in chunks of a predictable size) into presumably many elements of @data. That causes @data to be large. (Say each element is a 64-bit integer, encoded as 8 bytes; 500 KB / 8 = roughly 64,000 elements in @data, and each element is a full Perl scalar that takes considerably more memory than its 8 bytes of payload.)
- then at each while iteration, @data is shifted 3 times, to extract 3 consecutive elements, creating a new 3-element anonymous array. The elements shifted out of @data are discarded. I would presume that Perl is smarter than actually moving all remaining elements of @data each time, but there is certainly some significant background work as a result of each shift of @data.
- a reference to the 3-element array is then pushed onto @data2.
- then finally @data is discarded (or, at least, disregarded until the next call). But the memory it used is never returned to the OS.

So if you would for instance reduce the size of each read(), you would reduce the number of elements of @data that are produced at each unpack(), thus keeping @data smaller, at the cost of more read()'s.
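As a rough sketch of that first idea (not tested against your real data): read a small whole number of records at a time, so that @data never holds more than a handful of elements. The 'V*' format, the 12-byte record size, the tiny $records_per_read and the in-memory sample file are all assumptions of mine for the sake of a runnable example; substitute your own $format and a larger chunk size.

```perl
use strict;
use warnings;
use IO::Handle;

# Assumptions (not from the original post): records are three unsigned
# 32-bit little-endian integers, i.e. 12 bytes each.
my $record_size      = 12;
my $records_per_read = 2;    # tiny on purpose; a real value might be a few hundred
my $chunk_size       = $record_size * $records_per_read;

# Made-up sample input, served from an in-memory file handle.
my $sample = pack('V*', 1 .. 9);
open(my $file_handle, '<', \$sample) or die "open failed: $!";
binmode($file_handle);

my @data2;
my $s;
while ($file_handle->read($s, $chunk_size)) {
    # @data holds at most $chunk_size / 4 elements per iteration
    my @data = unpack('V*', $s);
    push @data2, [ splice(@data, 0, 3) ] while @data;
}
close($file_handle);

print scalar(@data2), " triplets\n";    # prints "3 triplets"
```

Keeping $chunk_size a multiple of the record size matters here, otherwise a record could be split across two reads.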

Then again, $s is a byte buffer. With the unpack, you are "chunking" it by re-exploring the format $format over and over, building @data in the process. But @data only serves to build these 3-element arrays to which you want references to push into @data2. So why not unpack() the buffer one 3-element chunk at a time, directly into a 3-element new array, and do away with @data (and the shift()'s) altogether ?
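Something along these lines is what I have in mind. It is only a sketch under assumptions: I do not know your real $format, so I use 'V3' (three unsigned 32-bit little-endian integers per record), and unpack_triplets() is a name I made up.

```perl
use strict;
use warnings;

# Assumption: one record = three unsigned 32-bit little-endian integers.
# Replace with the per-record part of your own $format.
my $record_format = 'V3';
my $record_size   = length(pack($record_format, 0, 0, 0));    # 12 bytes

# Hypothetical helper: takes a reference to the byte buffer (to avoid
# copying it) and returns a reference to an array of 3-element array refs.
# No large intermediate @data array is built, and nothing is shifted.
sub unpack_triplets {
    my ($buffer_ref) = @_;
    my @triplets;

    # Ignore any trailing partial record.
    my $usable = length($$buffer_ref) - (length($$buffer_ref) % $record_size);

    for (my $offset = 0; $offset < $usable; $offset += $record_size) {
        push @triplets,
            [ unpack($record_format, substr($$buffer_ref, $offset, $record_size)) ];
    }
    return \@triplets;
}

# Usage with made-up sample data
my $s    = pack('V*', 1 .. 6);
my $rows = unpack_triplets(\$s);
print join(',', @$_), "\n" for @$rows;    # prints 1,2,3 then 4,5,6
```

This still allocates one small array per triplet, which you need anyway for @data2, but it skips the extra copy of every value in @data.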
