> Doug Lentz <[EMAIL PROTECTED]> said:
> I've been using <FILEHANDLE> to read an entire text file into an array.
>
> @buffer = <MY_BIG_FILE>;
>
> I string-manipulate the individual array elements and then sometime
> later, do a
>
> $buffer = join "", @buffer;
>
> ...and this worked OK for an 80M text file. I couldn't resist and tried
> it out on a gigabyte monster.
>
> The script aborted with "Out of memory during request for 26 bytes
> during sbrk()".
>
> 26 bytes, coincidentally enough :), is the record size.
Not terribly surprising. The sbrk() is extending the process memory in 26-byte
chunks as the join is performed, and since join has to build the result string
while @buffer still holds every record, the process needs roughly twice the
file size at its peak, plus the array's per-element overhead.
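To make the doubling visible, here is a rough sketch using the CPAN module
Devel::Size (assuming it is installed; it does not ship with perl) and a
hypothetical file name:

    use Devel::Size qw(total_size);

    open(IN, "big.txt") or die "can't open big.txt: $!";  # hypothetical file
    @buffer = <IN>;
    close(IN);

    # One copy of the data lives in the array, plus per-element overhead.
    printf "array:  %d bytes\n", total_size(\@buffer);

    # join builds a second, contiguous copy while @buffer still exists,
    # so peak memory is roughly twice the file size.
    $buffer = join "", @buffer;
    printf "scalar: %d bytes\n", total_size(\$buffer);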
As an "old school" programer who grew up in the days when memory was small and
expensive, I usually write programs which manipulate input for files like this
fragment;
while (<IN>) {
    chomp;
    $out_rec = some_function($_);
    print OUT "$out_rec$/";
}
some_function() is just some code to manipulate the input record and produce
an output record. IN and OUT are assumed to already be open for reading and
writing respectively.
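For completeness, here is a minimal runnable version of method 1. The file
names and the body of some_function() are hypothetical stand-ins:

    open(IN,  "input.txt")   or die "can't read input.txt: $!";
    open(OUT, ">output.txt") or die "can't write output.txt: $!";

    while (<IN>) {
        chomp;                          # strip the record separator
        $out_rec = some_function($_);   # transform one record
        print OUT "$out_rec$/";         # write it; only one record is
                                        # ever held in memory
    }

    close(IN);
    close(OUT);

    sub some_function { return uc(shift) }   # placeholder transformation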
You could write this as:
@buffer = <IN>;
chomp(@buffer);
for (@buffer) {
    $outrec = some_function($_);
    push(@output, "$outrec$/");
}
print OUT @output;
Method 1 is far more efficient than method 2 as far as memory utilization is
concerned. Method 2 can be easier to code, depending on what some_function()
does, but it may bump up against real memory constraints with very large input
files.
Which method is better? My general rule of thumb is to use method 1 in
production code, where the program will be run many times and the file size is
unknown. Use method 2 where the script is a quick-and-dirty throwaway and
coding method 1 would take much longer, or where the algorithm requires the
entire input file in memory to compute the output.
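As a hypothetical example of that last case, sorting a file cannot emit its
first output record until every input record has been seen, so slurping is
the natural fit:

    # Sorting needs the whole file in memory before any output is written.
    @buffer = <IN>;
    chomp(@buffer);
    print OUT map { "$_$/" } sort @buffer;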
--
Smoot Carl-Mitchell
Consultant