I have a Perl script that I use to parse a month's worth of Cisco PIX
Firewall logs sent to a syslog server.
The raw text file is always over 300 MB and typically closer to 750 MB
or higher.
Opening a regular file handle (e.g. open(LOGFILE,"<Syslog-PIX.log")) was
beating the system into the ground.
It looked very much like Perl was trying to keep the entire text file
in memory while working with it.
Expanding it to:
open(LOGFILE,"<Syslog-PIX.log");
@logFile = <LOGFILE>;
close(LOGFILE);
did not help the memory usage either.
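
For comparison, my understanding is that reading a file handle in list
context (as in the assignment to @logFile above, or in a foreach loop)
pulls every line into memory at once, while a while loop reads one line
per iteration. Below is a minimal sketch of the while-loop form. The
helper name count_lines and the lexical handle $fh are my own assumptions
for illustration, and the normalization step is left as a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a file one line at a time, so that only the
# current line is held in memory, and return the number of lines seen.
sub count_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        chomp $line;
        # <perform some actions to normalize the data>
        $count++;
    }
    close($fh);
    return $count;
}
```

Something like count_lines("Syslog-PIX.log") would then process the log
without building a multi-million element array first.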
The solution I ended up using was to chop the text file up into a bunch
of smaller arbitrary byte ranges and treat it like a stream instead
of a text file.
Sort of like this:
my $filesize = -s "Syslog-PIX.log";
my $start = 0;
my $finish = $start + 1024 * 1024 * 2;
sysseek(LOGFILE,0,0);    # rewind to the beginning of the file
while ($bytesread = sysread(LOGFILE,$temp_array_string,1024 * 1024 * 2))
{
    if ($finish > $filesize) {
        $finish = $filesize;
    }
    my @temp_array = split(/\n/,$temp_array_string);
    # drop the first and last elements, since they may be partial lines
    @temp_array = @temp_array[1 .. $#temp_array-1];
    # usually does not equal 1024 * 1024 * 2, but is close
    my $realfinish = $start + $bytesread;
    foreach $line (@temp_array) {
        # <perform some actions to normalize the data>
        push(@clean_array,$line);
    }
    $start = $start + $bytesread;
    $finish = $start + $bytesread;
    sysseek(LOGFILE,$start,0);    # seek to the absolute offset of the next chunk
}
This leads to data loss, since the first and last lines of each chunk
may be broken in the middle and are thrown away rather than counted.
Since the file is so large and usually holds from 2 million to 3 million
lines, the loss seems minimal. But I am no statistician, so I cannot
say that with certainty.
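
One variation I can imagine, sketched here only as an idea and not as
something I have run against the real logs, is to keep the partial line
at the end of each chunk and prepend it to the next chunk instead of
discarding it. The helper name for_each_line_chunked, its parameters, and
the callback are all hypothetical names of my own, and the sketch assumes
lines end in "\n" as in the split above.

```perl
use strict;
use warnings;

# Hypothetical helper: read $path in chunks of $chunk_size bytes and call
# $callback once per complete line. A partial line at the end of a chunk
# is carried over and joined with the start of the next chunk.
sub for_each_line_chunked {
    my ($path, $chunk_size, $callback) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $carry = '';    # unfinished line left over from the previous chunk
    while (1) {
        my $bytesread = sysread($fh, my $chunk, $chunk_size);
        die "Read error on $path: $!" unless defined $bytesread;
        last if $bytesread == 0;    # end of file
        $carry .= $chunk;
        # A limit of -1 keeps a trailing empty field, so the last element
        # is always the (possibly empty) unfinished line.
        my @lines = split(/\n/, $carry, -1);
        $carry = pop @lines;
        $callback->($_) for @lines;
    }
    # A final line without a trailing newline is still delivered.
    $callback->($carry) if length $carry;
    close($fh);
    return;
}
```

Called as for_each_line_chunked("Syslog-PIX.log", 1024 * 1024 * 2,
sub { push(@clean_array, $_[0]) }), this would keep at most one chunk
plus one partial line in the read buffer at a time, although @clean_array
itself would still grow with the file.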
Is there a better, more proper way to handle huge text files that does
not require them to be read entirely into memory and does not cause data
loss? Please advise.
-Jason