On Mon, Nov 25, 2002 at 03:52:21PM -0800, Jacob Schroeder wrote:
> Here's the main chunk of my code that starts the text coming in...
>     # Build up the command string appropriately, depending on what options
>     # have been set.
>     my $command =
>       ($rlog_module ne "") ? "cvs -n -d $cvsdir rlog $rlog_module" : "cvs log";
>     print "Executing \"$command\"\n" if $debug;
> 
>     open (CVSLOG, "$command |") || die "Couldn't execute \"$command\"";
>     while (<CVSLOG>)
>     {
>     ....
>     ....
>     }
> 
> If you can't see what I'm doing, I'm parsing the return of cvs rlog

I can't see what you're doing because you stubbed out the all-important
part: the bit inside the loop.


> from that, however, when I run this, I get an "Out Of Memory" error as I'm
> parsing the text.  Is this because I'm using hashes or because there is just
> a lot of text for cvs rlog on the root?  I get the "Out Of Memory" error
> after it runs for like 30 minutes or so and if I watch the process it
> usually gives me that error once it is using about 20 MB of memory.

It takes your program 30 minutes to parse a CVS log?  That's a strong
indication that something is Very Wrong with your parser.  How big is this
log and how slow is this computer?


Below is a little thing I whipped together to parse CVS logs into a data
structure.  I ran it against the change logs for MakeMaker, which come to
9000+ lines across 70+ files and 1000+ revisions.  It took 1 second and used
4 megs.

The trick of interest is that I didn't attempt to do it line by line but
rather record by record.  This makes it much easier to parse, since the
different types of parsing (revision records vs. headers, and one pass for
each file) can be cleanly separated.
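
To show the record-by-record idea on its own, here is a minimal sketch that
needs no cvs at all.  The sample text, the read_records() name and the
in-memory filehandle (Perl 5.8 or later) are my own stand-ins for
illustration, not part of the script below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input standing in for real "cvs log" output.
my $sep  = '=' x 77;
my $text = "RCS file: a.pm,v\nhead: 1.2\n$sep\n"
         . "RCS file: b.pm,v\nhead: 1.1\n$sep\n";

sub read_records {
    my($input, $separator) = @_;

    # Open the string as a filehandle so the sketch runs without cvs.
    open(my $fh, '<', \$input) or die "Can't open in-memory file: $!";

    # Scope the change to $/ so reads elsewhere keep their usual behaviour.
    local $/ = $separator;

    my @records;
    while( my $record = <$fh> ) {
        chomp $record;                  # chomp strips the current $/
        next unless $record =~ /\S/;    # skip the empty tail after the last separator
        push @records, $record;
    }
    return @records;
}

my @records = read_records($text, $sep);
printf "%d records\n", scalar @records;
```

Each element of @records is then one file's worth of log, ready to be split
further without any line-by-line state tracking.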


#!/usr/bin/perl

open(CVSLOG, "cvs log |") or die "Can't run cvs log: $!";

my %files = ();

# One "line" == One file's logs
local $/ = 
'=============================================================================';

while(<CVSLOG>) {
    next unless /\S/;
    my($head, @revisions) = split /^----------------------------$/m, $_;

    my($headers) = parse_head($head);
    
    my $file = $headers->{'RCS file'};
    $files{$file} = $headers;

    foreach my $revision (@revisions) {
        my($rev, $info, $log) = parse_revision($revision);
        $files{$file}{revisions}{$rev} = $info;
        $files{$file}{revisions}{$rev}{log} = $log;
    }
}

my $tot_revisions = 0;
$tot_revisions += keys %{$files{$_}{revisions}} for (keys %files);
printf "%d files found with %d revisions\n", 
  scalar keys %files, $tot_revisions;


sub parse_head {
    my $head = shift;

    my %headers = ();
    my $curr_header;

    foreach (split /\n/, $head) {
        next if /^\s*$/ and !defined $curr_header;

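        # parse_line() always returns a hash ref, so test whether it
        # actually found any headers rather than testing the ref itself.
        my $heads = parse_line($_);
        if( %$heads ) {
            @headers{keys %$heads} = values %$heads;
            # Remember the header just parsed so that indented
            # continuation lines get appended to it.
            $curr_header = (keys %$heads)[0] if keys %$heads == 1;
        }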
        elsif( my($v) = /^\s+(.*)$/ ) {
            $headers{$curr_header} .= "\n$v";
        }
        else {
            warn "I'm confused by this line:\n$_\n";
        }
    }

    return \%headers;
}


sub parse_line {
    my $line = shift;
    my %headers = ();

    if( $line =~ /^\S[^:]*:/ ) {
        # watch out for two definitions on the same line
        foreach my $header (split /;\s*/, $line) {
            my($k, $v) = $header =~ /^(\S[^:]*): ?(.*)/;
            $headers{$k} = $v;
        }
    }

    return \%headers;
}

sub parse_revision {
    my($text) = shift;
    $text =~ s/^\s+//m;

    my(@lines) = split /\n/, $text;

    my($revision) = $lines[0] =~ /^revision (\S+)/;
    my($headers) = parse_line($lines[1]);
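    # Drop the record separator left on the last revision of each file,
    # then keep the whole log message rather than just its first line.
    pop @lines if @lines and $lines[-1] =~ /^=+$/;
    my $log = join "\n", @lines[2..$#lines];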

    return($revision, $headers, $log);
}
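
Once the logs are in %files, getting answers out is just a walk over the
hash.  A rough sketch, assuming the revision header line carries an
"author" field (as "cvs log" output normally does); the sample data and the
revisions_by_author() name are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data in the same shape the script above builds:
# $files{$rcs_file}{revisions}{$rev} holds the revision headers plus {log}.
my %files = (
    'lib/Foo.pm,v' => {
        'RCS file' => 'lib/Foo.pm,v',
        'head'     => '1.2',
        revisions  => {
            '1.1' => { author => 'alice', log => 'Initial revision' },
            '1.2' => { author => 'bob',   log => 'Fix typo' },
        },
    },
);

# Count revisions per author across every file.
sub revisions_by_author {
    my($files) = @_;
    my %count;
    foreach my $file (values %$files) {
        foreach my $info (values %{ $file->{revisions} }) {
            $count{ $info->{author} }++;
        }
    }
    return \%count;
}

my $count = revisions_by_author(\%files);
printf "%-10s %d\n", $_, $count->{$_} for sort keys %$count;
```

Nothing here needs the raw log text any more, which is the point of parsing
into a structure first.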



-- 

Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
You're smoother than a tunnel of shining sorrow.
