On Sat, Mar 08, 2014 at 10:59:22AM -0500, Steve Tolkin wrote:
# return code: 0 == success; 1 == some warnings; 2 == some errors
my $rc = 0;
This value never changes. I assume the larger program could change it.
my $split_char=','; # CHANGE ME IF NEEDED (later use getopts)
my @aoh; # array of hashes: key is the field value; value is the count
my $numlines = 0;
my $firstline;
while (<>)
{
chomp;
$firstline = $_ if $numlines == 0;
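More idiomatically written as (perl 5.10 or later):
$firstline //= $_;
Use //= rather than ||= here: ||= would overwrite a first line that is
empty or "0" with a later line.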
$numlines++;
my @data;
# Seemingly extra code below is for compatibility with older perl versions
# But this might not be needed anymore.
if ( $split_char ne '' )
{ @data = split(/$split_char/,$_); }
else
{ @data = split; }
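If you're really looking for optimizations, this test can be moved outside
the loop. Since split with no arguments splits on whitespace, your check can
be something like:
$split_char = ' ' if $split_char eq '';
Then the test and two split options can be replaced with
@data = split $split_char, $_;
The delimiter has to be passed as a string here, not as /$split_char/: the
pattern / / splits on each single space, while the string ' ' gets the same
awk-style whitespace splitting as split with no arguments (from a variable,
that needs perl 5.18 or later).
For a non-empty delimiter, there may also be some benefit to hoisting the
regular expression outside the loop:
my $re = qr/$split_char/;
...
@data = split $re, $_;
If there's any, it will be tiny, but may be appreciable given your
input size.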
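A minimal sketch of choosing the splitter once, before the loop.
make_splitter is a hypothetical helper, not part of the original program,
and the \Q...\E quoting is an added assumption. It also makes one detail
explicit: the literal string ' ' selects split's awk-style whitespace mode,
while the regex / / would match single spaces only.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): decide how to split
# once, up front, instead of testing $split_char on every input line.
sub make_splitter {
    my ($split_char) = @_;
    if ($split_char eq '') {
        # The literal ' ' selects awk-style splitting: runs of whitespace
        # separate fields and leading whitespace is skipped.
        return sub { split ' ', $_[0] };
    }
    # Assumption: the delimiter is meant literally, so \Q...\E keeps
    # characters such as '|' or '.' from acting as regex metacharacters.
    my $re = qr/\Q$split_char\E/;
    return sub { split $re, $_[0] };
}

my $by_comma = make_splitter(',');
my $by_space = make_splitter('');

print join('|', $by_comma->('a,b,c')), "\n";     # a|b|c
print join('|', $by_space->('  a  b c')), "\n";  # a|b|c
```

The closure is only there to keep the example self-contained. In a hot
loop the extra sub call costs more than it saves, so you would inline the
chosen split instead.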
# Populate array of hashes. (This uses perl "autovivification" feature.)
for (my $i = 0; $i < scalar @data; $i++) {
$aoh[$i]{$data[$i]}++;
}
}
Nit: "scalar @data" can be replaced with @data.
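For anyone following along, here is a small self-contained sketch of the
array-of-hashes counting, using the range form of the loop. count_distinct
and the sample rows are hypothetical, made up for illustration; the
counting line itself is the one from the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: count distinct values per column.
# Each argument is an array ref holding one row's already-split fields.
sub count_distinct {
    my (@rows) = @_;
    my @aoh;    # one hash per column: field value => number of occurrences
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            # Autovivification: $aoh[$i] becomes a hash ref on first use.
            $aoh[$i]{ $row->[$i] }++;
        }
    }
    # The number of keys in each hash is that column's distinct count.
    return map { scalar keys %$_ } @aoh;
}

my @distinct = count_distinct([qw(a x)], [qw(b x)], [qw(a y)]);
print "@distinct\n";    # 2 2
```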
# print output
print "filename: $ARGV\n"; # This writes a "-" if data is piped in
if ($numlines >0) {
print "firstline: $firstline\n";
}
print "numlines: $numlines\n";
for (my $i = 0; $i < scalar @aoh; $i++) {
# The number of keys in a hash is the "count distinct" number
print "field#: ", $i, ", distinct: ", scalar keys %{$aoh[$i]}, "\n";
Nit: This reads better as a printf, I think.
printf "field#: %d, distinct: %d\n", $i, scalar keys %{$aoh[$i]};
}
exit $rc;
This is always 0, as noted above.
My initial thought at improvement was to avoid the split and walk through
each line looking for a $split_char or a "\n", but that just duplicates
split in perl instead of C. I think you've got just about the fastest
program in perl.
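To make that concrete, a hand-rolled walk might look roughly like the
sketch below. walk_split is a hypothetical name and the code assumes a
non-empty, literal separator. It is not something I timed, just an
illustration of how much of split it ends up reimplementing.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled splitter: walk the line with index() and substr().
# Assumes $sep is a non-empty literal string (an empty one would never
# advance $pos).
sub walk_split {
    my ($line, $sep) = @_;
    my @fields;
    my $pos = 0;
    while ((my $next = index($line, $sep, $pos)) >= 0) {
        push @fields, substr($line, $pos, $next - $pos);
        $pos = $next + length $sep;
    }
    # Whatever follows the last separator is the final field.
    push @fields, substr($line, $pos);
    return @fields;
}

print join('|', walk_split('a,b,,c', ',')), "\n";    # a|b||c
```

Unlike split, this keeps trailing empty fields. The core Benchmark module
(cmpthese) is one way to compare it against split on real input.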
For fun, I wrote a version in Go and it's twice as fast as the perl
version. I imagine a C version would be faster yet, but I get paid for that
kind of fun. I'd be happy to send you the Go version if you're interested.
-Gyepi