Howdy,

A small itch I've been scratching is adding the ability to show a progress bar in sa-learn and spamassassin. Now that it's done, here is roughly what it looks like for sa-learn (the separate lines are only there to show how it behaves; normally the progress bar overwrites itself on a single line):

27% [============ ] 18.66 msgs/sec 01m22s LEFT
100% [============================================] 16.97 msgs/sec 01m57s DONE
Learned from 2000 message(s) (2000 message(s) examined).

16% [======= ] 5.25 msgs/sec 06m08s LEFT
50% [====================== ] 3.55 msgs/sec 03m36s LEFT
73% [================================ ] 2.35 msgs/sec 01m55s LEFT
100% [============================================] 4.52 msgs/sec 07m22s DONE
Learned from 2000 message(s) (2000 message(s) examined).

And this is how it looks when run with spamassassin:

spamassassin --progress --mbox hambucket1.mbox > blah.scanned
5% [= ] 0.32 msgs/sec 37m47s LEFT

spamassassin --progress -L --mbox hambucket1.mbox > blah.scanned
100% [=============] 10.37 msgs/sec 03m12s DONE

The smaller progress bar is due to the fact that Term::ReadKey is unable to determine the terminal size when STDOUT is redirected that way (if you know of a solution, speak up).

If STDERR isn't a terminal (i.e. it is redirected to a file), it behaves very similarly to --showdots, printing a . for each message processed and following up with a final status line:

.......................................................................................................................
100% Completed 113.51 msgs/sec in 00m17s
Learned from 0 message(s) (2000 message(s) examined).

In fact, I've contemplated replacing the --showdots functionality with this; the only issue is that it behaves slightly differently.

So, thoughts? Comments? It seems to work well, at least in all of the tests I've done. I'm going to try to extend it a bit to cover more functions (e.g. restore and sync). I'm including the diff for folks to look over.

Michael
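For reviewers, a minimal sketch of how one status line is put together, following the layout in update(): the bar width is the terminal width minus the 37 columns the patch reserves for the percentage, rate and ETA; the leading \r lets each redraw overwrite the previous one; and the displayed rate is a moving average with weight 0.5 on the newest sample. render_line and smooth_rate are hypothetical names, and the 80-column width is an assumption (in the patch the width comes from Term::ReadKey, defaulting to 50). This is an illustration, not the module itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: render_line and smooth_rate are illustrative
# names, not part of Mail::SpamAssassin::Util::Progress.

# The patch reserves 37 columns for everything except the bar itself.
use constant OVERHEAD => 37;

sub render_line {
    my ($num_done, $total, $term_width, $rate, $eta_secs) = @_;

    my $bar_size = $term_width - OVERHEAD;
    my $pct      = int($num_done * 100 / $total);

    # number of '=' cells, rounded down
    my $filled = int($num_done * $bar_size / $total);
    my $bar    = ('=' x $filled) . (' ' x ($bar_size - $filled));

    # minutes wrap every hour, the same limitation as the patch
    my $min = int($eta_secs / 60) % 60;
    my $sec = int($eta_secs) % 60;

    # the leading \r returns the cursor, so the next call overwrites this line
    return sprintf("\r%3d%% [%s] %6.2f msgs/sec %02dm%02ds LEFT",
                   $pct, $bar, $rate, $min, $sec);
}

# Rate smoothing as in update(): exponentially weighted moving average
# giving weight 0.5 to the newest msgs/sec sample.
sub smooth_rate {
    my ($prev_avg, $sample) = @_;
    return defined $prev_avg ? 0.5 * $prev_avg + 0.5 * $sample : $sample;
}

# Draw the bar only when STDERR is a terminal; otherwise fall back to
# one dot per message, as the patch does.
my $fh = \*STDERR;
if (-t $fh) {
    print $fh render_line(540, 2000, 80, 18.66, 82), "\n";
}
else {
    print $fh '.' x 5, "\n";
}
```

Computing the filled cells as num_done * bar_size / total (rather than multiplying by a precomputed bar_size / total) keeps the arithmetic exact at 100%, so a finished run always shows a full bar.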
Index: sa-learn.raw
===================================================================
--- sa-learn.raw (revision 123727)
+++ sa-learn.raw (working copy)
@@ -25,6 +25,7 @@
use vars qw(
$spamtest %opt $isspam $forget
$messagecount $learnedcount $messagelimit
+ $progress $total_messages $init_results $start_time
$synconly $learnprob @targets $bayes_override_path
);
@@ -74,6 +75,7 @@
use Mail::SpamAssassin::ArchiveIterator;
use Mail::SpamAssassin::Message;
use Mail::SpamAssassin::PerMsgLearner;
+use Mail::SpamAssassin::Util::Progress;
###########################################################################
@@ -107,6 +109,7 @@
'local|L' => \$opt{'local'},
'no-sync|nosync' => \$opt{'nosync'},
'showdots' => \$opt{'showdots'},
+ 'progress' => \$opt{'progress'},
'use-ignores' => \$opt{'use-ignores'},
'no-rebuild|norebuild' => sub { $opt{'nosync'} = 1; warn "The --no-rebuild option has been deprecated. Please use --no-sync instead.\n" },
@@ -155,6 +158,11 @@
$synconly = 1;
}
+if ($opt{'showdots'} && $opt{'progress'}) {
+ print STDERR "--showdots and --progress may not be used together, please select just one.\n";
+ exit 1;
+}
+
if ( !defined $isspam
&& !defined $synconly
&& !defined $forget
@@ -383,15 +391,18 @@
}
);
- $iter->set_functions( \&wanted, sub { } );
+ $iter->set_functions(\&wanted, \&result);
$messagecount = 0;
$learnedcount = 0;
+ $init_results = 0;
+ $start_time = time;
+
eval { $iter->run(@targets); };
- print STDERR "\n" if ( $opt{showdots} );
- print
-"Learned from $learnedcount message(s) ($messagecount message(s) examined).\n";
+ print STDERR "\n" if ($opt{showdots});
+ $progress->final() if ($opt{progress} && $progress);
+ print "Learned from $learnedcount message(s) ($messagecount message(s) examined).\n";
# If we needed to make a tempfile, go delete it.
if ( defined $tempfile ) {
@@ -428,6 +439,29 @@
###########################################################################
+sub init_results {
+ $init_results = 1;
+
+ return unless $opt{'progress'};
+
+ $total_messages = $Mail::SpamAssassin::ArchiveIterator::MESSAGES;
+
+ $progress = Mail::SpamAssassin::Util::Progress->new({total => $total_messages,});
+}
+
+###########################################################################
+
+sub result {
+ my ($class, $result, $time) = @_;
+
+ # create the progress bar lazily, once the iterator knows the message total
+ &init_results if !$init_results;
+
+ $progress->update($messagecount) if $opt{progress};
+}
+
+###########################################################################
+
sub wanted {
my ( $class, $id, $time, $dataref ) = @_;
@@ -436,11 +470,12 @@
if ( defined($learnprob) ) {
if ( int( rand( 1 / $learnprob ) ) != 0 ) {
print STDERR '_' if ( $opt{showdots} );
- return;
+ return 1;
}
}
if ( defined($messagelimit) && $learnedcount > $messagelimit ) {
+ $progress->final() if $opt{progress};
die 'HITLIMIT';
}
@@ -471,6 +506,7 @@
undef $ma;
print STDERR '.' if ( $opt{showdots} );
+ return 1;
}
###########################################################################
@@ -514,6 +550,7 @@
--mbox Input sources are in mbox format
--mbx Input sources are in mbx format
--showdots Show progress using dots
+ --progress Show progress using a progress bar
--no-sync Skip syncronizing the database and journal
after learning
-L, --local Operate locally, no network accesses
@@ -666,8 +703,14 @@
Read user score preferences from I<prefs> (usually
C<$HOME/.spamassassin/user_prefs>).
- =item B<-D>, B<--debug-level>
+=item B<--progress>
+
+Prints a progress bar (to STDERR) showing the current progress. In the case
+where no valid terminal is found, this option will behave very much like the
+--showdots option.
+
+=item B<-D>, B<--debug-level>
+
Produce diagnostic output.
=item B<--no-sync>
Index: lib/Mail/SpamAssassin/Util/Progress.pm
===================================================================
--- lib/Mail/SpamAssassin/Util/Progress.pm (revision 0)
+++ lib/Mail/SpamAssassin/Util/Progress.pm (revision 0)
@@ -0,0 +1,277 @@
+# <@LICENSE>
+# Copyright 2004 Apache Software Foundation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# </@LICENSE>
+
+=head1 NAME
+
+ Mail::SpamAssassin::Util::Progress - Progress bar support for SpamAssassin
+
+=head1 SYNOPSIS
+
+ my $progress = Mail::SpamAssassin::Util::Progress->new({total => 100});
+
+ $msgcount = 0;
+ foreach my $message (@messages) {
+ # do something here
+ $msgcount++;
+ $progress->update($msgcount);
+ }
+
+ $progress->final();
+
+=head1 DESCRIPTION
+
+This module implements a progress bar for use in SpamAssassin scripts and
+modules. It allows you to create the progress bar, update it and print
+out the final results of a particular run.
+
+=cut
+
+package Mail::SpamAssassin::Util::Progress;
+
+use strict;
+use warnings;
+use bytes;
+
+use constant HAS_TERM_READKEY => eval { require Term::ReadKey };
+
+# Load Time::HiRes if it's available
+BEGIN {
+ eval { require Time::HiRes };
+ Time::HiRes->import( qw(time) ) unless $@;
+}
+
+=head2 new
+
+public class (Mail::SpamAssassin::Util::Progress) new (\% $args)
+
+Description:
+Creates a new Mail::SpamAssassin::Util::Progress object. Valid values for
+the $args hashref are:
+
+=over 4
+
+=item total (required)
+
+The total number of messages expected to be processed. This item is
+required.
+
+=item fh [optional]
+
+An optional filehandle may be passed in, otherwise STDERR will be used by
+default.
+
+=item term [optional]
+
+The module will attempt to determine if a valid terminal exists on the
+filehandle. This item allows you to override that value.
+
+=back
+
+=cut
+
+sub new {
+ my ($class, $args) = @_;
+ $class = ref($class) || $class;
+
+ if (!defined($args->{total}) || $args->{total} < 1) {
+ warn "Must provide a total value > 0";
+ return undef;
+ }
+
+ my $self = {
+ 'total' => $args->{total},
+ 'fh' => $args->{fh} || \*STDERR,
+ };
+
+ bless ($self, $class);
+
+ $self->{term} = exists($args->{term}) ? $args->{term} : -t $self->{fh};
+
+ $self->init_bar(); # this will give us the initial progress bar
+
+ return $self;
+}
+
+=head2 init_bar
+
+public instance () init_bar()
+
+Description:
+This method creates the initial progress bar and is called automatically from new. In addition,
+you can call init_bar on an existing object to reset the bar to its original state.
+
+=cut
+
+sub init_bar {
+ my ($self) = @_;
+
+ my $fh = $self->{fh};
+
+ $self->{prev_num_done} = 0; # 0 for now, maybe allow this to be passed in
+ $self->{num_done} = 0; # 0 for now, maybe allow this to be passed in
+
+ $self->{avg_msgs_per_sec} = undef;
+
+ $self->{start_time} = time();
+ $self->{prev_time} = $self->{start_time};
+
+ return unless ($self->{term});
+
+ my $term_size = 50; # default
+
+ # attempt to get a more valid value via Term::ReadKey
+ if (HAS_TERM_READKEY) {
+ my $term_readkey_term_size;
+ eval { $term_readkey_term_size = (Term::ReadKey::GetTerminalSize($self->{fh}))[0] };
+ unless ($@) { # an error will just keep the default
+ # GetTerminalSize might have returned an empty array, so check the
+ # value and set if it exists, if not we keep the default
+ $term_size = $term_readkey_term_size if ($term_readkey_term_size);
+ }
+ }
+
+ # Adjust the bar size based on what all is going to print around it,
+ # do not forget the trailing space. Here is what we have to deal with
+ #1234567890123456789012345678901234567
+ # XXX% [] XXX.XX msgs/sec XXmXXs LEFT
+ # XXX% [] XXX.XX msgs/sec XXmXXs DONE
+ $self->{bar_size} = $term_size - 37;
+
+ my @chars = (' ') x $self->{bar_size};
+
+ print $fh sprintf("\r%3d%% [%s] %6.2f msgs/sec %sm%ss LEFT",
+ 0, join('', @chars), 0, '--', '--');
+
+ return;
+}
+
+=head2 update
+
+public instance () update ([Integer $num_done])
+
+Description:
+This method is what gets called to update the progress bar. You may optionally pass in
+an integer value that indicates how many messages have been processed. If you do not pass
+anything in then the num_done value will be incremented by one.
+
+=cut
+
+sub update {
+ my ($self, $num_done) = @_;
+
+ my $fh = $self->{fh};
+ my $time_now = time();
+
+ # If nothing is passed in to update, assume we are adding one to the num_done value
+ # (prev_num_done only advances on redraws, so incrementing it would stall the count)
+ unless(defined($num_done)) {
+ $num_done = $self->{num_done} + 1;
+ }
+
+ my $msgs_since = $num_done - $self->{prev_num_done};
+ my $time_since = $time_now - $self->{prev_time};
+
+ # we have to have processed at least one message and moved a little time
+ if ($msgs_since > 0 && $time_since > .5) {
+
+ if ($self->{term}) {
+ my $percentage = $num_done != 0 ? int(($num_done / $self->{total}) * 100) : 0;
+
+ my @chars = (' ') x $self->{bar_size};
+ my $used_bar = $num_done * ($self->{bar_size} / $self->{total});
+ for (0..$used_bar-1) {
+ $chars[$_] = '=';
+ }
+ my $rate = $msgs_since/$time_since;
+ my $overall_rate = $num_done/($time_now-$self->{start_time});
+
+ # exponentially weighted moving average (weight 0.5 on the newest sample) so the displayed rate is smoothed over time
+ $self->{avg_msgs_per_sec} = defined($self->{avg_msgs_per_sec}) ?
+ 0.5 * $self->{avg_msgs_per_sec} + 0.5 * ($msgs_since / $time_since) : $msgs_since / $time_since;
+
+ # using the overall_rate here seems to provide much smoother eta numbers
+ my $eta = ($self->{total} - $num_done)/$overall_rate;
+
+ # we make the assumption that we will never run > 1 hour, maybe this is bad
+ my $min = int($eta/60) % 60;
+ my $sec = int($eta % 60);
+
+ print $fh sprintf("\r%3d%% [%s] %6.2f msgs/sec %02dm%02ds LEFT",
+ $percentage, join('', @chars), $self->{avg_msgs_per_sec}, $min, $sec);
+ }
+ else { # we have no term, so fake it
+ print $fh '.' x $msgs_since;
+ }
+
+ $self->{prev_time} = $time_now;
+ $self->{prev_num_done} = $num_done;
+ }
+ $self->{num_done} = $num_done;
+ return;
+}
+
+=head2 final
+
+public instance () final ([Integer $num_done])
+
+Description:
+This method should be called once all processing has finished. It will print out the final msgs per sec
+calculation and the total time taken. You can optionally pass in a num_done value, otherwise it will use
+the value calculated from the last call to update.
+
+=cut
+
+sub final {
+ my ($self, $num_done) = @_;
+
+ # passing in $num_done is optional, and will most likely rarely be used,
+ # we should generally favor the data that has been passed in to update()
+ unless (defined($num_done)) {
+ $num_done = $self->{num_done};
+ }
+
+ my $fh = $self->{fh};
+
+ my $time_taken = time() - $self->{start_time};
+ $time_taken ||= 1; # can't have 0 time, so just make it 1 second
+
+ # in theory this should be 100% and the bar would be completely full, however
+ # there is a chance that we had an early exit so we aren't at 100%
+ my $percentage = $num_done != 0 ? int(($num_done / $self->{total}) * 100) : 0;
+
+ my $msgs_per_sec = $num_done / $time_taken;
+
+ my $min = int($time_taken/60) % 60;
+ my $sec = $time_taken % 60;
+
+ if ($self->{term}) {
+ my @chars = (' ') x $self->{bar_size};
+ my $used_bar = $num_done * ($self->{bar_size} / $self->{total});
+ for (0..$used_bar-1) {
+ $chars[$_] = '=';
+ }
+
+ print $fh sprintf("\r%3d%% [%s] %6.2f msgs/sec %02dm%02ds DONE\n",
+ $percentage, join('', @chars), $msgs_per_sec, $min, $sec);
+ }
+ else {
+ print $fh sprintf("\n%3d%% Completed %6.2f msgs/sec in %02dm%02ds\n",
+ $percentage, $msgs_per_sec, $min, $sec);
+ }
+
+ return;
+}
+
+1;
Index: spamassassin.raw
===================================================================
--- spamassassin.raw (revision 123727)
+++ spamassassin.raw (working copy)
@@ -77,8 +77,8 @@
use Pod::Usage;
use Mail::SpamAssassin;
use Mail::SpamAssassin::ArchiveIterator;
+use Mail::SpamAssassin::Util::Progress;
-
my %resphash = (
EX_OK => 0, # no problems
EX_USAGE => 64, # command line usage error
@@ -146,6 +146,9 @@
my $count = 0;
my @targets = ();
my $exitvalue;
+my $init_results = 0;
+my $progress;
+my $total_messages = 0;
# gnu_getopt is not available in Getopt::Long 2.24, see bug 732
# gnu_compat neither.
@@ -173,6 +176,7 @@
'revoke|k' => \$opt{'revoke'},
'siteconfigpath=s' => \$opt{'siteconfigpath'},
'test-mode|test|t' => \$opt{'test-mode'},
+ 'progress' => \$opt{'progress'},
'version|V' => \$opt{'version'},
'x' => sub { $opt{'create-prefs'} = 0 },
@@ -286,7 +290,7 @@
}
);
-$iter->set_functions( \&wanted, sub { } );
+$iter->set_functions(\&wanted, \&result);
# add leftover args as targets
# no arguments means they want stdin:
@@ -313,6 +317,8 @@
# Go run the messages!
eval { $iter->run(@targets); };
+$progress->final() if ($opt{progress} && $progress);
+
# If we needed to make a tempfile, go delete it now.
if ( defined $tempfile ) {
unlink $tempfile;
@@ -339,6 +345,29 @@
###########################################################################
+sub init_results {
+ $init_results = 1;
+
+ return unless $opt{'progress'};
+
+ $total_messages = $Mail::SpamAssassin::ArchiveIterator::MESSAGES;
+
+ $progress = Mail::SpamAssassin::Util::Progress->new({total => $total_messages,});
+}
+
+###########################################################################
+
+sub result {
+ my ($class, $result, $time) = @_;
+
+ # create the progress bar lazily, once the iterator knows the message total
+ &init_results if !$init_results;
+
+ $progress->update($count) if $opt{progress};
+}
+
+###########################################################################
+
# make sure it only returns false values so that result_sub() isn't called...
sub wanted {
my $dataref = $_[3];
@@ -361,7 +390,7 @@
}
$mail->finish();
- return;
+ return 1;
}
# handle removing reports
@@ -371,7 +400,7 @@
if ( !$opt{'test-mode'} ) {
print $spamtest->remove_spamassassin_markup ($mail);
$mail->finish();
- return;
+ return 1;
}
else {
@@ -409,7 +438,7 @@
}
$new_mail->finish();
- return;
+ return 1;
}
# OK, do checks and put out the message.
@@ -430,7 +459,7 @@
$mail->finish();
$status->finish();
- return;
+ return 1;
}
# ---------------------------------------------------------------------------
@@ -478,6 +507,7 @@
--add-addr-to-whitelist=addr Add addr to whitelist (AWL)
--add-addr-to-blacklist=addr Add addr to blacklist (AWL)
--remove-addr-from-whitelist=addr Remove addr from whitelist (AWL)
+ --progress Print progress bar
-D, --debug [area=n,...] Print debugging messages
-V, --version Print version
-h, --help Print usage message
@@ -657,6 +687,13 @@
Read user score preferences from I<prefs> (usually
C<$HOME/.spamassassin/user_prefs>).
+=item B<--progress>
+
+Prints a progress bar (to STDERR) showing the current progress. This option
+will only be useful if you are redirecting STDOUT (and not STDERR). In the
+case where no valid terminal is found this option will behave very much like
+the --showdots option in other SpamAssassin programs.
+
=item B<-D> [I<area,...>], B<--debug> [I<area,...>]
Produce debugging output. If no areas are listed, all debugging information is
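On the "we make the assumption that we will never run > 1 hour" comment in update(): because both update() and final() take the minutes modulo 60, an ETA or elapsed time of 75 minutes would print as 15m00s. A minimal sketch of an hour-aware formatter follows. format_duration is a hypothetical name, and the NhMMmSSs output is only an assumption about how hours might be shown. The hour form is wider, so the 37-column budget in init_bar() would need to grow to keep the line from wrapping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch: formats a duration in
# seconds without the one-hour wrap caused by "% 60" on the minutes.
sub format_duration {
    my ($secs) = @_;
    $secs = int($secs);    # drop fractional seconds from Time::HiRes
    my $h = int($secs / 3600);
    my $m = int(($secs % 3600) / 60);
    my $s = $secs % 60;

    # keep the existing XXmXXs form under an hour, prefix hours beyond that
    return $h > 0
        ? sprintf("%dh%02dm%02ds", $h, $m, $s)
        : sprintf("%02dm%02ds", $m, $s);
}

print format_duration($_), "\n" for (82, 2267, 4500);
```

Runs shorter than an hour keep exactly the format shown in the sample output above, so only long runs would change.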
