I'm working on a script that uses File::Find to count the news posts
in a fairly large hierarchy.  As some may know, news posts are
commonly stored in numerically named files, one file per post.

The following script walks a hierarchy and reports each directory
name along with the file count for that directory.

There is a little more to it: longish path names are stripped down for
printing, but only when they match a certain recognizable pattern.
That way the script can be used against even a single directory, with
no stripping happening unless the path matches that pattern.

Cutting to the chase, it seems to do the job I wanted quite well, and
pretty fast too.  I checked file counts for specific directories
several times across different iterations of this script, and am
satisfied it is returning accurate results.

To simplify the script, I've cut a number of lines that involved
passing in a directory name and checking that it exists as a
directory on the file system.

Now to the question:
The script seems to fail in a certain way when used against a small
hierarchy devised for testing.

However, I do not see any wrong output when it is used against a real
news hierarchy.

-------       -------       ---=---       -------       -------

script:

[...]

use strict;
use warnings;
use File::Find;

my $startdir = '/home/gnusu/News/agent/nntp';
# my $startdir = './dir1';
my $oacnt = 0;
my $dcnt = 0;
my $mcnt = 0;
my (%data,@out,$ffd,$stripped);
my @tst;
my $gpcnt = 0;

find sub {
  return unless /^\d+$/;
  $oacnt++;
  $mcnt++;

  ## Only push after the count has been collected
  if ($dcnt && $ffd ne $File::Find::dir) {
    push  @out, sprintf"%-55s %6d", $stripped, $mcnt;
  }

  ## Get every uniq directory name
  if ($data{$File::Find::dir}++ == 0) {
    $mcnt = 0;
    $dcnt++;

    ## shorten up the path names: keep only the part below
    ## News/agent/nntp/; any other path is printed unchanged
    $ffd = $File::Find::dir;
    if ($ffd =~ m{.*News/agent/nntp/(.*)}) {
      $stripped = $1;
    } else {
      $stripped = $ffd;
    }
  }
}, $startdir;

## only push after the count is done. No more directory names
## means the count cannot be added inside find()
push  @out, sprintf"%-55s %6d", $stripped, $mcnt;

## one count seems to end up missing so adding it here
$mcnt += 1;

my $gcnt = 0;
for (sort @out) {
  $gcnt++;
  printf "%2d: %s\n", $gcnt, $_;
}

print "\n<$oacnt> posts in <$gcnt> directories overall\n";

-------       -------       ---=---       -------       ------- 

Using the script on a real news hierarchy, it seems to return accurate
results.  A few lines of output against stored news at:

/home/gnusu/News/agent/nntp

Showing only three lines of output from a list of 44 directories.

First, middle and last lines:

 1: enews.newsguy.com/alt/solaris/x86                        18629

[...] 

22: news.gmane.org/gmane/comp/terminal-emulators/tmux/user    8324

[...]

44: nntp.perl.org/perl/perl6/users                            3967

The counts are accurate
-------       -------       ---=---       -------       ------- 

Now the problem test directories: 3 nested directories with 1 numeric
file in each.

It looks like:
   ls -R ./dir1

   ./dir1:
   111  dir2

   ./dir1/dir2:
   222  dir3

   ./dir1/dir2/dir3:
   333
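
For anyone who wants to reproduce this, the layout above can be
recreated with a few lines of Perl.  This is only a sketch: the sub
name make_test_tree and the use of a scratch directory from File::Temp
are assumptions for illustration, whereas the original test tree lives
at ./dir1.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build dir1/dir2/dir3 under $root with one empty, numerically named
# file per level, and return the path of the top directory (dir1).
# make_test_tree is a hypothetical helper, not part of the script.
sub make_test_tree {
    my ($root) = @_;
    my @layout = (
        [ 'dir1',           '111' ],
        [ 'dir1/dir2',      '222' ],
        [ 'dir1/dir2/dir3', '333' ],
    );
    for my $entry (@layout) {
        my ($dir, $file) = @$entry;
        make_path("$root/$dir");
        open my $fh, '>', "$root/$dir/$file"
            or die "cannot create $root/$dir/$file: $!";
        close $fh or die "cannot close $root/$dir/$file: $!";
    }
    return "$root/dir1";
}

# Assumption: a temporary directory that is removed when the program exits.
my $top = make_test_tree(tempdir(CLEANUP => 1));
print "test tree created under $top\n";
```

Pointing $startdir at the returned path gives the same three-level
tree as the listing above.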

script output on those three:

(shortened space between dir name and count to prevent mail wrapping)

 1: ./dir1                        1
 2: ./dir1/dir2                   1
 3: ./dir1/dir2/dir3              0

Notice the last directory shows a count of zero.
Why is that, and how can I prevent it?
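
A likely explanation, traced from the script above: $mcnt is
incremented before the new-directory check resets it to zero, so the
first file seen in each directory is dropped from that directory's own
count and credited to the previous directory when its line is pushed.
Every directory except the last one visited therefore comes out right,
and the last one is one short.  The "$mcnt += 1" runs after the final
push, so it cannot change what was already formatted.  On the real
hierarchy the same shortfall of one would presumably be there too, but
it is easy to overlook in a count of several thousand.

The following is a minimal sketch of one way around this, not the
original script.  It keeps one counter per directory in a hash keyed
on $File::Find::dir, so no directory-change bookkeeping is needed.
The names count_posts and %count, the extra -f test, and taking the
start directory from the command line are assumptions for
illustration.

```perl
use strict;
use warnings;
use File::Find;

# Return a hash ref mapping each directory to the number of
# numerically named plain files found directly inside it.
# count_posts is a hypothetical name, not part of the original script.
sub count_posts {
    my ($startdir) = @_;
    my %count;
    find(sub {
        # numeric name and a plain file (skips numerically named dirs)
        return unless /^\d+$/ && -f $_;
        $count{$File::Find::dir}++;
    }, $startdir);
    return \%count;
}

# Assumption: start directory given as first argument, else ./dir1.
my $startdir = @ARGV ? $ARGV[0] : './dir1';
my $count    = -d $startdir ? count_posts($startdir) : {};

my ($n, $total) = (0, 0);
for my $dir (sort keys %$count) {
    # Same shortening rule as the script above: strip everything up to
    # and including News/agent/nntp/, otherwise print the path as is.
    my ($stripped) = $dir =~ m{.*News/agent/nntp/(.*)};
    $stripped = $dir unless defined $stripped;
    printf "%2d: %-55s %6d\n", ++$n, $stripped, $count->{$dir};
    $total += $count->{$dir};
}
print "\n<$total> posts in <$n> directories overall\n";
```

Run against the three-level test tree, this should report a count of
1 for each of the three directories.  Directories that contain no
numerically named files never get a hash entry, so they are not
listed, which matches the behaviour of the script above.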
