I'm working on a script that uses File::Find to count the number of news posts in a semi-extensive hierarchy. As some may know, news posts are commonly stored in numerically named files, one file per posting.
The following script tries to plow through a hierarchy, returning the directory name and the file count for each directory. There is a little more to it: longish path names are stripped down for printing, but only when they match a certain recognizable pattern, so the script can be run against even a single directory with no stripping happening.

Cutting to the chase, it seems to do the job I wanted quite well, and pretty fast too. I checked file counts for specific directories several times across different iterations of this script, and am satisfied it is returning accurate results.

To simplify the script, I've cut a number of lines that involved passing in a directory name and checking that it exists as a directory on the file system.

Now to the question: the script seems to fail in a certain way when used against a small hierarchy devised for testing. However, I do not see any wrong output when it is used against a real news hierarchy.

------- ------- ---=--- ------- -------
script:

[...]
use strict;
use warnings;
use File::Find;

my $startdir = '/home/gnusu/News/agent/nntp';
# my $startdir = './dir1';

my $oacnt = 0;
my $dcnt = 0;
my $mcnt = 0;
my (%data,@out,$ffd,$stripped);
my @tst;
my $gpcnt = 0;

find sub {
    return unless /^\d+$/;
    $oacnt++;
    $mcnt++;

    ## Only push after the count has been collected
    if ($dcnt && $ffd ne $File::Find::dir) {
        push @out, sprintf"%-55s %6d", $stripped, $mcnt;
    }

    ## Get every uniq directory name
    if ($data{$File::Find::dir}++ == 0) {
        $mcnt = 0;
        $dcnt++;

        ## shorten up the path names
        $ffd = $File::Find::dir;
        if ($ffd =~ /.*News\/agent\/nntp/) {
            ($stripped) = $ffd =~ m/.*News\/agent\/nntp\/(.*)/;
        }else {
            $stripped = $ffd;
        }
    }
}, $startdir;

## only push after the count is done. No more directory names
## means the count cannot be added inside find()
push @out, sprintf"%-55s %6d", $stripped, $mcnt;

## one count seems to end up missing so adding it here
$mcnt += 1;

my $gcnt = 0;
for (sort @out) {
    $gcnt++;
    printf "%2d: %s\n", $gcnt, $_;
}

print "\n<$oacnt> posts in <$gcnt> directories overall\n";

------- ------- ---=--- ------- -------

Used on a real news hierarchy, the script seems to return accurate results. Here are a few lines of output against stored news at /home/gnusu/News/agent/nntp, showing only three lines from a list of 44 directories (first, middle and last):

 1: enews.newsguy.com/alt/solaris/x86                         18629
[...]
22: news.gmane.org/gmane/comp/terminal-emulators/tmux/user     8324
[...]
44: nntp.perl.org/perl/perl6/users                             3967

The counts are accurate.

------- ------- ---=--- ------- -------

Now the problem test directories: three stacked directories with one numeric file in each. It looks like:

ls -R ./dir1
./dir1:
111  dir2

./dir1/dir2:
222  dir3

./dir1/dir2/dir3:
333

Script output on those three (space between directory name and count shortened to prevent mail wrapping):

 1: ./dir1              1
 2: ./dir1/dir2         1
 3: ./dir1/dir2/dir3    0

Notice that the last directory shows a count of zero. Why is that, and how can I prevent it?
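
For comparison, here is a minimal sketch of one possible way to avoid tracking directory changes by hand. In the script above, $mcnt is set back to zero right after it has been incremented for the first numeric file seen in each new directory, and a directory's line is only pushed once a file from a later directory turns up; the last directory has no successor. The sketch instead keeps one counter per directory in a hash keyed on $File::Find::dir. The helper name count_posts and the extra -f test are my own assumptions and are not part of the original script.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not in the original script): returns a hash ref
# mapping each directory to the number of numeric-named files in it.
sub count_posts {
    my ($startdir) = @_;
    my %count;
    find(sub {
        return unless /^\d+$/;   # numeric names only, as in the original
        return unless -f $_;     # assumption: ignore numeric-named directories
        $count{$File::Find::dir}++;
    }, $startdir);
    return \%count;
}

# Start directory from the command line, else the small test tree.
my $startdir = shift // './dir1';
my $counts   = count_posts($startdir);

my ($n, $total) = (0, 0);
for my $dir (sort keys %$counts) {
    # Shorten the path only when it matches the news spool pattern.
    (my $label = $dir) =~ s{^.*News/agent/nntp/}{};
    printf "%2d: %-55s %6d\n", ++$n, $label, $counts->{$dir};
    $total += $counts->{$dir};
}
print "\n<$total> posts in <$n> directories overall\n";
```

Because each directory owns its own counter, nothing has to be pushed or reset when find() moves to another directory, so the final directory needs no special handling after find() returns. Directories that contain no numeric files simply do not appear in the hash.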