At 21:46 -0800 11/11/11, Sumtingwong wrote:
All of the files to be searched contain paragraphs of text that are
soft-wrapped (I don't know if that is the correct term, sorry). I
have not written any Perl in over 10 years, time to break out the
books!
Wrapping will make no difference.
What is needed? ;-) A frequency count of each word in the input file
for each file that was searched. For example, the first word of the
input file is "it". Document one is searched for "it" and it shows up
248 times. Optimal output would be (in tabbed columns):
it Document 1 248
I know the output is going to be huge (as the input file is rather
large), but that is fine--I just need to get to the analysis part at
this point.
The script below should do what you want. All you need to do is set
up the files and directories. The script prints the results to
STDOUT. You will probably want to write them to an output file.
#!/usr/bin/perl
use strict;
use warnings;
use open qw(:encoding(UTF-8));      # file handles default to UTF-8 (the search words are Han characters)

binmode DATA,   ':encoding(UTF-8)'; # the word list below is UTF-8 too
binmode STDOUT, ':encoding(UTF-8)';

chdir "$ENV{HOME}/hanfiles/" or die "Can't chdir: $!"; # directory containing the files
my @filelist = grep { -f } glob '*';                   # plain files in the directory

# my $wordlist = "path/to/wordlist";
# open my $filehandle, '<', $wordlist or die $! ...etc

while (<DATA>) {            # replace DATA with $filehandle
    chomp;
    my $searchword = $_;
    foreach my $filename (@filelist) {
        my $count = 0;
        open my $fh, '<', $filename or die "Can't open '$filename': $!";
        while (<$fh>) {
            my @found = /\Q$searchword\E/ig; # \Q..\E treats the word literally, not as a regex
            $count += @found;
        }
        close $fh;
        print "File: '$filename'\tWord: '$searchword'\tFound: $count\n" if $count;
    }
}
__DATA__
東
方
紅
太
陽
升
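Since the output will be huge, you'll probably want to redirect it to a file and sort it afterwards. A small sketch with standard Unix `sort` (the filename `results.txt` is hypothetical, the two data lines reuse the example numbers from above, and it assumes the output has been reshaped into the bare three tab-separated columns you asked for):

```shell
# Build a tiny tab-separated results file by hand (hypothetical example data,
# deliberately out of order) to stand in for the script's redirected output.
printf 'it\tDocument 2\t91\nit\tDocument 1\t248\n' > results.txt

# Sort by the third tab-separated column (the count), numerically,
# highest first, so the most frequent hits come to the top.
sort -t "$(printf '\t')" -k3,3nr results.txt
```

The `-t` flag sets the field separator to a literal tab; `-k3,3nr` restricts the sort key to the count column and sorts it numerically in reverse.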
--
You received this message because you are subscribed to the
"BBEdit Talk" discussion group on Google Groups.