At 21:46 -0800 11/11/11, Sumtingwong wrote:
All of the files to be searched contain paragraphs of text that are
soft-wrapped (I don't know if that is the correct term, sorry). I
have not written any Perl in over 10 years, time to break out the
books!
Wrapping will make no difference.
What is needed? ;-) A frequency count of each word in the input file
for each file that was searched. For example, the first word of the
input file is "it". Document one is searched for "it" and it shows up
248 times. Optimal output would be (in tabbed columns):
it Document 1 248
I know the output is going to be huge (as the input file is rather
large), but that is fine--I just need to get to the analysis part at
this point.
The script below should do what you want. All you need to do is set
up the files and directories. The script prints the results to
STDOUT. You will probably want to write them to an output file.
#!/usr/bin/perl
use strict;
use warnings;
use open qw(:encoding(UTF-8));      # file handles default to UTF-8 (the search words are Han characters)

binmode DATA,   ':encoding(UTF-8)'; # the word list below is UTF-8 too
binmode STDOUT, ':encoding(UTF-8)';

chdir "$ENV{HOME}/hanfiles/" or die "Can't chdir: $!"; # directory containing the files
my @filelist = grep { -f } glob '*';                   # plain files in the directory

# my $wordlist = "path/to/wordlist";
# open my $filehandle, '<', $wordlist or die $! ...etc

while (<DATA>) {            # replace DATA with $filehandle
    chomp;
    my $searchword = $_;
    foreach my $filename (@filelist) {
        my $count = 0;
        open my $fh, '<', $filename or die "Can't open '$filename': $!";
        while (<$fh>) {
            my @found = /\Q$searchword\E/ig; # \Q..\E treats the word literally, not as a regex
            $count += @found;
        }
        close $fh;
        print "File: '$filename'\tWord: '$searchword'\tFound: $count\n" if $count;
    }
}
__DATA__
東
方
紅
太
陽
升
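Since the output will be huge, you'll probably want to redirect it to a file and sort it afterwards. A small sketch with standard Unix `sort` (the filename `results.txt` is hypothetical, the two data lines reuse the example numbers from above, and it assumes the output has been reshaped into the bare three tab-separated columns you asked for):

```shell
# Build a tiny tab-separated results file by hand (hypothetical example data,
# deliberately out of order) to stand in for the script's redirected output.
printf 'it\tDocument 2\t91\nit\tDocument 1\t248\n' > results.txt

# Sort by the third tab-separated column (the count), numerically,
# highest first, so the most frequent hits come to the top.
sort -t "$(printf '\t')" -k3,3nr results.txt
```

The `-t` flag sets the field separator to a literal tab; `-k3,3nr` restricts the sort key to the count column and sorts it numerically in reverse.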
--
You received this message because you are subscribed to the
"BBEdit Talk" discussion group on Google Groups.