If you are attempting to "extract" keywords as a search engine might (i.e. find all words of substance), you might split the document on whitespace (\s), loop through the resulting array while ignoring non-words and insubstantial words, and increment the corresponding hash element (the rating / number of occurrences) for each word you keep. You can then do whatever is appropriate with this information.

I want to use Perl to extract keywords from plaintext, but I don't know whether there is an existing package or algorithm for doing that. Thank you.
Regards,
Robert.
Something like this should work if you are only searching local documents. If single words are not enough, you might want to consider multi-word phrases as well; otherwise, you could change \s to \W so that punctuation is also treated as a separator.
my %keywords;
foreach (split /\s+/, $document) {
    # badword() is a function to check if $_ is a common or "bad" word
    next if $_ eq '' or badword($_);
    # ++ creates the hash entry on first use, so no exists() check is needed
    $keywords{$_}++;
}
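
For reference, the loop above can be fleshed out into a small self-contained script. This is only a sketch under some assumptions: the stop-word list, the three-character minimum, the count_keywords name and the sample text are all made up for illustration, and it splits on \W+ rather than \s+ so that punctuation does not stick to the words.

```perl
use strict;
use warnings;

# Hypothetical stop-word list for illustration; a real one would be much longer.
my %stopwords = map { $_ => 1 } qw(a an and are the is of to in it that this);

# Returns true if the word should be ignored: very short words
# (including the empty string) and anything in the stop-word list.
sub badword {
    my ($word) = @_;
    return length($word) < 3 || exists $stopwords{$word};
}

# Counts the substantial words in a string and returns a hash reference
# mapping each lowercased word to its number of occurrences.
sub count_keywords {
    my ($document) = @_;
    my %keywords;
    foreach my $word (split /\W+/, lc $document) {
        next if badword($word);
        $keywords{$word}++;
    }
    return \%keywords;
}

my $text  = "The cat sat on the mat. The cat is a happy cat.";
my $count = count_keywords($text);

# Print keywords, most frequent first; ties are broken alphabetically.
foreach my $word (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$word: $count->{$word}\n";
}
```

Lowercasing before counting means "Perl" and "perl" are treated as the same keyword; drop the lc if case matters for your documents.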