Dan Grossman wrote:
> 
> Hi,
> 
> I have a file of several million words, 80% of which are misspelled
> (i.e. non-words) and 20% of which are spelled correctly.  I would like
> to write a short script to read in this file and output only the
> correctly spelled words.
> 
> Since I'm on Unix, I could just use something like `spell` to spit out
> all the misspelled words in a more or less reasonable amount of time,
> but getting the _correctly_ spelled words seems to be a more difficult
> task (unless there's some command line option to `spell` that I don't
> know about).
> 
> It seems like it would take some time to invoke `spell` on each of a
> few million words.  So does anybody know of a Perl utility to check
> the local dictionary files more quickly?
> 
Can't you use /usr/share/dict/words directly?
It's the word list used by spell.

Of course, you would still have to reduce the words to their stem forms
before looking them up; take a look at Lingua::Stem for that.
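
A minimal sketch of the direct lookup, assuming the word list has one word
per line and that a case-insensitive match is acceptable. The sub names
(load_dictionary, known_words) and the script name are only illustrative,
and no stemming is done here, so inflected forms missing from the list
would be dropped unless you stem them first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a word list (one word per line) into a hash for fast lookups.
# Keys are lowercased so that matching is case-insensitive.
sub load_dictionary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %dict;
    while (my $word = <$fh>) {
        chomp $word;
        $dict{lc $word} = 1;
    }
    close $fh;
    return \%dict;
}

# Return only those words that appear in the dictionary hash.
sub known_words {
    my ($dict, @words) = @_;
    return grep { exists $dict->{lc $_} } @words;
}

# Usage (file names are placeholders):
#   perl filter.pl words.txt [dictionary]
# The dictionary path defaults to /usr/share/dict/words (an assumption;
# the location differs on some systems).
if (@ARGV) {
    my ($input, $dict_path) = @ARGV;
    my $dict = load_dictionary($dict_path // '/usr/share/dict/words');
    open my $in, '<', $input or die "Cannot open $input: $!";
    while (my $line = <$in>) {
        # Split each line on whitespace and print the recognised words.
        print "$_\n" for known_words($dict, split ' ', $line);
    }
    close $in;
}
```

The dictionary is read once into a hash, so each of the few million words
costs a single hash lookup rather than a separate call to spell.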

Best Wishes,
Andrea
