Hi all,
I'm currently working on a word stemming engine for the assp Bayesian
check. This engine reduces each word to its stem form, whatever its
inflection: for example plural or singular, or future, present or past
tense.
The Perl module 'Lingua::Stem' is used to do this.
The languages currently supported by this module are:
DA - Danish
DE - German
EN - English (also EN-US and EN-UK)
FR - French
GL - Galician
IT - Italian
NO - Norwegian
PT - Portuguese
RU - Russian (also RU-RU and RU-RU.KOI8-R)
SV - Swedish
It would be nice if this assp stemming engine could detect which language
the text to convert is written in. Currently a default has to be set
in the code.
- For 'EN', detection is currently based on the occurrence of any of these words:
/\b(?:are|your?|she|here|his|he|there|this|these|have|has|the|those)\b/io
- For 'DE' I'll find a similar list myself - no problem
What I need is a small list of common words that are unique(!!!) to each
of the other languages. Any help is welcome.
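
To make clear where such word lists would be plugged in, here is a minimal
sketch of the detection step. It is only an illustration, not the actual
assp code: the helper name guess_locale and the %marker table are
hypothetical, the 'DE' words are untested placeholders rather than a vetted
list, and the sketch counts hits per language instead of stopping at the
first match.

```perl
use strict;
use warnings;

# Marker words per locale. The 'EN' pattern is the one quoted above; the
# 'DE' pattern is only an illustrative placeholder (assumption), not a
# vetted list of language-unique words.
my %marker = (
    EN => qr/\b(?:are|your?|she|here|his|he|there|this|these|have|has|the|those)\b/i,
    DE => qr/\b(?:und|nicht|oder|auch)\b/i,
);

# Hypothetical helper: returns the locale whose marker words occur most
# often in $text, or $default when no marker word matches at all.
# On a tie, the alphabetically first locale wins.
sub guess_locale {
    my ($text, $default) = @_;
    my ($best, $best_hits) = ($default, 0);
    for my $locale (sort keys %marker) {
        # Count all matches of this locale's pattern in the text.
        my $hits = () = $text =~ /$marker{$locale}/g;
        ($best, $best_hits) = ($locale, $hits) if $hits > $best_hits;
    }
    return $best;
}

print guess_locale('Have you seen his report?', 'EN'), "\n";              # prints EN
print guess_locale('Das ist nicht gut und auch nicht neu.', 'EN'), "\n";  # prints DE
```

Each additional language would just be one more entry in %marker. The
returned code could then be handed to the stemmer, for example as the
-locale argument of Lingua::Stem->new.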
Thomas
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test