Hi all,

I'm currently working on a word stemming engine for the assp Bayesian 
check. This engine reduces words to their stem form, for example from 
plural, singular, future, present, or past forms.

The Perl module 'Lingua::Stem' is used to do this.
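
For anyone who has not used the module, here is a minimal sketch of how a word list might be stemmed with it. The helper name stem_words and the sample words are only illustrative assumptions, not the actual assp code. The module is loaded at run time, so the helper simply returns nothing where Lingua::Stem is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: stem @words for the given locale (e.g. 'EN', 'DE').
# Returns an array reference of stems, or undef (in scalar context) if
# Lingua::Stem cannot be loaded.
sub stem_words {
    my ($locale, @words) = @_;
    return unless eval { require Lingua::Stem; 1 };
    my $stemmer = Lingua::Stem->new(-locale => $locale);
    return $stemmer->stem(@words);    # array reference, one stem per word
}

# Sample words are arbitrary; the stems are printed only if the module loaded.
my $stems = stem_words('EN', qw(houses running walked));
print join(' ', @$stems), "\n" if $stems;
```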

The languages currently supported by this module are:

      DA          - Danish
      DE          - German
      EN          - English (also EN-US and EN-UK)
      FR          - French
      GL          - Galician
      IT          - Italian
      NO          - Norwegian
      PT          - Portuguese
      RU          - Russian (also RU-RU and RU-RU.KOI8-R)
      SV          - Swedish
 

It would be nice if this assp stemming engine could detect the language 
in which the text to be converted is written. Currently a default has to 
be set in the code.

- For 'EN', detection currently relies on the occurrence of any of these words: 
/\b(?:are|your?|she|here|his|he|there|this|these|have|has|the|those)\b/io
- For 'DE', I'll find a similar list myself - no problem
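
To make the idea concrete, here is a rough sketch of how such per-language word lists could be combined into one detection routine. It is not the current assp code: the function name detect_language is an assumption, the 'DE' pattern is an unvetted placeholder, and counting hits per language (rather than accepting any single occurrence, as described above) is a swapped-in choice so that a stray foreign word does not decide the result.

```perl
use strict;
use warnings;

# Marker-word patterns per language code. The EN pattern is the one quoted
# above; the DE pattern is an unvetted placeholder for illustration only.
my %marker_re = (
    EN => qr/\b(?:are|your?|she|here|his|he|there|this|these|have|has|the|those)\b/i,
    DE => qr/\b(?:und|nicht|ich|ist|oder)\b/i,
);

# Return the language code whose marker words occur most often in $text,
# or $default when no marker word matches at all. On a tie, the code that
# sorts first alphabetically wins.
sub detect_language {
    my ($text, $default) = @_;
    my ($best, $best_hits) = ($default, 0);
    for my $lang (sort keys %marker_re) {
        # Count all matches of this language's pattern in the text.
        my $hits = () = $text =~ /$marker_re{$lang}/g;
        ($best, $best_hits) = ($lang, $hits) if $hits > $best_hits;
    }
    return $best;
}
```

A call would look like detect_language($mail_text, 'EN'), where the second argument is the default that currently has to be set in the code. Lists for the other languages would just be further entries in %marker_re.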

What I need is a small list of common words that are unique(!!!) to each 
of the other languages. Any help is welcome.

Thomas



_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
