Hi Daniel

The English and the German tagger work quite differently: the English one is trained on a corpus (not by me, it's done by OpenNLP) and uses context to decide which tag to assign to a word.

It is the one Myriam tested, and the conclusion was that a tagged corpus for training is quite hard to obtain :(

The German tagger doesn't use context but assigns all possible tags to a word.

For example, "das Haus" (the house) is a correct phrase because "das" is neuter/singular/nominative and "Haus" is neuter/singular/nominative too. "des Haus" is incorrect because "des" is neuter/singular/genitive, which does not match any reading of "Haus". In other words, you need at least one reading to match in gender, number, and case. So you need a large list of words with all their morphological information.
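
The agreement check described above can be sketched roughly as follows. This is only a minimal illustration in Python, not the actual tagger code: the tiny LEXICON, the tuple layout (gender, number, case), and the function names readings() and agrees() are all assumptions made up for this example, and a real word list would be far larger.

```python
# Hypothetical mini-lexicon (illustration only): each word maps to every
# (gender, number, case) reading it can have. No context is used.
LEXICON = {
    "das": {
        ("neuter", "singular", "nominative"),
        ("neuter", "singular", "accusative"),
    },
    "des": {
        ("masculine", "singular", "genitive"),
        ("neuter", "singular", "genitive"),
    },
    "Haus": {
        ("neuter", "singular", "nominative"),
        ("neuter", "singular", "dative"),
        ("neuter", "singular", "accusative"),
    },
}


def readings(word):
    """Return all known readings of a word (empty set if the word is unknown)."""
    return LEXICON.get(word, set())


def agrees(determiner, noun):
    """True if at least one reading matches in gender, number, and case."""
    return bool(readings(determiner) & readings(noun))


if __name__ == "__main__":
    for det in ("das", "des"):
        verdict = "ok" if agrees(det, "Haus") else "no matching reading"
        print(det, "Haus:", verdict)
```

Because the check is just a set intersection over readings, its quality depends entirely on how complete the word list is, which is why the morphological information matters so much.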


This is THE approach.
That's really great, and I think we have all the French material and tools.
I'll have a look.


Last point: why did you use Java? :( Python is great, and your first
approach was perfect for multiple uses (not only dedicated to OOo).


But my first approach also quickly became unmaintainable, even for myself. Now the code has a better structure and more unit test coverage, can be built with an ant script, and is easy to work with in Eclipse. I really need Java's type-safety and the power of Eclipse to work effectively.


Yes, Eclipse seems to be great :)
I will have a look. But I really think that Python is possible.

Btw, regarding the OOo API, is your UI written with the UNO API, or is it a pure Java window?

Laurent

--
Laurent Godard <[EMAIL PROTECTED]> - Ingénierie OpenOffice.org
Indesko >> http://www.indesko.com
Nuxeo CPS >> http://www.nuxeo.com - http://www.cps-project.org
Livre "Programmation OpenOffice.org", Eyrolles 2004
