Hi Daniel
The English and the German tagger work quite differently: the English one
is trained on a corpus (not by me, it's done by OpenNLP) and uses context
to decide which tag to assign to a word.
It is the one Myriam tested, and the conclusion was that a tagged
corpus for training is quite hard to obtain :(
The German tagger doesn't use
context but assigns all possible tags to a word.
For example, "das Haus" (the house) is a correct phrase because "das" is
neuter/singular/nominative and "Haus" is neuter/singular/nominative too. "des
Haus" is incorrect because "des" is neuter/singular/genitive and "Haus" has no
genitive reading (the genitive form is "Hauses"). In other words, the two words
need at least one reading that matches in gender, number, and case.
So you need a large list of words with all their morphological
information.
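
To make the agreement idea concrete, here is a minimal sketch in Python. It is
not the actual tagger code: the names Reading, LEXICON, readings and agrees are
made up for illustration, and the tiny word list is hand-written where a real
tagger would need the large morphological lexicon mentioned above.

```python
from typing import NamedTuple, Set


class Reading(NamedTuple):
    """One possible morphological analysis of a word form (hypothetical type)."""
    gender: str
    number: str
    case: str


# Toy lexicon, assumed for illustration only. Each word form maps to ALL of
# its possible readings, since no context is used to pick a single one.
LEXICON = {
    "das": {
        Reading("neuter", "singular", "nominative"),
        Reading("neuter", "singular", "accusative"),
    },
    "des": {
        Reading("masculine", "singular", "genitive"),
        Reading("neuter", "singular", "genitive"),
    },
    "Haus": {
        Reading("neuter", "singular", "nominative"),
        Reading("neuter", "singular", "accusative"),
        Reading("neuter", "singular", "dative"),
    },
    "Hauses": {
        Reading("neuter", "singular", "genitive"),
    },
}


def readings(word: str) -> Set[Reading]:
    """Return every known reading of a word, or an empty set if it is unknown."""
    return LEXICON.get(word, set())


def agrees(determiner: str, noun: str) -> bool:
    """True if at least one reading matches in gender, number, and case."""
    return bool(readings(determiner) & readings(noun))


if __name__ == "__main__":
    for phrase in [("das", "Haus"), ("des", "Haus"), ("des", "Hauses")]:
        print(phrase, "agrees" if agrees(*phrase) else "does not agree")
```

With this representation the check is just a set intersection, so an unknown
word simply has no readings and never agrees with anything. A real rule would
probably want to treat unknown words separately instead of flagging them.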
This is THE approach
That's really great, and I think we have all the French material and tools.
I'll have a look.
Last point: why did you use Java? :( Python is great, and your first
approach was perfect for multiple uses (not only dedicated to OOo).
But my first approach also quickly became unmaintainable, even for myself.
Now the code has a better structure, more unit test coverage, can be built
with an Ant script, and is easy to work with in Eclipse. I really need
Java's type safety and the power of Eclipse to work effectively.
Yes, Eclipse seems to be great :)
I will have a look. But I really think that Python is possible.
Btw, regarding the OOo API, is your UI written with the UNO API or is it a
pure Java window?
Laurent
--
Laurent Godard <[EMAIL PROTECTED]> - Ingénierie OpenOffice.org
Indesko >> http://www.indesko.com
Nuxeo CPS >> http://www.nuxeo.com - http://www.cps-project.org
Book: "Programmation OpenOffice.org", Eyrolles 2004