On Sunday 18 December 2005 17:45, Laurent Godard wrote:

Hi Laurent,

> Btw, if I understand correctly, you need an external tagger.
> Are your taggers still based on known tagged text?

The English and the German taggers work quite differently: the English one 
is trained on a corpus (not by me; that's done by OpenNLP) and uses context 
to decide which tag to assign to a word. The German tagger doesn't use 
context but assigns all possible tags to a word.

For example, "das Haus" (the house) is a correct phrase because "das" can 
be neuter/singular/nominative, and "Haus" is neuter/singular/nominative 
too. "des Haus" is incorrect because "des" is neuter/singular/genitive, 
while "Haus" has no genitive reading (the genitive form is "Hauses"). In 
other words, at least one reading of each word must match in gender, 
number, and case. So you need a large list of words with all their 
morphological information.
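The check described above can be sketched in a few lines. This is a minimal illustration, not the project's actual (Java) code: it assumes a tiny hypothetical full-form lexicon (`LEXICON`) that maps each word form to all of its readings, and the names `Reading`, `tag` and `agrees` are likewise made up for the example. Plural readings are left out for brevity.

```python
from typing import Dict, List, NamedTuple, Set


class Reading(NamedTuple):
    """One morphological reading of a word form (hypothetical tag layout)."""
    gender: str  # "masc", "fem", "neut"
    number: str  # "sg", "pl"
    case: str    # "nom", "gen", "dat", "acc"


# Hypothetical full-form lexicon: each form lists ALL its possible readings,
# independent of context (plural readings omitted to keep it short).
LEXICON: Dict[str, List[Reading]] = {
    "das": [Reading("neut", "sg", "nom"), Reading("neut", "sg", "acc")],
    "des": [Reading("masc", "sg", "gen"), Reading("neut", "sg", "gen")],
    "dem": [Reading("masc", "sg", "dat"), Reading("neut", "sg", "dat")],
    "Haus": [Reading("neut", "sg", "nom"), Reading("neut", "sg", "dat"),
             Reading("neut", "sg", "acc")],
    "Hauses": [Reading("neut", "sg", "gen")],
}


def tag(word: str) -> Set[Reading]:
    """Context-free tagging: return every reading the lexicon lists."""
    return set(LEXICON.get(word, []))


def agrees(determiner: str, noun: str) -> bool:
    """True if at least one reading matches in gender, number and case."""
    return bool(tag(determiner) & tag(noun))


if __name__ == "__main__":
    for phrase in [("das", "Haus"), ("des", "Haus"), ("des", "Hauses")]:
        print(" ".join(phrase), "->", "ok" if agrees(*phrase) else "error")
```

Because each reading is a hashable tuple, the agreement test reduces to a set intersection. Note that in this sketch an unknown word gets no readings and therefore never "agrees"; a real checker would more likely skip such pairs than report an error.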

> Last point: why did you use Java? :( Python is great, and your first
> approach was perfect for multiple uses (not only dedicated to OOo).

But my first approach also quickly became unmaintainable, even for me. 
Now the code has a better structure and more unit test coverage, can be 
built with an Ant script, and is easy to work with in Eclipse. I really 
need Java's type safety and the power of Eclipse to work effectively.

Regards
 Daniel

-- 
http://www.danielnaber.de
