Hi,
>> And FILES are:
>> DIC: full expanded dictionary file
>> CRP: training text corpus file
>> TSX: tagger specification file, in XML format
>> TAGGER_DATA: tagger data file, built in the training and used while
>> tagging
>> HTAG: hand-tagged text corpus
>> UNTAG: untagged text corpus, morphological analysis of HTAG
>> corpus to use both jointly with -s option
>>
>>
>> For Hungarian, "DIC" is not going to be possible as it relies on
>> dictionary expansion,[1] the rest is possible (you just need to convert
>> the resources you already have).
>>
>> Felipe: What is the dictionary expansion file used for when training the
>> tagger, and could it be approximated in some way?
>>
>> Fran
>>
>> 1. Well, you could just analyse the corpus with your morphological
>> analyser, and then convert the set of analyses from the corpus to an
>> Apertium .dix file, then expand it. This would be useless for most
>> purposes but would allow you to train the tagger.
>>
We are one step closer now, but I am wondering whether there is an easy way
to create a .dix file from Apertium stream format (or an easy way to feed
an analysed text file to the tagger directly, instead of enumerating lemmata
and paradigms).
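
To make the question concrete, here is a minimal sketch of the conversion
described in footnote 1, assuming the analysed corpus is in the usual
^surface/analysis1/analysis2$ stream format. The function names are
hypothetical, the tags in the comments are only illustrative, backslash
escapes in the stream are not handled, and unknown words (analyses starting
with *) are skipped. It lists every surface form in full instead of building
paradigms, so the result is only meant as input for training the tagger.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# One lexical unit: ^surface/analysis1/analysis2$ (no escape handling).
LU_RE = re.compile(r"\^([^/$^]*)/([^$^]*)\$")
# One tag such as <n> or <pl>.
TAG_RE = re.compile(r"(<[^<>]+>)")


def text_to_dix(text):
    """Escape XML specials and turn spaces into the .dix blank element."""
    return escape(text).replace(" ", "<b/>")


def analysis_to_right(analysis):
    """Convert 'lemma<tag1><tag2>' into the content of an <r> element."""
    parts = []
    for piece in TAG_RE.split(analysis):
        if TAG_RE.fullmatch(piece):
            parts.append("<s n=%s/>" % quoteattr(piece[1:-1]))
        elif piece:
            parts.append(text_to_dix(piece))
    return "".join(parts)


def stream_to_dix(stream):
    """Build a flat .dix (one full-form entry per analysis) from a stream."""
    entries = set()
    symbols = set()
    for surface, analyses in LU_RE.findall(stream):
        for analysis in analyses.split("/"):
            # Skip empty analyses and unknown words (marked with *).
            if not analysis or analysis.startswith("*"):
                continue
            symbols.update(tag[1:-1] for tag in TAG_RE.findall(analysis))
            entries.add((surface, analysis))

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<dictionary>",
        "  <alphabet/>",
        "  <sdefs>",
    ]
    lines += ["    <sdef n=%s/>" % quoteattr(s) for s in sorted(symbols)]
    lines += ["  </sdefs>", '  <section id="main" type="standard">']
    for surface, analysis in sorted(entries):
        lines.append(
            "    <e><p><l>%s</l><r>%s</r></p></e>"
            % (text_to_dix(surface), analysis_to_right(analysis))
        )
    lines += ["  </section>", "</dictionary>"]
    return "\n".join(lines) + "\n"
```

Calling stream_to_dix() on the contents of the analysed corpus and writing
the result to a file should give something that lt-expand can expand, though
I have not checked whether the tagger training accepts such a flat dictionary.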
Thanks,
Gyorgy
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff