Re: [Apertium-stuff] apertium tagger usage

Orosz György Mon, 02 Apr 2012 04:35:20 -0700

Hi,


>> And FILES are:
>>  DIC:         full expanded dictionary file
>>  CRP:         training text corpus file
>>  TSX:         tagger specification file, in XML format
>>  TAGGER_DATA: tagger data file, built in the training and used while
>>               tagging
>>  HTAG:        hand-tagged text corpus
>>  UNTAG:       untagged text corpus, morphological analysis of HTAG
>>               corpus to use both jointly with -s option
>>
>>
>> For Hungarian, "DIC" is not going to be possible as it relies on
>> dictionary expansion,[1] the rest is possible (you just need to convert
>> the resources you already have).
>>
>> Felipe: What is the dictionary expansion file used for when training the
>> tagger, and could it be approximated in some way?
>>
>> Fran
>>
>> 1. Well, you could just analyse the corpus with your morphological
>> analyser, and then convert the set of analyses from the corpus to an
>> Apertium .dix file, then expand it. This would be useless for most
>> purposes but would allow you to train the tagger.
>>
We are one step closer now, but I am wondering whether there is an easy way
to create a .dix file from the Apertium stream format (or an easy way to use
an analysed text file for the tagger, instead of enumerating lemmata and
paradigms).
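
To make the question concrete, here is a rough sketch of the conversion
described in footnote 1: read analyses in stream format
(^surface/lemma<tag>...$) and write one flat .dix entry per distinct
(surface form, lemma, tags) triple, with no paradigms. The function names
parse_stream and stream_to_dix are made up for illustration. The sketch
assumes the input contains no backslash-escaped characters, and it skips
unknown words (*) and joined or multiword analyses (+ or #). The result
should be something that lt-expand can expand, but I have not verified
that the tagger training accepts it.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# One lexical unit: ^surface/analysis1/analysis2$ (no escape handling here).
LU_RE = re.compile(r"\^([^/^$]+)/([^$]*)\$")
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_stream(text):
    """Yield (surface, lemma, tags) for each simple analysis in the stream."""
    for match in LU_RE.finditer(text):
        surface = match.group(1)
        for analysis in match.group(2).split("/"):
            if not analysis or analysis.startswith("*"):
                continue  # unknown word, nothing to put in the dictionary
            if "+" in analysis or "#" in analysis:
                continue  # joined/multiword analyses are out of scope here
            lemma = analysis.split("<", 1)[0]
            tags = tuple(TAG_RE.findall(analysis))
            yield surface, lemma, tags


def dix_text(s):
    """Escape XML specials and write spaces as <b/>, as .dix files do."""
    return escape(s).replace(" ", "<b/>")


def stream_to_dix(text):
    """Return a flat .dix document covering the analyses found in text."""
    entries = sorted(set(parse_stream(text)))
    symbols = sorted({t for _, _, tags in entries for t in tags})
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             "<dictionary>", "  <alphabet/>", "  <sdefs>"]
    lines += [f"    <sdef n={quoteattr(t)}/>" for t in symbols]
    lines += ["  </sdefs>", '  <section id="main" type="standard">']
    for surface, lemma, tags in entries:
        syms = "".join(f"<s n={quoteattr(t)}/>" for t in tags)
        lines.append(f"    <e><p><l>{dix_text(surface)}</l>"
                     f"<r>{dix_text(lemma)}{syms}</r></p></e>")
    lines += ["  </section>", "</dictionary>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "^houses/house<n><pl>/house<vblex><pri><p3><sg>$ ^xyz/*xyz$"
    print(stream_to_dix(sample), end="")
```

Since every entry is a full form, the resulting dictionary would only cover
what occurs in the analysed corpus, which matches the "useless for most
purposes" caveat above.
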
Thanks,
Gyorgy


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
