Hello,

I'm working with the monolingual transfer rule learning code and have a few
questions:

1. I see some language pairs used to have a multi mode (such as in this old
version of eng-cat
<https://github.com/apertium/apertium-eng-cat/blob/6a2f3e7082a9d44a478f76f6c60a526e02512e93/modes.xml#L74>).
They also used to have "poly" dictionaries (such as this one
<https://github.com/apertium/apertium-eng-cat/blob/6a2f3e7082a9d44a478f76f6c60a526e02512e93/apertium-en-ca.en-ca-poly.dix>).
These files seem necessary for the latest monolingual rule learning script
I've found
<http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_monolingual_corpora>.
Why do language pairs no longer have a multi mode or poly dictionaries?

2. Is there a script that can generate a poly dictionary from a bilingual
dictionary?

3. The third step in the monolingual rule learning script I linked above
says this should be run:

    cat europarl.en-es.es.tagged | ~/source/apertium-lex-tools/multitrans \
        ~/source/apertium-en-es/en-es.autobil -m -f -t -n > \
        europarl.en-es.es.multi-trimmed

I was trying to do this step with the apertium-en-pt language pair using
10% of the English-Portuguese Europarl corpus. I stopped the program
because the output file was getting really big (dozens of gigabytes). Is
this expected behavior from ./multitrans with the -m option? If so, how are
the English-Spanish Europarl examples run?
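
In case it helps to narrow this down, one way to check whether the growth
comes from the -m output itself would be to measure the expansion on a
small sample before running the full corpus. The following is only a
minimal sketch: estimate_expansion is a hypothetical helper of my own (it
is not part of apertium-lex-tools), and it assumes the tagged corpus has
one sentence per line.

```shell
#!/bin/sh
# Hypothetical helper, not part of apertium-lex-tools.
# Usage: estimate_expansion FILE N COMMAND [ARGS...]
# Pipes the first N lines of FILE through COMMAND and prints
# "<input bytes> <output bytes> <output/input ratio>".
estimate_expansion() {
    input=$1
    n=$2
    shift 2
    # Size of the sample that is fed to the command.
    in_bytes=$(head -n "$n" "$input" | wc -c | tr -d ' ')
    # Size of what the command writes for that same sample.
    out_bytes=$(head -n "$n" "$input" | "$@" | wc -c | tr -d ' ')
    awk -v i="$in_bytes" -v o="$out_bytes" 'BEGIN {
        if (i > 0) printf "%d %d %.2f\n", i, o, o / i
        else print "0 0 n/a"
    }'
}
```

With the command from the wiki page, it could be called like this (the
sample size of 1000 lines is arbitrary):

    estimate_expansion europarl.en-es.es.tagged 1000 \
        ~/source/apertium-lex-tools/multitrans \
        ~/source/apertium-en-es/en-es.autobil -m -f -t -n

Multiplying the printed ratio by the size of the full tagged file would
give a rough estimate of the final output size, assuming the sample is
representative.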

Thank you,
Danielle
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
