I'd rather have a final word from Sergio or other main Apertium
developers, but I think that what Fran suggests does not hurt the
current workflow of Apertium and may be useful in cases like the one
he's trying to work on.
Mikel
On 08/01/2011 03:56 PM, Francis Tyers wrote:
On Mon, 01 Aug 2011 at 11:35 +0000, Francis Tyers wrote:
Hello everyone
At the moment, apertium-pretransfer accepts the output of the tagger
(without surface forms) and splits MLUs (joined with '+') into two.
As I'm working with tagger output that includes surface forms, it would
be useful for pretransfer to have a mode that does this and also strips
the surface form.
So instead of:
in: ^per<pr>+el<det><def><m><pl>$
out: ^per<pr>$ ^el<det><def><m><pl>$
it would be
in: ^pels/per<pr>+el<det><def><m><pl>$
out: ^per<pr>$ ^el<det><def><m><pl>$
I suggest calling the option -n (the same letter as the cg-proc option
with the same function, --no-word-forms).
Any objections?
Here is the patch. I've also taken the liberty of adding '~' as a
compound word boundary, something that Unhammer and I have been
thinking of doing for a while. The '~' symbol has not yet been used
anywhere in analysis (only in generation).
It now works the same as '+', except that no space is output between
the parts. Here are some examples:
$ echo '^de<pr>+el<det><def><m><sg>$' | apertium-pretransfer
^de<pr>$ ^el<det><def><m><sg>$
$ echo '^del/de<pr>+el<det><def><m><sg>$' | apertium-pretransfer -n
^de<pr>$ ^el<det><def><m><sg>$
$ echo '^arbeidsmiljø<n><nt><sg><ind>~lov<n><m><sg><def>$' |
apertium-pretransfer
^arbeidsmiljø<n><nt><sg><ind>$^lov<n><m><sg><def>$
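To make the intended behaviour concrete, here is a minimal sketch in
Python. It is only an illustration, not the patch itself: the function
name pretransfer_sketch and the strip_surface parameter (standing in
for -n) are made up for this example, and it assumes a single lexical
unit with no multiword queue and no escaped characters.

```python
def pretransfer_sketch(unit: str, strip_surface: bool = False) -> str:
    """Split one lexical unit such as '^de<pr>+el<det>$' at '+' and '~'.

    Hypothetical helper for illustration; it does not handle queues,
    escaped characters, or text between lexical units.
    """
    body = unit[1:-1]  # drop the leading '^' and the trailing '$'

    if strip_surface:
        # With -n, everything up to the first '/' is the surface form.
        _, sep, rest = body.partition("/")
        if sep:
            body = rest

    out = []
    start = 0
    for i, ch in enumerate(body):
        if ch in "+~":
            out.append("^" + body[start:i] + "$")
            # '+' separates the parts with a space, '~' with nothing.
            out.append(" " if ch == "+" else "")
            start = i + 1
    out.append("^" + body[start:] + "$")
    return "".join(out)


if __name__ == "__main__":
    print(pretransfer_sketch("^de<pr>+el<det><def><m><sg>$"))
    print(pretransfer_sketch("^del/de<pr>+el<det><def><m><sg>$",
                             strip_surface=True))
    print(pretransfer_sketch(
        "^arbeidsmiljø<n><nt><sg><ind>~lov<n><m><sg><def>$"))
```

Run as a script, this should print the same three output lines as the
apertium-pretransfer examples above.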
Note, there is an outstanding "bug"(?) with pretransfer where the
multiword queue gets appended to the first part of joined analyses, not
the second:
$ echo '^arbeidsmiljø<n><nt><sg><ind>+lov<n><m><sg><def># plan$' |
apertium-pretransfer
^arbeidsmiljø# plan<n><nt><sg><ind>$ ^lov<n><m><sg><def>$
If there is a joined analysis with a multiword queue, should the queue
go on the first or the last part of the join?
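The two choices can be compared with a small Python sketch. Again this
is an illustration under assumptions, not the real code: the name
split_with_queue and the attach_to parameter are hypothetical, only
'+' joins are handled, and the queue is assumed to be whatever follows
the last '>' when it starts with '#'.

```python
def split_with_queue(unit: str, attach_to: str = "first") -> str:
    """Split a '+'-joined unit and place the queue on one of the parts.

    attach_to is "first" (what pretransfer does today, per the example
    above) or "last" (the alternative being asked about).
    """
    body = unit[1:-1]  # drop the leading '^' and the trailing '$'

    # Assumption: the queue is the text after the last tag, if it
    # begins with '#'.
    queue = ""
    last_tag_end = body.rfind(">")
    if last_tag_end != -1 and body[last_tag_end + 1:].startswith("#"):
        queue = body[last_tag_end + 1:]
        body = body[:last_tag_end + 1]

    parts = body.split("+")
    target = 0 if attach_to == "first" else len(parts) - 1
    if queue:
        # Insert the queue between the lemma and its first tag.
        lemma, lt, tags = parts[target].partition("<")
        parts[target] = lemma + queue + lt + tags

    return " ".join("^" + p + "$" for p in parts)


if __name__ == "__main__":
    unit = "^arbeidsmiljø<n><nt><sg><ind>+lov<n><m><sg><def># plan$"
    print(split_with_queue(unit, attach_to="first"))
    print(split_with_queue(unit, attach_to="last"))
```

With attach_to="first" the sketch gives the output shown above; with
attach_to="last" the queue ends up on lov instead of arbeidsmiljø.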
Fran
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326