I'd rather have a final word from Sergio or other main Apertium developers, but I think that what Fran suggests does not hurt the current workflow of Apertium and may be useful in cases like the one he's trying to work on.

Mikel

On 08/01/2011 03:56 PM, Francis Tyers wrote:
On Mon, 01 Aug 2011 at 11:35 +0000, Francis Tyers wrote:
Hello everyone

At the moment, apertium-pretransfer accepts the output of the tagger
(without surface forms) and splits MLUs (joined with '+') into two.

As I'm working with output from the tagger with surface forms, it would
be useful to have a mode in pretransfer that does the same split but
also strips the surface form.

So instead of:

  in: ^per<pr>+el<det><def><m><pl>$
  out: ^per<pr>$ ^el<det><def><m><pl>$

it would be

  in: ^pels/per<pr>+el<det><def><m><pl>$
  out: ^per<pr>$ ^el<det><def><m><pl>$

I suggest calling the option -n, the same name as the cg-proc option
with the same function (--no-word-forms).
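
To make the intended behaviour concrete, here is a minimal sketch in
Python. It is not the patch itself and not how apertium-pretransfer is
implemented; the function name is hypothetical, and it assumes a single
lexical unit with one analysis and no escaped characters.

```python
def pretransfer_no_word_forms(unit: str) -> str:
    """Hypothetical model of the proposed -n mode for one lexical unit."""
    # Strip the enclosing '^' and '$' delimiters.
    body = unit[1:-1]
    # Drop the surface form: everything up to and including the first '/'.
    # Units without a surface form are left unchanged by this step.
    if "/" in body:
        body = body.split("/", 1)[1]
    # Split the joined analysis on '+' and wrap each part again,
    # separating the parts with a single space.
    return " ".join(f"^{part}$" for part in body.split("+"))


print(pretransfer_no_word_forms("^pels/per<pr>+el<det><def><m><pl>$"))
```

Under those assumptions the call above reproduces the "out" line of the
example.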

Any objections ?
Here is the patch. I've also taken the liberty of adding '~' as a
compound word boundary, something that Unhammer and I have been
thinking of doing for a while. The '~' symbol has not yet been used
anywhere in analysis (only in generation).

It now works the same as '+', except that no space is output between
the parts. Here are some examples:

$ echo '^de<pr>+el<det><def><m><sg>$' | apertium-pretransfer
^de<pr>$ ^el<det><def><m><sg>$

$ echo '^del/de<pr>+el<det><def><m><sg>$' | apertium-pretransfer -n
^de<pr>$ ^el<det><def><m><sg>$

$ echo '^arbeidsmiljø<n><nt><sg><ind>~lov<n><m><sg><def>$' |
apertium-pretransfer
^arbeidsmiljø<n><nt><sg><ind>$^lov<n><m><sg><def>$
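
The difference between the two separators can be sketched in Python as
follows. This is only an illustration under simplifying assumptions
(one lexical unit, no surface form, no escaped '+' or '~'); the function
name is hypothetical and the code is not taken from the patch.

```python
import re


def split_joined(unit: str) -> str:
    """Hypothetical model: split one lexical unit on '+' and '~'."""
    # Strip the enclosing '^' and '$' delimiters.
    body = unit[1:-1]
    # A capturing group makes re.split keep the separators, so the
    # result alternates: part, separator, part, separator, part, ...
    pieces = re.split(r"([+~])", body)
    out = f"^{pieces[0]}$"
    for sep, part in zip(pieces[1::2], pieces[2::2]):
        # '+' puts a space between the units, '~' puts nothing.
        out += (" " if sep == "+" else "") + f"^{part}$"
    return out


print(split_joined("^de<pr>+el<det><def><m><sg>$"))
print(split_joined("^arbeidsmiljø<n><nt><sg><ind>~lov<n><m><sg><def>$"))
```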

Note, there is an outstanding "bug"(?) with pretransfer where the
multiword queue gets appended to the first part of joined analyses, not
the second:

$ echo '^arbeidsmiljø<n><nt><sg><ind>+lov<n><m><sg><def># plan$' |
apertium-pretransfer
^arbeidsmiljø# plan<n><nt><sg><ind>$ ^lov<n><m><sg><def>$

If there is a joined analysis with multiword queue, should it go on the
first or last part of the join ?
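
To compare the two options side by side, here is a small Python sketch.
It is a hypothetical model, not the pretransfer code: the function and
its queue_on parameter are made up for illustration, and it assumes one
lexical unit without a surface form in which '#' only introduces the
queue.

```python
def split_with_queue(unit: str, queue_on: str = "first") -> str:
    """Hypothetical model: split on '+' and attach the multiword queue."""
    # Strip the enclosing '^' and '$' delimiters.
    body = unit[1:-1]
    # Separate the queue: everything from the first '#' onwards.
    body, hash_sign, rest = body.partition("#")
    queue = hash_sign + rest
    parts = body.split("+")
    # Choose which part of the join receives the queue.
    target = 0 if queue_on == "first" else len(parts) - 1
    # Insert the queue between the lemma and its first tag.
    lemma, bracket, tags = parts[target].partition("<")
    parts[target] = lemma + queue + bracket + tags
    return " ".join(f"^{part}$" for part in parts)


unit = "^arbeidsmiljø<n><nt><sg><ind>+lov<n><m><sg><def># plan$"
print(split_with_queue(unit, "first"))  # the behaviour shown above
print(split_with_queue(unit, "last"))   # the alternative being asked about
```

With queue_on="first" the sketch gives the same output as the example
above; with queue_on="last" the queue ends up on 'lov' instead.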

Fran



_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
