On Mon, 05 May 2014 at 13:59 +0200, David Chan wrote:
> Hello,
> 
> 
> I'd like to get Apertium to translate some text, then output info
> about the way in which the words were reordered. For example, in Moses
> you can get output like this:
> 
> 
>   $ echo 'ein haus ist das' | moses -t
>   this |3-3| is |2-2| a |0-0| house |1-1|
> 
> 
> This tells you, for instance, that the target word 'this' came from
> the word 'das' in the source text.
> 
> 
> Can I get anything similar from Apertium? (By writing some code if
> necessary -- in which case, how much work would it be?)

Because of the way that Apertium works, this is non-trivial. It should,
however, be possible to output. You could either do it by editing the
rules and outputting in a superblank, or by editing the transfer code
directly.

Part of the problem is that for pairs with more than one level of
chunking it may be chunks that are moved around rather than words, so
the order after the first transfer level may not be the final order.

If you choose the superblank option, then you will find that in some
pairs the blanks are reordered and in others they are not.

You can try a simple experiment like this:

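$ cat /tmp/nums.py
import sys

# Prefix each whitespace-separated token with a running index in
# square brackets, e.g. "[0]El [1]gato".
c = 0
for line in sys.stdin:
    # split() with no argument also drops the trailing newline,
    # so it does not end up glued to the last token.
    for token in line.split():
        sys.stdout.write('[' + str(c) + ']' + token + ' ')
        c += 1
    sys.stdout.write('\n')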


$ echo "El gato negro caminaba en la playa." | python3 /tmp/nums.py 
[0]El [1]gato [2]negro [3]caminaba [4]en [5]la [6]playa.

Then run it through Apertium: 

$ echo "El gato negro caminaba en la playa." | python3 /tmp/nums.py  |
apertium -f none -d ~/source/apertium/trunk/apertium-en-es/ es-en
[0]The[1] black[2] cat [3]walked[4] in[5] the[6] beach.

$ echo "Neskak katua ikusten ari da." | python3 /tmp/nums.py   |
apertium -f none eu-es
[0]La chica [1]el gato [2]viendo [3]estar[4] es.

(note here 'Neskak' translates as 'La chica' and 'katua' as 'el gato')
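
If you want to read that kind of output programmatically, something
like the following rough sketch might do. It splits the translated text
at the [N] markers and reports the text that follows each one. The name
parse_markers is made up for illustration, and, as the first example
above shows, a marker's position does not necessarily follow the word
it was originally attached to.

```python
import re

def parse_markers(text):
    """Return (index, text following the marker) pairs for each [N] marker.

    parse_markers is a hypothetical helper, not part of Apertium.
    """
    # re.split with a capture group keeps the captured indices, giving
    # ['', '0', 'The', '1', ' black', ...]: odd slots are indices,
    # even slots are the text between markers.
    parts = re.split(r'\[(\d+)\]', text)
    pairs = []
    for i in range(1, len(parts), 2):
        pairs.append((int(parts[i]), parts[i + 1].strip()))
    return pairs

if __name__ == '__main__':
    out = '[0]La chica [1]el gato [2]viendo [3]estar[4] es.'
    for index, segment in parse_markers(out):
        print(index, segment)
```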

But if you go through the transfer rules, and replace the <b pos="N"/>
in the rules with the in-sequence <b pos="N"/> then it should have the
desired effect.

Anyway, it is fraught with difficulties and you may have strange
results. It might be better to just run an alignment algorithm post-hoc
to find out what should align with what. 
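
To give an idea of what a post-hoc alignment could look like, here is a
toy sketch. It assumes a small hand-written lexicon (entirely made up
for this example) and greedily links each target word to the first
unused source word that the lexicon allows. The names align and
moses_style are hypothetical, and a real aligner would be statistical
rather than dictionary-based; this only illustrates the shape of the
result, printed in the Moses-like format from the question.

```python
def align(source, target, lexicon):
    """Greedily link each target token to an unused source token.

    lexicon maps a lowercased source word to a set of possible target
    words. Returns one source index (or None) per target token.
    """
    used = set()
    links = []
    for t_word in target:
        match = None
        for s_idx, s_word in enumerate(source):
            candidates = lexicon.get(s_word.lower(), set())
            if s_idx not in used and t_word.lower() in candidates:
                match = s_idx
                used.add(s_idx)
                break
        links.append(match)
    return links

def moses_style(target, links):
    """Render 'word |s-s|' pairs; unaligned words get no marker."""
    out = []
    for t_word, s_idx in zip(target, links):
        out.append(t_word if s_idx is None else f'{t_word} |{s_idx}-{s_idx}|')
    return ' '.join(out)

if __name__ == '__main__':
    # Hypothetical toy lexicon, written by hand for this one sentence.
    lexicon = {
        'el': {'the'}, 'la': {'the'}, 'gato': {'cat'}, 'negro': {'black'},
        'caminaba': {'walked'}, 'en': {'in'}, 'playa': {'beach'},
    }
    source = 'el gato negro caminaba en la playa'.split()
    target = 'the black cat walked in the beach'.split()
    print(moses_style(target, align(source, target, lexicon)))
```

The greedy choice is the simplest thing that works for this sentence;
with repeated or ambiguous words it can easily pick the wrong link.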

I think this is in fact what Pankaj (in CC) is doing. Pankaj, can you
tell us more?

Also, David, could you tell us more about what you'd like to use this
for, e.g. the final application?

F.

