Hi,

The best approach seems to be to record the location of the
XML markup, strip it out before translation, and
re-insert it into the output. The exact position
of the markup in the output can be determined from the phrase
and word alignment of the translation.
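
A minimal sketch of that strip-and-re-insert idea, in Python. It assumes
simple, non-nested tags without attributes and a word alignment given as
(source index, target index) pairs. The function names and the alignment
format are hypothetical, chosen for illustration, and are not part of Moses.

```python
import re

# Assumption: tags look like <name>...</name>, no attributes, no nesting.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>")

def strip_markup(sentence):
    """Return (plain_tokens, spans); spans are (tag, start, end) over
    source token indices, with end exclusive."""
    tokens, spans, pos = [], [], 0
    for m in TAG_RE.finditer(sentence):
        tokens.extend(sentence[pos:m.start()].split())
        inner = m.group(2).split()
        spans.append((m.group(1), len(tokens), len(tokens) + len(inner)))
        tokens.extend(inner)
        pos = m.end()
    tokens.extend(sentence[pos:].split())
    return tokens, spans

def reinsert_markup(target_tokens, spans, alignment):
    """Wrap the target tokens aligned to each tagged source span.
    alignment is a set of (source_index, target_index) pairs."""
    opens, closes = {}, {}
    for tag, start, end in spans:
        tgt = sorted(t for s, t in alignment if start <= s < end)
        if not tgt:
            continue  # no aligned target word: the markup is dropped
        # If the aligned words are not contiguous, the tag covers
        # everything from the first to the last aligned target word.
        opens.setdefault(tgt[0], []).append(tag)
        closes.setdefault(tgt[-1], []).append(tag)
    out = []
    for i, tok in enumerate(target_tokens):
        prefix = "".join(f"<{t}>" for t in opens.get(i, []))
        suffix = "".join(f"</{t}>" for t in reversed(closes.get(i, [])))
        out.append(prefix + tok + suffix)
    return " ".join(out)

src_tokens, src_spans = strip_markup("he paid <num>19</num> dollars")
# Hypothetical translation and alignment, made up for the example.
tgt_tokens = ["19", "Dollar", "zahlte", "er"]
align = {(2, 0), (3, 1), (1, 2), (0, 3)}
print(reinsert_markup(tgt_tokens, src_spans, align))
```

The decoder and the language model only ever see the plain tokens here, so
the markup cannot influence or be reordered by the translation itself.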

You could also just leave the tags in, but since
"<num>19</num>" is treated as a single token, you may
want to insert spaces around the tags. Even then, the tags may get reshuffled
by arbitrary preferences of the language model.
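
To illustrate the tokenization point: with whitespace-delimited tokens,
"<num>19</num>" is one token. The sketch below shows that, and one possible
way to split the tags off as separate tokens. The helper name is
hypothetical, and whether separating the tags is the right choice for a
given setup is an assumption, not a recommendation.

```python
import re

def split_tags(text):
    """Put spaces around simple XML tags, then split on whitespace."""
    return re.sub(r"\s*(</?\w+>)\s*", r" \1 ", text).split()

line = "he paid <num>19</num> dollars"

# Plain whitespace splitting keeps the tagged number as one token.
print(line.split())

# With spaces inserted, the tags and the number are separate tokens.
print(split_tags(line))
```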

-phi

On Sat, Dec 17, 2011 at 2:38 AM, somayeh bakhshaei
<[email protected]> wrote:
> Hello,
>
> We intend to add XML tags to our corpus but we are not sure how the Moses
> decoder and SRILM uses these tags in training and decoding phase.
>
> For example if we tag 19 in main corpus like this:
> 19  ---> <num>19</num>
>
> How does LM must be made on this tagged corpus using SRILM?
> Does SRILM consider whether <num>  or <num>19</num> as a token?
>
> Also in decoding phase:
> How does moses pass the tagged tokens to the LM?
> For example if test is tagged like this:
> <num>19</num>
> Does it pass just <num> or whole of it as <num>19</num>
>
>
> ---------------------
> Best Regards,
> S.Bakhshaei
>
> After All you will come ....
> And will spread light on the dark desolate world!
> O' Kind Father! We will be waiting for your affectionate hands ...
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
