You shouldn't keep them: the & and ; would be tokenized as separate
tokens and pollute your sentences.

There are tools to convert them (at least a Perl module, I think);
search for "HTML decoding". They are called HTML entities, not tags.
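
As an illustration of the decoding step, here is a minimal sketch in
Python rather than Perl. It assumes the corpus is plain text with one
sentence per line; the function name decode_entities is made up for
this example, and html.unescape comes from the Python 3 standard
library.

```python
import html


def decode_entities(line: str) -> str:
    """Replace HTML entities such as &apos; or &amp; with the characters they stand for."""
    return html.unescape(line)


sample = "if momma ain &apos;t happy , ain &apos;t nobody happy ."
print(decode_entities(sample))
# prints: if momma ain 't happy , ain 't nobody happy .
```

You would run something like this over every line of the corpus before
tokenizing, presumably on both the source and the target side so the
two stay consistent.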


On Wed, Jul 24, 2013 at 2:16 PM, Cyrine NASRI <[email protected]> wrote:
> Hello,
>
> I use a training corpus to build my translation system.
>
> But I found in this corpus some HTML tags, for instance:
>
> "and i &apos;m going to start with this one : if momma ain &apos;t happy ,
> ain &apos;t nobody happy ."
>
> Should I eliminate these, or keep them?
>
> Thank you in advance for your replies
>
> Best
> --
> Cyrine NASRI
> Ph.D. Student in Computer Science
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
