To use makemteval.py, we need to remove the bytes 0D (carriage return) and
E2 80 A8 (the UTF-8 encoding of U+2028, LINE SEPARATOR) from the txt files.
Python's splitlines() treats them as additional line breaks.
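
A minimal sketch of that cleanup, assuming Python 3 and UTF-8 input. The
helper names clean_text and split_segments are made up for illustration and
are not part of makemteval.py. It removes the two characters and then splits
on "\n" only, so the segment count matches the line count of the txt file.

```python
import re

# Characters to drop before splitting: carriage return (byte 0D) and
# U+2028 LINE SEPARATOR (bytes E2 80 A8 in UTF-8).
# Assumption: only "\n" is meant to separate segments in the txt files.
_EXTRA_BREAKS = re.compile("[\r\u2028]")


def clean_text(text):
    """Return text with carriage returns and U+2028 removed."""
    return _EXTRA_BREAKS.sub("", text)


def split_segments(text):
    """Split cleaned text on "\n" only, ignoring one trailing newline."""
    text = clean_text(text)
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n") if text else []


sample = "first\r\nsecond\u2028half\nthird\n"
segments = split_segments(sample)
```

Note that removing U+2028 joins the text on both sides without a space. If
a space is wanted there, replace it with " " instead of the empty string.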

On 12/09/2015 22:07, Vincent Nguyen wrote:
> Hi,
>
> What script do you guys use to generate sgm sets from txt files?
>
> I have tried makemteval.py in contrib
> but there are a few issues.
>
> I think these lines:
> lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
>          for l in filein.read().splitlines()]
> filein.close()
> lines = [l.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('\'','&apos;').replace('\"','&quot;')
>          for l in lines]
>
> are not 100% bulletproof.
>
> In the output I still get &apos; and such.
> It does not handle &nbsp;.
> I think it does not handle the \r\n sequence, since the output has more
> lines than the txt file.
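
On the entity points: as far as I can tell, the second comprehension in the
quoted code re-escapes ' as &apos; on purpose, so seeing &apos; in the output
is what that code produces. For &nbsp;, here is a minimal sketch assuming
Python 3 and the standard library only. The name normalize_entities is
hypothetical and is not part of makemteval.py, and turning non-breaking
spaces into plain spaces is my assumption about what is wanted.

```python
import html
from xml.sax.saxutils import escape


def normalize_entities(line):
    """Unescape all HTML entities once, then re-escape for XML/SGM output."""
    # html.unescape resolves named references such as &nbsp; and &apos;
    raw = html.unescape(line)
    # Assumption: a non-breaking space should become an ordinary space
    raw = raw.replace("\u00a0", " ")
    # escape() covers &, < and >; the extra mapping adds both quote characters
    return escape(raw, {"'": "&apos;", '"': "&quot;"})


example = normalize_entities("a&nbsp;b &amp; 'c' <d>")
```

This unescapes only once, so input that was escaped twice (for example
&amp;apos;) will still come out escaped twice.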
>
> Maybe there is another script.
>
> thanks.
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support