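To use makemteval.py, we need to remove the bytes 0D (carriage return) and
E2 80 A8 (U+2028 LINE SEPARATOR, encoded as UTF-8) from the txt files.
Python treats both as additional line breaks when the script splits the
input into lines.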
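
For what it's worth, here is a minimal sketch of that cleanup step in
Python 3, assuming the txt files are UTF-8. The names
strip_extra_line_breaks and clean_file are made up for illustration and
are not part of makemteval.py:

```python
import io


def strip_extra_line_breaks(text):
    # Drop carriage returns (byte 0D) and U+2028 LINE SEPARATOR
    # (bytes E2 80 A8 in UTF-8) so that only "\n" ends a line.
    return text.replace("\r", "").replace("\u2028", "")


def clean_file(src_path, dst_path):
    # Hypothetical helper: newline="" turns off newline translation,
    # so "\r" reaches strip_extra_line_breaks() instead of being
    # silently converted while reading.
    with io.open(src_path, encoding="utf-8", newline="") as fin:
        text = fin.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as fout:
        fout.write(strip_extra_line_breaks(text))
```

Note that this drops the characters outright, so a U+2028 sitting between
two words joins them; replacing it with a space instead may be preferable
for some data.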
On 12/09/2015 22:07, Vincent Nguyen wrote:
> Hi,
>
> What script do you guys use to generate sgm sets from txt files?
>
> I have tried makemteval.py in contrib,
> but there are a few issues.
>
> I think these lines:
> lines =
> [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
> for l in filein.read().splitlines()]
> filein.close()
> lines =
> [l.replace('&amp;','&').replace('&lt;','<').replace('&gt;','>').replace('\'','&apos;').replace('\"','&quot;')
> for l in lines]
>
> are not 100% bulletproof.
>
> In the output I still get &apos; and such.
> it does not handle the
> I think it does not handle the \r\n sequence either, since the output
> has more lines than the txt file.
>
> Maybe there is another script.
>
> thanks.
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support