Hi,

What script do you use to generate sgm sets from a txt file?

I have tried makemteval.py in contrib,
but there are a few issues.

I think these lines:
lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
         for l in filein.read().splitlines()]
filein.close()
lines = [l.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;').replace('\'','&apos;').replace('\"','&quot;')
         for l in lines]

are not 100% bulletproof.

In the output I still get &apos; and similar entities.
It does not handle &nbsp;.
It does not seem to handle the \r\n sequence, since the output has more
lines than the txt file.
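
To make clear what behavior I am after, here is a minimal sketch, not a patch to makemteval.py. It assumes Python 3.4+ (for html.unescape) and that turning &nbsp; into a plain space is acceptable; the function name normalize_lines is only illustrative. It decodes all entities first, then re-escapes only &, < and >, so apostrophes and quotes stay literal in the output.

```python
import html
from xml.sax.saxutils import escape


def normalize_lines(text):
    """Hypothetical helper (not part of makemteval.py): split text into
    lines and XML-escape each line exactly once."""
    # Treat \r\n and a bare \r as one newline so line counts match the input.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # Drop a single trailing newline so no empty last line is produced.
    if text.endswith('\n'):
        text = text[:-1]
    result = []
    for line in text.split('\n'):
        # Decode every HTML entity (&apos;, &quot;, &nbsp;, &amp;, ...).
        line = html.unescape(line)
        # &nbsp; decodes to U+00A0; mapping it to a space is an assumption.
        line = line.replace('\u00a0', ' ')
        # Re-escape only &, < and >, which is what escape() does by default.
        result.append(escape(line))
    return result
```

Splitting on '\n' only, after normalizing the line endings, avoids splitlines() breaking on other separator characters, which might be one reason for the extra output lines.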

Or maybe there is another script that handles these cases?

Thanks.


