Sorry for the slightly off-topic message, but at least it's about MT:

We're using the UN Chinese-English Parallel Text collection  
(LDC2004E12) for some of our work.  It has lots of odd sequences of  
the form:

   \x{a37e}

I presume these are hex codes indicating escaped characters or  
something, but I'm not sure what.  Has anyone done anything with  
these, other than ignore or delete them?
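
For anyone who wants to inventory these sequences before deciding what to do with them, here is a minimal Python sketch. It assumes every sequence is a literal backslash, an "x", and hex digits in braces, as in the example above. The function names are made up for illustration, and the code makes no claim about what the hex values actually encode.

```python
import re
from collections import Counter

# Assumption: every escape looks like \x{...} with one or more hex digits.
ESCAPE_RE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def count_escapes(lines):
    """Tally each distinct hex value found in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for match in ESCAPE_RE.finditer(line):
            # Normalize case so a37e and A37E are counted together.
            counts[match.group(1).lower()] += 1
    return counts

def strip_escapes(line):
    """Return the line with every \\x{...} sequence deleted."""
    return ESCAPE_RE.sub("", line)

# Hypothetical sample lines, not taken from the corpus.
sample = ["foo \\x{a37e} bar", "\\x{A37E}\\x{a1a1} baz"]
print(count_escapes(sample))
print(strip_escapes(sample[0]))
```

Counting first shows how many distinct values occur and how often, which may help in deciding whether deleting them is safe.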

Thanks.

- John Burger
   MITRE
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
