Sorry for the slightly off-topic message, but at least it's about MT:
We're using the UN Chinese-English Parallel Text collection
(LDC2004E12) for some of our work. It has lots of odd sequences of
the form:
\x{a37e}
I presume these are hex codes for escaped characters, but I'm not
sure which encoding or character set they refer to. Has anyone done
anything with these, other than ignore or delete them?
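
In case it helps anyone looking at the same data, here is a minimal Python sketch of the two fallbacks mentioned above (tally the sequences to see what is there, or delete them). It assumes only the literal form shown, a backslash, "x", and hex digits in braces. The names ESCAPE_RE, count_escapes and strip_escapes are made up for illustration, and it does not try to decode the values, since I don't know what they encode.

```python
import re
from collections import Counter

# Matches the literal escape form seen in the corpus, e.g. \x{a37e}.
# Assumption: the braces always contain one or more hex digits.
ESCAPE_RE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def count_escapes(lines):
    # Tally each distinct hex value (lowercased) across all lines.
    counts = Counter()
    for line in lines:
        for hex_digits in ESCAPE_RE.findall(line):
            counts[hex_digits.lower()] += 1
    return counts


def strip_escapes(line):
    # Delete every escape sequence, leaving the surrounding text untouched.
    return ESCAPE_RE.sub("", line)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the corpus.
    sample = [r"foo \x{a37e} bar", r"\x{A37E}\x{a1a1}"]
    print(count_escapes(sample).most_common())
    print([strip_escapes(s) for s in sample])
```

Looking at the most frequent values from the tally might at least hint at which character set they come from before deciding to drop them.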
Thanks.
- John Burger
MITRE
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support