Hi Huizhe,

I have a strange contribution to share.

Webrev: http://cr.openjdk.java.net/~martin/webrevs/openjdk9/XMLScanner-supplementary-characters/

But I don't have a bug or a test case that I can share, and I don't understand the code, except that it's a fix for supporting supplementary characters. Perhaps you can reverse-engineer the bug fix into a bug report? Also, maybe this is already fixed in "real" Xerces? I have no idea...

At Google we've been carrying this patch around for a while.

Martin