Hi,

Here is the webrev for the MS936.map change, which updates the mapping entries for 500+ EUDC code points within the range 0xA140-0xA7A0. I'm using CR#6183404.
http://cr.openjdk.java.net/~sherman/6183404/webrev

I re-generated the MS936.b2c and c2b mapping tables via MultiByteToWideChar and WideCharToMultiByte, as shown in ms936.c:

http://cr.openjdk.java.net/~sherman/6183404/ms936.c

I went through the diff between the newly generated b2c table and the existing MS936.map. The two tables appear to be identical except for the 500+ EUDC (PUA) code points within the range 0xA140-0xA7A0. You can check the "defined" and "undefined" MS936 code points at

http://msdn.microsoft.com/en-US/goglobal/cc305153

(click on A1 - A7).

The mapping from f...@jp.ibm.com (integrated into JDK 1.3 in 1999 via CR#4202893) fills all "user-defined"/undefined code points in this range (0xA140-0xA7A0) with code points from the Unicode PUA, U+E4C6 through U+E79F, one by one in code point order.

However, the newly generated mapping table from MultiByteToWideChar and WideCharToMultiByte suggests that the actual mapping first fills the large contiguous areas sequentially, starting from U+E4C6:

    0xA140-0xA1A0 -> U+E4C6 - U+E525
    0xA240-0xA2A0 -> U+E526 - U+E585
    0xA340-0xA3A0 -> U+E586 - U+E5E5
    ...
    0xA740-0xA7A0 -> U+E706 - U+E765

It then goes back and fills the "small" leftover areas/spots with PUA code points starting from U+E766. The first of these is

    0xA2AB -> U+E766
    ...
    0xA6FE -> U+E79F

This pattern can be easily observed at

http://cr.openjdk.java.net/~sherman/6183404/webrev/make/tools/CharsetMapping/MS936.map.sdiff.html

The new MS936.map is now identical to the mapping used by wctomb and mbtowc. The only exception is 0xFF <-> U+F8F5, which is excluded for now; personally I don't feel comfortable putting it in.
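For reviewers who want to sanity-check the first pass of the fill order described above, here is a minimal sketch in Python. It is not taken from the webrev or from ms936.c; the function names are made up for illustration. It assumes each lead byte 0xA1-0xA7 contributes the trail bytes 0x40-0xA0 with 0x7F skipped (the usual GBK trail-byte rule, which matches the 96 code points per row in the ranges quoted above). The leftover spots (0xA2AB ... 0xA6FE) are not modelled, since their full list is only in the sdiff.

```python
# Sketch only: reproduces the first-pass fill order for the seven large
# EUDC blocks. The function names are hypothetical, not from the webrev.
PUA_START = 0xE4C6  # first PUA code point used, per the ranges quoted above


def trail_bytes():
    """Trail bytes 0x40-0xA0, skipping 0x7F (assumed invalid as a trail byte)."""
    return [b for b in range(0x40, 0xA1) if b != 0x7F]


def first_pass_mapping():
    """Map the large blocks 0xA140-0xA1A0 ... 0xA740-0xA7A0 sequentially.

    Returns the byte-to-char mapping and the next unused PUA code point,
    which is where the second pass (the leftover spots) would start.
    """
    mapping = {}
    cp = PUA_START
    for lead in range(0xA1, 0xA8):
        for trail in trail_bytes():
            mapping[(lead << 8) | trail] = cp
            cp += 1
    return mapping, cp


if __name__ == "__main__":
    b2c, next_cp = first_pass_mapping()
    for mb in (0xA140, 0xA1A0, 0xA240, 0xA740, 0xA7A0):
        print("0x%04X -> U+%04X" % (mb, b2c[mb]))
    print("second pass would start at U+%04X" % next_cp)
```

The row boundaries this produces agree with the ranges listed above, and the first unused code point after the seven blocks is U+E766, which is where the leftover spots begin.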
#6183404 also complains that some 412 non-UDC characters are missing from Java's MS936; all of these characters are listed at

http://cr.openjdk.java.net/~sherman/6183404/CodePage936.pdf

A careful check suggests these are the result of incorrect use of WideCharToMultiByte when generating the mapping: the entries appear to be the "best fit" results that WideCharToMultiByte returns when the WC_NO_BEST_FIT_CHARS flag is not specified.

There might be a compatibility concern about changing these entries, but given that (1) they are EUDC/PUA characters/code points and (2) the change follows MS, and this is an MS charset, I don't think this should stop the update.

OK, this is all I've got. Please help review (Masoyoshi, Charles).

Thanks,
-Sherman