On Mon, Dec 15, 2008 at 12:19 AM, Caolán McNamara <[email protected]> wrote:
> On Sat, 2008-12-13 at 05:04 +0900, Jeongkyu Kim wrote:
>> However, there is one more homework left for me. When I opened the
>> exported file with OO.o Writer, Korean characters were broken. I am
>> playing around some functions in ww8par.cxx such as ReadPlainChars(),
>> GetCurrentCharSet(), and Custom8BitToUnicode(), but I have no luck
>> yet. Now, I need some hints on how to handle Korean characters
>> correctly in importing filter
>
> Hmm, well first make sure that the in SwWW8ImplReader::ReadPlainChars
> that eSrcCharSet is equal to RTL_TEXTENCODING_MS_949.
>
> Assuming that that is working correct then looking at the code it
> probably does not properly handle multi-byte encodings. So rather than
> sending each byte to be converted in
> for( nL2 = 0; nL2 < nLen; ++nL2, ++pWork )
> it would likely be better to collect the whole set of bytes, adjust
> Custom8BitToUnicode to take a sequence of bytes and send the whole lot
> to rtl_convertTextToUnicode so as to not break up multi-bytes sequences
> into broken single characters. If you have a simple same document which
> reproduces this on import then if you log an issue and put me as "cmc"
> on the cc I could take a look to see if that is the case.
>
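
A minimal sketch of the "collect the whole set of bytes" idea, assuming
the source encoding is MS 949 (CP949), where a byte in the range
0x81..0xFE starts a two-byte character. The names isCp949LeadByte and
completePrefixLength are hypothetical helpers for illustration only and
do not exist in ww8par.cxx. The sketch only shows how to find the
longest prefix of a buffer that does not end in the middle of a
two-byte character, so that the whole prefix could be handed to the
converter in one call. It does not validate trail bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in ww8par.cxx): true if b can start a
// two-byte CP949 character. Assumes the lead-byte range 0x81..0xFE.
inline bool isCp949LeadByte(std::uint8_t b)
{
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical helper: returns how many bytes at the start of
// [p, p + len) form complete characters. A lead byte at the very end
// of the buffer is left out, so the caller can keep it and prepend it
// to the next chunk instead of converting it on its own.
std::size_t completePrefixLength(const std::uint8_t* p, std::size_t len)
{
    assert(p != nullptr || len == 0);
    std::size_t i = 0;
    while (i < len)
    {
        if (isCp949LeadByte(p[i]))
        {
            if (i + 1 >= len)
                break;      // dangling lead byte: stop before it
            i += 2;         // lead byte plus its trail byte
        }
        else
        {
            ++i;            // single-byte character
        }
    }
    return i;
}
```

With such a length in hand, the bytes [p, p + n) could be passed to the
text converter as one sequence rather than one byte at a time.
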
Thanks for your concern. I filed an issue for the problem.

http://www.openoffice.org/issues/show_bug.cgi?id=97247
 - MS Word 95 import filter does not handle DBCS correctly

I will give an update through the issue if I have new findings.

Best regards,
Jeongkyu

-- 
Jeongkyu Kim
OpenOffice.org Korean community lead
Community website http://openoffice.or.kr
Personal blog http://openoffice.or.kr/gomme
