Re: [sw-discussion] What is the best way to prevent data loss by choosing filters which does not support unicode?

Jeongkyu Kim Sun, 14 Dec 2008 09:20:43 -0800

On Mon, Dec 15, 2008 at 12:19 AM, Caolán McNamara <[email protected]> wrote:
> On Sat, 2008-12-13 at 05:04 +0900, Jeongkyu Kim wrote:
>> However, there is one more task left for me. When I opened the
>> exported file with OO.o Writer, Korean characters were broken. I am
>> playing around with some functions in ww8par.cxx such as ReadPlainChars(),
>> GetCurrentCharSet(), and Custom8BitToUnicode(), but I have had no luck
>> yet. Now, I need some hints on how to handle Korean characters
>> correctly in the import filter.
>
> Hmm, well first make sure that in SwWW8ImplReader::ReadPlainChars
> eSrcCharSet is equal to RTL_TEXTENCODING_MS_949.
>
> Assuming that that is working correctly then looking at the code it
> probably does not properly handle multi-byte encodings. So rather than
> sending each byte to be converted in
> for( nL2 = 0; nL2 < nLen; ++nL2, ++pWork )
> it would likely be better to collect the whole set of bytes, adjust
> Custom8BitToUnicode to take a sequence of bytes and send the whole lot
> to rtl_convertTextToUnicode so as to not break up multi-byte sequences
> into broken single characters. If you have a simple sample document which
> reproduces this on import then if you log an issue and put me as "cmc"
> on the cc I could take a look to see if that is the case.
>
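
To make the suggestion above concrete, here is a minimal, self-contained
sketch of why per-byte conversion breaks a double-byte encoding while
whole-buffer conversion does not. It is not the code in ww8par.cxx:
toyCp949ToUnicode is a hypothetical stand-in for a real converter such as
rtl_convertTextToUnicode, and it only knows ASCII plus two CP949 characters
(0xC7 0xD1 and 0xB1 0xDB). convertPerByte is likewise a made-up name that
mimics a loop feeding one byte at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy converter (assumption, not the sal API): decodes ASCII
// and two CP949 double-byte characters, and emits U+FFFD for anything it
// cannot decode, including a lead byte with no trail byte in the buffer.
std::u16string toyCp949ToUnicode(const std::vector<std::uint8_t>& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        const std::uint8_t lead = bytes[i];
        if (lead < 0x80)
        {
            // Single-byte (ASCII) character.
            out.push_back(static_cast<char16_t>(lead));
            continue;
        }
        if (i + 1 < bytes.size())
        {
            const std::uint8_t trail = bytes[i + 1];
            if (lead == 0xC7 && trail == 0xD1)
            {
                out.push_back(static_cast<char16_t>(0xD55C));
                ++i; // consume the trail byte as well
                continue;
            }
            if (lead == 0xB1 && trail == 0xDB)
            {
                out.push_back(static_cast<char16_t>(0xAE00));
                ++i; // consume the trail byte as well
                continue;
            }
        }
        // Unknown or incomplete sequence.
        out.push_back(static_cast<char16_t>(0xFFFD));
    }
    return out;
}

// Mimics a loop that sends each byte to the converter separately:
// the converter never sees a lead byte together with its trail byte.
std::u16string convertPerByte(const std::vector<std::uint8_t>& bytes)
{
    std::u16string out;
    for (std::uint8_t b : bytes)
        out += toyCp949ToUnicode({b});
    return out;
}
```

With the input bytes 0x41 0xC7 0xD1 0xB1 0xDB, passing the whole buffer to
toyCp949ToUnicode gives three UTF-16 units ('A' followed by the two Korean
characters), whereas convertPerByte gives 'A' followed by four replacement
characters, which matches the kind of breakage described above.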


Thanks for your concern. I filed an issue for the problem.

http://www.openoffice.org/issues/show_bug.cgi?id=97247
- MS Word 95 import filter does not handle DBCS correctly

I will give an update through the issue if I have new findings.

Best regards,
Jeongkyu
-- 
Jeongkyu Kim
OpenOffice.org Korean community lead

Community website http://openoffice.or.kr
Personal blog     http://openoffice.or.kr/gomme
