Re: [sw-discussion] What is the best way to prevent data loss by choosing filters which does not support unicode?

Caolán McNamara Sun, 14 Dec 2008 07:19:55 -0800

On Sat, 2008-12-13 at 05:04 +0900, Jeongkyu Kim wrote:
> However, there is one more homework left for me. When I opened the
> exported file with OO.o Writer, Korean characters were broken. I am
> playing around some functions in ww8par.cxx such as ReadPlainChars(),
> GetCurrentCharSet(), and Custom8BitToUnicode(), but I have no luck
> yet. Now, I need some hints on how to handle Korean characters
> correctly in importing filter


Hmm, well first make sure that in SwWW8ImplReader::ReadPlainChars
eSrcCharSet is equal to RTL_TEXTENCODING_MS_949.

Assuming that is working correctly, then looking at the code it
probably does not properly handle multi-byte encodings. So rather than
sending each byte to be converted one at a time in
for( nL2 = 0; nL2 < nLen; ++nL2, ++pWork )
it would likely be better to collect the whole set of bytes, adjust
Custom8BitToUnicode to take a sequence of bytes, and send the whole lot
to rtl_convertTextToUnicode so as not to break up multi-byte sequences
into broken single characters. If you have a simple sample document which
reproduces this on import, then if you log an issue and put me as "cmc"
on the cc I could take a look to see if that is the case.
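
To illustrate why per-byte conversion would break a double-byte encoding
such as MS 949, here is a minimal standalone sketch. It is not the
ww8par.cxx code and does not call rtl_convertTextToUnicode: toyDecode,
convertPerByte and convertWholeRun are hypothetical names, and toyDecode
is a toy stand-in for a real converter that knows only ASCII and the
single pair 0xB0 0xA1 (U+AC00).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Decoder = std::function<std::u16string(const Bytes&)>;

// Toy decoder (hypothetical stand-in for a real converter): ASCII passes
// through, the pair 0xB0 0xA1 maps to U+AC00, and any other byte,
// including a lead byte with no trail byte, becomes U+FFFD.
std::u16string toyDecode(const Bytes& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const std::uint8_t b = in[i];
        if (b < 0x80)
        {
            out.push_back(static_cast<char16_t>(b));
        }
        else if (b == 0xB0 && i + 1 < in.size() && in[i + 1] == 0xA1)
        {
            out.push_back(u'\uAC00');
            ++i; // consume the trail byte as well
        }
        else
        {
            out.push_back(u'\uFFFD');
        }
    }
    return out;
}

// Problem pattern: each byte is handed to the decoder on its own, so the
// decoder never sees a lead byte together with its trail byte.
std::u16string convertPerByte(const Bytes& run, const Decoder& decode)
{
    assert(decode);
    std::u16string out;
    for (std::uint8_t b : run)
        out += decode(Bytes{b});
    return out;
}

// Suggested pattern: collect the whole run and convert it in one call.
std::u16string convertWholeRun(const Bytes& run, const Decoder& decode)
{
    assert(decode);
    return decode(run);
}
```

With the run 0x41 0xB0 0xA1 0x42, convertWholeRun keeps the pair
together, while convertPerByte turns each half of it into a separate
replacement character.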

C.


