Vlad Harchev wrote:
> 
> On Tue, 22 May 2001, Hubert Figuiere wrote:
> 
> > According to Vlad Harchev <[EMAIL PROTECTED]>:
> > > On Tue, 22 May 2001, ha shao wrote:
> > >
> > > > On Tue, May 22, 2001 at 03:26:27AM +1000, [EMAIL PROTECTED] wrote:
> > > > > Vlad Harchev wrote:
> > > > >
> > > > > If I create an RTF with CJK characters in Word it looks like this:
> > > > >
> > > > > \f5\'82\'c9\'82\'d9\'82\'f1\'82\'b2
> > > > >
> > > > > This is what is not being imported correctly.
> > > > >
> > > > > >  And no, each byte of multibyte is not going through iconv, our code is:
> > > > > >                                       if (m_mbtowc.mbtowc(wc,(UT_Byte)ch))
> > > > > >                                               return AddChar(wc);
> > > > > > >  it internally appends 'ch' to the array-of-chars member of m_mbtowc, then
> > > > > > > calls iconv and checks whether it was able to convert the aggregated
> > > > > > > sequence. If it was, the wchar is returned; otherwise 0 is returned (any
> > > > > > > already aggregated sequence isn't lost between calls).
> > > >
> > > > The problem is with the 'else' clause of the 'if'. When
> > > > m_mbtowc.mbtowc() returns 0, the 'else' resets m_mbtowc, and
> > > > we lose the internal buffer. Commenting it out brings
> > > > CJK import back to its old shape.
> > >
> > >  The code in AW 0.7.14 is
> > >                       if (no_convert==0 && ch<=0xff)
> > >                       {
> > >                               wchar_t wc;
> > >                               if (m_mbtowc.mbtowc(wc,(UT_Byte)ch))
> > >                                       return AddChar(wc);
> >                               else
> >                                       m_mbtowc.initialize();
> > >                       } else
> > >                               return AddChar(ch);
> > >
> > >  I don't see how 'else' can reset buffer (AddChar(ch) doesn't reset it
> > > either). Or did RTF importer change there since 0.7.14?
> >
> > It has changed. See above.
> > Committer is dom on May 3 (revision 1.58). Dom, can you explain what it is for ?
> 
>  Thank you for the research. That added chunk:
>                                else
>                                        m_mbtowc.initialize();
>  should definitely be removed.

I'm not an iconv expert but I thought we needed something like
that to reset the internal state after trying and failing to
convert a character.  Something *like* that - but not exactly that...
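
To make the distinction concrete, here is a rough sketch of what I mean:
reset only when a sequence is actually invalid, not when it is merely
incomplete. This is not the AbiWord code. ByteAccumulator, Feed and decode
are made-up names, the lead/trail byte ranges are a simplified Shift-JIS
table hard-coded in place of iconv, and the values returned are raw
two-byte codes rather than Unicode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Outcome of feeding one byte to the accumulator.
enum class Feed { Complete, Incomplete, Invalid };

// Hypothetical stand-in for m_mbtowc (simplified Shift-JIS, no iconv).
class ByteAccumulator {
public:
    // Append one byte; on Complete, 'out' holds the raw character code.
    Feed feed(unsigned char b, std::uint32_t &out) {
        assert(len_ < sizeof buf_);
        buf_[len_++] = b;
        if (len_ == 1) {
            // Single-byte characters: ASCII and half-width katakana.
            if (b < 0x80 || (b >= 0xA1 && b <= 0xDF)) {
                out = b;
                len_ = 0;
                return Feed::Complete;
            }
            if (isLead(b))
                return Feed::Incomplete;   // keep the byte, wait for more
            len_ = 0;                      // cannot start any character
            return Feed::Invalid;
        }
        // Second byte of a double-byte character.
        len_ = 0;
        if (isTrail(b)) {
            out = (static_cast<std::uint32_t>(buf_[0]) << 8) | b;
            return Feed::Complete;
        }
        return Feed::Invalid;
    }

    // Drop any partially accumulated sequence.
    void reset() { len_ = 0; }

private:
    static bool isLead(unsigned char b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }
    static bool isTrail(unsigned char b) {
        return b >= 0x40 && b <= 0xFC && b != 0x7F;
    }
    unsigned char buf_[2] = {0, 0};
    std::size_t len_ = 0;
};

// Decode a byte run. With resetOnIncomplete == true this imitates the
// added "else m_mbtowc.initialize();" branch, which throws away the
// lead byte before its trail byte arrives.
std::vector<std::uint32_t> decode(const std::vector<unsigned char> &bytes,
                                  bool resetOnIncomplete) {
    ByteAccumulator acc;
    std::vector<std::uint32_t> result;
    for (unsigned char b : bytes) {
        std::uint32_t wc = 0;
        Feed r = acc.feed(b, wc);
        if (r == Feed::Complete)
            result.push_back(wc);
        else if (r == Feed::Incomplete && resetOnIncomplete)
            acc.reset();
        // Invalid: feed() has already cleared its buffer; the byte is dropped.
    }
    return result;
}
```

Feeding it the bytes from the Word RTF sample above (82 c9 82 d9 82 f1
82 b2) gives four double-byte codes when the reset happens only on
invalid input. With the reset on every incomplete result, no double-byte
character is ever completed and the trail bytes get misread as
single-byte characters.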

Andrew.

-- 
http://linguaphile.sourceforge.net
