On Wed, 1 Nov 2000, Belcon wrote:

 Hi Belcon, Hi abi-dev

> Hello:
> 
> Vlad Harchev wrote:
> 
> [deleted]
> > 
> >  Nice hack :)
> >  Also I would like you to try the latest CJK patch (it's all in one) that will
> > be announced shortly after this letter. It uses gdk_fontset_load instead of
> > gdk_font_load so your hack probably won't be needed. Could you test whether
> > it works?
> > 
> 
> There is still a bug when copy&paste.

 But does that patch fix the problem of displaying Chinese in GB* locales
(without your hack)?

> [deleted]
> > 
> >  Definitely, adding {"CP950","GB2312"} is the right thing, so it should
> > definitely be there. Then you can copy Chinese to the clipboard :)
> >  As for pasting, I've changed the RTF exporter a little too. Chances are low
> > that it helps with properly importing RTF exported by AW (i.e. copying and
> > pasting), but I see no other flaws in the exporter/importer combination. So
> > please try it.
> > 
> 
> I found that CP950 stands for Big5 encoding after I read some documents,
> while CP936 stands for GBK encoding. I don't know why
> wvLIDToCodePageConverter(0x404) returns CP950 when I am using GB2312
> encoding. :-(

 It's hardcoded into wvLIDToCodePageConverter - that's why. :)
 I'll fix this: I will add another lookup table that maps zh_CN.GB2312 to
 language code 0x804, and all will be fine.
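
 A minimal sketch of what such a lookup table could look like, only to
illustrate the idea; it is not the actual patch. The struct, the table name and
langCodeForLocale are hypothetical, and the zh_TW.Big5 row is there only for
contrast (0x404 is the language code that currently yields CP950).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names throughout: nothing here is taken from AbiWord or wv.
struct LocaleLangCode {
    const char* locale;   // locale name as reported by the environment
    unsigned    langcode; // Windows language ID passed on to the converter
};

static const LocaleLangCode kLocaleLangCodes[] = {
    {"zh_CN.GB2312", 0x804}, // Simplified Chinese, expected code page: CP936
    {"zh_TW.Big5",   0x404}, // Traditional Chinese, expected code page: CP950
};

// Returns the language ID for `locale`, or 0 when the locale is not listed,
// so the caller can fall back to whatever default it already uses.
unsigned langCodeForLocale(const char* locale)
{
    assert(locale != nullptr); // precondition: caller passes a real string
    for (const LocaleLangCode& entry : kLocaleLangCodes) {
        if (std::strcmp(entry.locale, locale) == 0)
            return entry.langcode;
    }
    return 0;
}
```

 With a table like this, a GB2312 locale would reach the code page lookup with
0x804 instead of 0x404, so the RTF header would no longer claim CP950.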

 I can post this patch only in 8 hours, sorry. For now, the absence of this
patch doesn't hurt at all (except that it makes RTFs with CJK text unreadable
by old versions of Word).

> >  If it still doesn't work, save the rtf file and inspect it by eye.
> >  Also try saving exactly the same text in a GB2312 Word or other app.
> >  I recommend putting a recognizable English word (e.g. "AbiWord") around,
> > say, 2 Chinese characters - this way you'll easily locate them by eye.
> > Then compare the differences (and perhaps send me the two pieces of text
> > between the two English words - but try to limit it to 2 Chinese characters
> > so I will be able to distinguish them too). Analyse. To test importing RTF
> > (i.e. pasting), just put a breakpoint at UT_Mbtowc::mbtowc, run the importer
> > or paste something, watch whether things work as they should, and fix them :)
> > 
> 
> Yes, I saved the rtf file and looked at it with my eyes. When I tried to
> open this rtf file, AW only showed "??" for each Chinese character. I think
> it is because AW can't find the proper glyph.
> I found that in AW 0.7.10, src/wp/impexp/xp/ie_exp_RTF.cpp,
> UT_Bool IE_Exp_RTF::_write_rtf_header(void), something changed when AW
> was upgraded to 0.7.11. AW 0.7.10 works fine with rtf while AW 0.7.11 does
> not. Here is the difference that I think is the reason why AW 0.7.11 can't
> show Chinese characters when we open a Chinese rtf file.
> AW 0.7.10:
> UT_Bool IE_Exp_RTF::_write_rtf_header(void)
> {
>         UT_uint32 k,kLimit;
>
>         // write <rtf-header>
>         // return UT_FALSE on error
>
>         _rtf_open_brace();
>         _rtf_keyword("rtf",1);                  // major version number of spec version 1.5
>
>         _rtf_keyword("ansi");
> ***     _rtf_keyword("ansicpg",1252);           // TODO what CodePage do we want here ??
>
>         _rtf_keyword("deff",0);                 // default font is index 0 aka black
>
>         // write the "font table"....
>      [deleted]
> 
> AW 0.7.11
>     [deleted]
>         _rtf_keyword("ansi");
>         UT_Bool wrote_cpg = 0;
>         if (langcode)
>         {
>                 char* cpgname = wvLIDToCodePageConverter(langcode);
>                 if (UT_strnicmp(cpgname,"cp",2)==0 && UT_UCS_isdigit(cpgname[2]))
>                 {
>                         int cpg;
>                         if (sscanf(cpgname+2,"%d",&cpg)==1)
>                         {
>                                 _rtf_keyword("ansicpg",cpg);
>                                 wrote_cpg = 1;
>                         }
>                 };
>         };
>         if (!wrote_cpg)
>             _rtf_keyword("ansicpg",1252);       // TODO what CodePage do we want here ??
>
>         _rtf_keyword("deff",0);                 // default font is index 0 aka black
>     [deleted]
> Here are the rtf files generated by AW 0.7.10 and 0.7.11. (There are two
> Chinese characters surrounded by "AbiWord"; the Chinese characters are the
> same, in GB2312.)
> AW 0.7.10
> {\rtf1\ansi\ansicpg1252\deff0
> {\fonttbl
> {\f0\fnil\fcharset0\fprq0\fttruetype Times New Roman;}
> {\f1\fnil\fcharset0\fprq0\fttruetype ar pl sungtil gb;}}
> {\colortbl
> \red0\green0\blue0;}
> \kerning0\cf0\viewkind1\paperw12240\paperh15840\margl1440\margr1440\widowctl
> \sectd\sbknone\colsx360
> \pard{\f0 AbiWord}{\f1\uc0\u20320\uc0\u22909 AbiWord}}
> 
> AW 0.7.11
> {\rtf1\ansi\ansicpg950\deff0
> {\fonttbl
> {\f0\fnil\fcharset0\fprq0\fttruetype Times New Roman;}
> {\f1\fnil\fcharset0\fprq0\fttruetype ar pl sungtil gb;}}
> {\colortbl
> \red0\green0\blue0;}
> \kerning0\cf0\viewkind1\paperw12240\paperh15840\margl1440\margr1440\widowctl
> \sectd\sbknone\colsx360
> \pard{\f0 AbiWord}{\f1\'c4\'e3\'ba\'c3}{\f0 AbiWord}}
> 
> If I open the rtf file generated by AW 0.7.10 in AW 0.7.11, it works fine.
> So, IMHO, there is something wrong with our RTF part. This also
> causes problems when we use the copy&paste function.

 Thank you for attaching them and for your analysis.

 The old RTF variant is the "simple one" - it won't be understood by stupid
Wordpad or by Word 6.0 and below. That's why I made the exporter produce RTF in
a wiser way (which doesn't work for CJK yet :).
 I've attached a small patch that brings back the "stupid" generation of RTF
(as 0.7.10 did). Please try it.
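
 To make the "simple" variant concrete, here is a rough sketch of how a run of
UCS-2 text turns into \uc0\uN keywords. It is an illustration and not the
exporter code: appendRtfChar and rtfRun are hypothetical names, and the space
written after each \uN is the usual RTF keyword delimiter, which a reader
consumes rather than displays.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: appends one UCS-2 character to an RTF text run.
void appendRtfChar(std::string& out, std::uint16_t c)
{
    if (c <= 0x7f) {
        // Backslash and braces are RTF syntax, so they must be escaped.
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += static_cast<char>(c);
        return;
    }
    // \uN takes a signed 16-bit decimal number, so values above 0x7fff
    // have to be written as negative numbers.
    int n = (c > 0x7fff) ? static_cast<int>(c) - 0x10000 : static_cast<int>(c);
    assert(n >= -32768 && n <= 32767);
    // \uc0 announces that no fallback bytes follow the \uN keyword.
    out += "\\uc0\\u" + std::to_string(n) + " ";
}

// Hypothetical helper: converts a whole UCS-2 string to an RTF text run.
std::string rtfRun(const std::u16string& text)
{
    std::string out;
    for (char16_t ch : text)
        appendRtfChar(out, static_cast<std::uint16_t>(ch));
    return out;
}
```

 For the two characters in the attached files (U+4F60 and U+597D) this gives
\u20320 and \u22909, the same numbers that 0.7.10 wrote.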
 
> > [deleted]
> > > >
> > > >  Also try pasting from AW to AW. Does it work?
> > > >
> > > Many thanks for your help!
> > > I tried pasting from AW to AW. It doesn't work if I haven't changed CP936;
> > > after I changed CP936, it doesn't show Chinese characters. :-(
> > 
> >   As you've discovered, you have to add  {"CP950","GB2312"} to that table.
> 
> It still does not work.
> Here is what I have found. Using copy&paste, as an example, I copy a Chinese
> character, whose UCS2 code is 0x4f60 and whose GB2312 code is 0xc4e3, in an
> abw file, and paste it in the same file. Then I save and quit. I use "vi" to
> look at what is really in the abw file. The original Chinese character is
> "&#x4f60;". The pasted Chinese character should also be "&#x4f60;", while
> actually it is "&#xc4;&#xe3;". This makes me think that we forget to
> translate locale-encoded characters to UCS2-encoded characters in the copy
> and/or paste function.
> It is just my thought. And I found this suggestion does not match what I had
> said before. :-( I am not familiar with AW, so maybe you need to take a
> look.

 Thank you for the very clean analysis - that's what I expected from you :)
 OK, please apply the patch to the exporter first and report the results. With
it, that problem with the importer won't arise (that code simply won't be
engaged).
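
 For reference, the symptom you describe is what one gets when each byte of a
multibyte character is widened on its own instead of being decoded as a pair.
The sketch below only reproduces that effect; it does not claim to show where
the fault sits in AW. All function names are hypothetical, and the decoder
knows just the two characters from your test file (real code would go through
mbtowc or iconv).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Writes non-ASCII characters as XML character references, e.g. "&#x4f60;".
std::string toCharRefs(const std::vector<std::uint16_t>& chars)
{
    std::string out;
    for (std::uint16_t c : chars) {
        if (c < 0x80) { out += static_cast<char>(c); continue; }
        char buf[16];
        std::snprintf(buf, sizeof buf, "&#x%x;", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// Faulty path: every byte becomes a character of its own.
std::vector<std::uint16_t> widenBytes(const std::string& bytes)
{
    std::vector<std::uint16_t> out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Toy GB2312 decoder with a two-entry table, for illustration only.
std::vector<std::uint16_t> decodeGb2312Toy(const std::string& bytes)
{
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { out.push_back(lead); continue; }
        assert(i + 1 < bytes.size()); // a lead byte needs a trail byte
        unsigned pair = (lead << 8) | static_cast<unsigned char>(bytes[++i]);
        switch (pair) {
            case 0xc4e3: out.push_back(0x4f60); break;
            case 0xbac3: out.push_back(0x597d); break;
            default:     out.push_back('?');    break; // not in the toy table
        }
    }
    return out;
}
```

 Feeding the bytes 0xc4 0xe3 through widenBytes gives "&#xc4;&#xe3;", while
the decoding path gives "&#x4f60;".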
 
> > >
> 
> > 
> >  I didn't cc Martin, Sam and HJ this time. If you guys want to be
> > "subscribed", let us know :)
> > 
> 
> I have subscribed and I can receive the mails from the mailing list, but
> it seems that I can't send my emails to this mailing list. Maybe there
> is something on the server which filters my emails. :-(

 Very strange :(

> >  PS: I will announce the next incremental next-cjk-patch.diff, which will
> >  include everything the 2nd version of next-cjk-patch.diff had plus some
> >  changes based on your research (the fix in ev_UnixKeyboard.cpp, the fix for
> > the exporter to .abw, and the addition of {"CP950","GB2312"} to the table of
> > encodings). So please try it.
> > 
> >  At a minimum, try changes to xap_UnixFont.cpp and ie_exp_RTF*.cpp - you
> > didn't try them yet.
> > 
> >  Best regards,
> >   -Vlad
> 

 I'm very busy today. I won't be able to read my mail for another 5 hours,
sorry.
 So please try all variants.

 Best regards,
  -Vlad
--- ie_exp_RTF_listenerWriteDoc.cpp-was Wed Nov  1 10:13:10 2000
+++ ie_exp_RTF_listenerWriteDoc.cpp     Wed Nov  1 10:17:48 2000
@@ -194,6 +194,7 @@
                default:
                        if (XAP_EncodingManager::instance->cjk_locale())
                        {
+#if 0                  
                                /*FIXME: can it happen that wctomb will fail under CJK locales? */
                                m_wctomb.wctomb_or_fallback(mbbuf,mblen,*pData++);
                                for(int i=0;i<mblen;++i) {
@@ -204,6 +205,17 @@
                                                *pBuf++ = c;
                                        
                                };
+#else
+                               UT_UCSChar c = *pData++;
+                               if (c>0x007f) {
+                                       m_pie->_rtf_keyword("uc",0);
+                                       signed short si = (signed short)c;      // so we need to write negative
+                                       m_pie->_rtf_keyword("u",si);            // numbers for large unicode values.
+                                                                               
+                               } else {
+                                       *pBuf++ = (UT_Byte)c;
+                               };
+#endif                         
                        } else if (!m_pie->m_atticFormat) 
                        {
                                if (*pData > 0x00ff)            // emit unicode character
