----- Original Message -----
From: Paul Rohr <[EMAIL PROTECTED]>
To: hj <[EMAIL PROTECTED]>; patches <[EMAIL PROTECTED]>; abiword-dev <[EMAIL PROTECTED]>
Sent: March 31, 2000 9:46
Subject: Re: AbiWord Chinese version of Linux

> At 10:30 AM 3/27/00 +0800, hj wrote:
> >    The top-level window does not support XIM. But s_ic and s_ic_attr must
> >be static members. It causes a segmentation fault if I change them to
> >non-static. I don't know why.
> >    All Chinese and English characters are encoded in Unicode in abw.
> >European languages are not encoded in Unicode. In the future we will display
> >different languages in one document, so Unicode encoding is needed. If you
> >replace fonts.hj with European languages, characters are Unicode in abw.
> >    Chinese font files are too large to ship, so I don't distribute Chinese
> >fonts. I created a file "fonts.hj" in the AbiWord font files that includes
> >the Chinese printing font name, XLFD, printing font ascent, printing font
> >descent and printing font width.
> >    All unixfonts are created as a fontset, not a font, so they can display
> >both English and Chinese characters. The printing program can print both
> >English and Chinese characters.
> >    We must handle the fact that keyval is 0xffffff when I input Chinese
> >with XIM. Chinese strings are stored in string, not in keyval.
>
> Thanks for the patch. I'm very very impressed at how you've tackled issues
> throughout the tree to get Chinese working for you on Linux. My goal now is
> to figure out how to integrate the work you've done with the work that will
> be needed to add true Unicode support for other languages and/or platforms.
>
> At this point, I'd like feedback from other developers in the following two
> areas:
>
>   - people working on related i18n issues (Henrik Berg, Vadim Frolov)
>   - a random GTK expert or two
>
> As soon as we've got some consensus that you all are heading in the same
> direction, we can start getting some or all of this code checked in.
>
> To get the discussion rolling, here are some observations (in no particular
> order):
>
> 0. do you have a screen shot?
> ------------------------------
> I'd totally love to *see* your version running.
>
> 1. UI translation
> ------------------
> It's really cool to see that you've already translated most of the UI. I'm
> presuming that the hex-encoded characters map directly to the appropriate
> Unicode characters, and not some other charset, right?

Chinese characters are multibyte (MB) in ap_Menu_LabelSet_ZhCN.h and
ap_TB_LabelSet_ZhCN.h. Chinese characters are Unicode in ZhCN.strings.

>
>   src/wp/ap/xp/ap_Menu_LabelSet_Languages.h
>   src/wp/ap/xp/ap_Menu_LabelSet_ZhCN.h
>   src/wp/ap/xp/ap_TB_LabelSet_Languages.h
>   src/wp/ap/xp/ap_TB_LabelSet_ZhCN.h
>   user/wp/strings/ZhCN.strings
>
> How bad was it to do all the editing to generate an 8859-1 encoding of the
> strings file? Would it have been easier for you to use one of expat's other
> supported encodings instead?
>
>   http://www.jclark.com/xml/expatfaq.html
>
> For example, you can directly export UTF8 files from AbiWord. :-)
>
> 2. XIM on frame
> ----------------
> Thanks for digging out the GTK APIs for XIM support. Is there anything we'd
> need to know to make these changes work for other languages besides Chinese?

XIM supports other languages as well.

>
>   src/af/xap/unix/xap_UnixFrame.cpp
>   src/af/xap/unix/xap_UnixFrame.h
>
> Also, could you elaborate on what problems you were seeing with non-static
> ICs?

Just a segmentation fault. There is no difference between static and
non-static if the frame is only invoked once.

> Perhaps someone else on the list might be able to help.
>
> 3. coding style
> ----------------
> It looks like there are a number of places where you added files and/or
> functions, all of which had your initials as a prefix. Do you want your
> code to stand out like this, or was that just to make it easier to read the
> patch?
>
> (We generally tend to try to write code so it all blends in together.
> That way, you have to use Bonsai's cvsblame tool to see who was responsible
> for a given line of code.)
>
> 4. files to ignore
> -------------------
> I noticed that there were a bunch of files in your patch which included
> changes which probably shouldn't be checked in. For example,
>
>   src/af/xap/Makefile
>   src/af/xap/unix/xap_UnixDlg_About.cpp
>
> In addition, a bunch of spurious diffs were generated by RCS_ID variations.
> (Does anyone know of an option to suppress these?)
>
> 5. some languages don't ever get spell-checked
> -----------------------------------------------
> I also noticed that you've implemented quick hacks to avoid spell-checking
> Chinese content.
>
>   src/text/fmt/xp/fl_BlockLayout.cpp
>   src/wp/ap/xp/ap_Dialog_Spell.cpp
>
> Is there a more general way to do this check? Do we want to explicitly tag
> content by language (via the lang attribute), or will it be enough to just
> ignore certain Unicode ranges?

We should tag content by language.

>
> 6. pairing unrelated fonts
> ---------------------------
> This one's going to sound pretty ignorant, so please forgive me.
>
> I'm not sure I completely understand why you've implemented the logic to
> pair up English and Chinese fonts as if they were the same font (as far as
> the UI is concerned).
>
>   src/af/xap/unix/xap_UnixFont.cpp
>   src/af/xap/unix/xap_UnixFont.h
>   src/af/xap/unix/xap_UnixFontManager.cpp
>   src/af/xap/unix/xap_UnixFontManager.h
>   src/af/xap/unix/xap_UnixPSGraphics.cpp
>   src/af/xap/unix/xap_UnixPSGraphics.h
>
> I'm used to using WYSIWYG editors, where users choose to use one font at a
> time, switching to others as needed. Any time you use a character which
> isn't provided in that font, you get a slug character.

I do it just to avoid displaying slug characters. I think that in the future
AbiWord should know which language a Unicode character belongs to, so that it
can select a font automatically.
Selecting Times New Roman does not affect Chinese characters, because Times
New Roman is an English font. The same applies to printing.

>
> From what little I know of fontsets, the idea is that you explicitly
> assemble a collection of overlapping fonts and give that *set* of fonts a
> name. IIRC, GTK has mechanisms to do this, but I'm not sure whether that
> helps you much, since you have to generate PS output, too.
>
> (It's bad enough to do a 1-to-1 WYSIWYG mapping between screen fonts and
> printer fonts. Mapping collections of fontsets sounds like a nightmare.)
>
> Again, my goal here is to understand how to take what you've done and use it
> to solve similar problems for other languages.
>
> 7. multibyte / wide character conversions
> ------------------------------------------
> I suspect that this stuff is likely to be the most controversial. There are
> a number of places in the code where you've introduced locale-specific
> variants of UCS <--> char conversions via mbtowc() and wctomb().
>
>   mbtowc
>   ------
>   src/af/ev/unix/ev_UnixKeyboard.cpp
>
>   wctomb
>   ------
>   src/af/gr/unix/gr_UnixGraphics.cpp
>
>   UCS <--> char (via wc/mb)
>   -------------
>   src/af/util/Makefile
>   src/af/util/xp/Makefile
>   src/af/util/xp/hj.cpp
>   src/af/util/xp/hj.h
>   src/af/util/xp/hj_mbtowc.cpp
>   src/af/util/xp/hj_mbtowc.h
>   src/af/util/xp/hj_wctomb.cpp
>   src/af/util/xp/hj_wctomb.h
>
>   src/text/fmt/xp/fp_TextRun.cpp
>   src/wp/ap/unix/ap_UnixDialog_Replace.cpp
>   src/wp/ap/xp/ap_EditMethods.cpp
>
> To be honest, I'm not sure how this approach compares to the iconv-oriented
> stuff which Henrik and Vadim have been working on. I'm sure you're each
> working on real problems, but I frankly don't understand enough about what
> any of you are doing to be able to judge the merits of each approach.
>
> Could the three of you start a discussion to help get ignorant Americans
> like me up to speed? ;-)

mbtowc() and wctomb() do the same job as iconv.
But mbtowc() and wctomb() only convert between Unicode and multibyte
characters of the native (locale) language. iconv can also convert multibyte
characters of other languages to and from Unicode.

>
> 8. should plain text be anything other than ASCII?
> ---------------------------------------------------
> On a similar note, it looks like you've extended a bunch of logic which
> currently reads Latin-1 files to also handle other encodings, albeit in a
> locale-specific way.
>
>   src/af/xap/xp/xap_Strings.cpp
>   src/wp/ap/xp/ap_Strings.cpp
>   src/wp/impexp/xp/ie_exp_Text.cpp
>   src/wp/impexp/xp/ie_imp_MsWord_97.cpp
>   src/wp/impexp/xp/ie_imp_Text.cpp
>

Text files contain multibyte characters, so a text file depends on the locale.
All other files are Unicode-encoded, so they are portable.

> This makes me kind of nervous, because it means that the actual contents of
> the files being read and written are interpreted as being in different
> charsets, depending on your locale settings at runtime.
>
> Up until now, we've been striving to create totally-portable files, which
> are always in the same encoding no matter where you read or write them.
> (Thus, for example, note how we've differentiated 7-bit text files from UTF8
> text files.)
>
> bottom line
> -----------
> You've obviously put a lot of hard work into this patch, and I really really
> want to be able to start bragging about the fact that we support Chinese on
> at least one platform. That's *so* cool!
>
> To be honest, I'm not sure that all of the issues I've mentioned above are
> actually real. However, at the moment, I don't know enough to be able to
> decide how much of this patch to integrate into the tree.

XIM support can be added to the tree. The rest will have to wait.

>
> Could the various folks working on i18n issues help clear up some of my
> confusion here?
>
> Thanks,
> Paul
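
On point 5, the "ignore certain Unicode ranges" alternative could look roughly
like the sketch below. This is only an illustration of the range-based idea,
not what the patch does and not existing AbiWord code: the names
isCJKCodePoint and isSpellCheckable are hypothetical, the word is assumed to
be an array of Unicode code points, and the list of blocks is a guess at what
would matter for Chinese text rather than a complete list.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the patch): true if the code point lies in a
// Unicode block that is used mainly for Chinese/Japanese text.
static bool isCJKCodePoint(std::uint32_t c)
{
    return (c >= 0x3000 && c <= 0x303F)    // CJK Symbols and Punctuation
        || (c >= 0x3400 && c <= 0x4DBF)    // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)    // CJK Unified Ideographs
        || (c >= 0xFF00 && c <= 0xFFEF);   // Halfwidth and Fullwidth Forms
}

// Hypothetical helper: a word is passed to the spell checker only if none of
// its code points fall in the ranges above.
static bool isSpellCheckable(const std::uint32_t* word, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        if (isCJKCodePoint(word[i]))
            return false;
    }
    return true;
}
```

A range check like this needs no extra markup in the document, but it cannot
tell apart languages that share a script, which is one argument for tagging
content by language instead.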
