Re: AbiWord Chinese version of Linux

hj Sat, 1 Apr 2000 04:05:37 -0600 (CST)

----- Original Message -----
From: Paul Rohr <[EMAIL PROTECTED]>
To: hj <[EMAIL PROTECTED]>; patches <[EMAIL PROTECTED]>; abiword-dev <[EMAIL PROTECTED]>
Sent: 2000-03-31 9:46
Subject: Re: AbiWord Chinese version of Linux


> At 10:30 AM 3/27/00 +0800, hj wrote:
> >    The top-level window doesn't support XIM. But s_ic and s_ic_attr must
> >be static members; changing them to non-static causes a segmentation
> >fault. I don't know why.
> >    All Chinese and English characters are encoded as Unicode in .abw
> >files. European languages are not encoded as Unicode. In the future we
> >will display different languages in one document, so Unicode encoding is
> >needed. If you replace fonts.hj with fonts for European languages, those
> >characters will also be Unicode in .abw files.
> >    Chinese font files are too large to ship, so I don't distribute Chinese
> >fonts. Instead, I created a file "fonts.hj" among the AbiWord font files
> >that lists each Chinese printing font's name, XLFD, ascent, descent, and
> >width.
> >    All Unix fonts are created as fontsets, not single fonts, so they can
> >display both English and Chinese characters. The printing code can also
> >print both English and Chinese characters.
> >    We must handle the fact that keyval is 0xffffff when I input Chinese
> >with XIM: the Chinese text is stored in the string field, not in keyval.
>
> Thanks for the patch.  I'm very very impressed at how you've tackled issues
> throughout the tree to get Chinese working for you on Linux.  My goal now is
> to figure out how to integrate the work you've done with the work that will
> be needed to add true Unicode support for other languages and/or platforms.
>
> At this point, I'd like feedback from other developers in the following two
> areas:
>
>   - people working on related i18n issues (Henrik Berg, Vadim Frolov)
>   - a random GTK expert or two
>
> As soon as we've got some consensus that you all are heading in the same
> direction, we can start getting some or all of this code checked in.
>
> To get the discussion rolling, here are some observations (in no particular
> order):
>
> 0.  do you have a screen shot?
> ------------------------------
> I'd totally love to *see* your version running.
>
> 1.  UI translation
> ------------------
> It's really cool to see that you've already translated most of the UI.  I'm
> presuming that the hex-encoded characters map directly to the appropriate
> Unicode characters, and not some other charset, right?

Chinese characters are multibyte (MB) strings in ap_Menu_LabelSet_ZhCN.h and
ap_TB_LabelSet_ZhCN.h, but Unicode in ZhCN.strings.

>
>   src/wp/ap/xp/ap_Menu_LabelSet_Languages.h
>   src/wp/ap/xp/ap_Menu_LabelSet_ZhCN.h
>   src/wp/ap/xp/ap_TB_LabelSet_Languages.h
>   src/wp/ap/xp/ap_TB_LabelSet_ZhCN.h
>   user/wp/strings/ZhCN.strings
>
> How bad was it to do all the editing to generate an 8859-1 encoding of the
> strings file?  Would it have been easier for you to use one of expat's other
> supported encodings instead?
>
>   http://www.jclark.com/xml/expatfaq.html
>
> For example, you can directly export UTF8 files from AbiWord.  :-)
>
> 2.  XIM on frame
> ----------------
> Thanks for digging out the GTK apis for XIM support.  Is there anything we'd
> need to know to make these changes work for other languages besides Chinese?

XIM supports all other languages.

>
>   src/af/xap/unix/xap_UnixFrame.cpp
>   src/af/xap/unix/xap_UnixFrame.h
>
> Also, could you elaborate on what problems you were seeing with non-static
> ICs?

Just a segmentation fault. Static and non-static make no difference if only
one frame is ever opened.

> Perhaps someone else on the list might be able to help.
>
> 3.  coding style
> ----------------
> It looks like there are a number of places where you added files and/or
> functions, all of which had your initials as a prefix.  Do you want your
> code to stand out like this, or was that just to make it easier to read the
> patch?
>
> (We generally tend to try to write code so it all blends in together.  That
> way, you have to use Bonsai's cvsblame tool to see who was responsible for a
> given line of code.)
>
> 4.  files to ignore
> -------------------
> I noticed that there were a bunch of files in your patch which included
> changes which probably shouldn't be checked in.  For example,
>
>   src/af/xap/Makefile
>   src/af/xap/unix/xap_UnixDlg_About.cpp
>
> In addition, a bunch of spurious diffs were generated by RCS_ID variations.
> (Does anyone know of an option to suppress these?)
>
> 5.  some languages don't ever get spell-checked
> -----------------------------------------------
> I also noticed that you've implemented quick hacks to avoid spell-checking
> Chinese content.
>
>   src/text/fmt/xp/fl_BlockLayout.cpp
>   src/wp/ap/xp/ap_Dialog_Spell.cpp
>
> Is there a more general way to do this check?  Do we want to explicitly tag
> content by language (via the lang attribute), or will it be enough to just
> ignore certain Unicode ranges?

We should tag content by language.
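A minimal sketch of how the two options could coexist, assuming text is available as char32_t code points and the lang attribute holds tags such as "zh-CN": an explicit language tag decides, and the Unicode-range check is only a fallback for untagged text. The function names, the list of skipped languages, and the chosen ranges are hypothetical illustrations, not code from the patch.

```cpp
#include <string>

// Hypothetical helper: true if c lies in a CJK block that a Latin-script
// dictionary cannot check. The ranges are an illustrative subset only.
bool isCjkCodePoint(char32_t c) {
    return (c >= 0x3000 && c <= 0x303F)    // CJK Symbols and Punctuation
        || (c >= 0x3400 && c <= 0x4DBF)    // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)    // CJK Unified Ideographs
        || (c >= 0xFF00 && c <= 0xFFEF);   // Halfwidth and Fullwidth Forms
}

// Hypothetical policy: an explicit lang tag wins; the range heuristic is
// used only when the run carries no language tag at all.
bool shouldSpellCheck(const std::u32string& word, const std::string& lang) {
    if (!lang.empty()) {
        // Compare only the primary subtag, e.g. "zh" from "zh-CN".
        const std::string primary = lang.substr(0, lang.find('-'));
        // Assumption: no dictionaries are installed for these languages.
        return primary != "zh" && primary != "ja" && primary != "ko";
    }
    for (char32_t c : word) {
        if (isCjkCodePoint(c)) {
            return false;
        }
    }
    return true;
}
```

A side benefit of tagging is that the same attribute could later tell the spell checker which dictionary to load, rather than only whether to skip a run.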

>
> 6.  pairing unrelated fonts
> ---------------------------
> This one's going to sound pretty ignorant, so please forgive me.
>
> I'm not sure I completely understand why you've implemented the logic to
> pair up English and Chinese fonts as if they were the same font (as far as
> the UI is concerned).
>
>   src/af/xap/unix/xap_UnixFont.cpp
>   src/af/xap/unix/xap_UnixFont.h
>   src/af/xap/unix/xap_UnixFontManager.cpp
>   src/af/xap/unix/xap_UnixFontManager.h
>   src/af/xap/unix/xap_UnixPSGraphics.cpp
>   src/af/xap/unix/xap_UnixPSGraphics.h
>
> I'm used to using WYSIWYG editors, where users choose to use one font at a
> time, switching to others as needed.  Any time you use a character which
> isn't provided in that font, you get a slug character.

I did it only to avoid displaying slug characters. In the future, I think
AbiWord should know which language each Unicode character belongs to, so it
could select a font automatically. Selecting Times New Roman doesn't affect
Chinese characters, because Times New Roman is an English font. The same
applies to printing.
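One way to picture the pairing described above, as a sketch rather than the patch's actual logic: one UI font name maps to a Latin face and a CJK face, and text is split into runs by which face covers each character. The PairedFont and Segment types, the face names, and the rough range test are all hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical: one UI font name backed by a Latin face and a CJK face.
struct PairedFont {
    std::string latinFace;
    std::string cjkFace;
};

// A run of consecutive characters drawn with the same face.
struct Segment {
    std::string face;
    std::u32string text;
};

// Rough assumption: CJK radicals through CJK Unified Ideographs, plus
// fullwidth forms, need the CJK face; everything else uses the Latin face.
static bool needsCjkFace(char32_t c) {
    return (c >= 0x2E80 && c <= 0x9FFF) || (c >= 0xFF00 && c <= 0xFFEF);
}

// Split text into runs so that each run is drawn with a face that actually
// contains its glyphs, instead of showing slug characters.
std::vector<Segment> segmentByFace(const std::u32string& text,
                                   const PairedFont& font) {
    std::vector<Segment> runs;
    for (char32_t c : text) {
        const std::string& face = needsCjkFace(c) ? font.cjkFace : font.latinFace;
        if (runs.empty() || runs.back().face != face) {
            runs.push_back(Segment{face, std::u32string()});
        }
        runs.back().text.push_back(c);
    }
    return runs;
}
```

Any such segmentation would have to be reproduced on the PostScript side as well, so screen and printer output split the text at the same points.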

>
> From what little I know of fontsets, the idea is that you explicitly
> assemble a collection of overlapping fonts and give that *set* of fonts a
> name.  IIRC, GTK has mechanisms to do this, but I'm not sure whether that
> helps you much, since you have to generate PS output, too.
>
> (It's bad enough to do a 1-to-1 WYSIWYG mapping between screen fonts and
> printer fonts.  Mapping collections of fontsets sounds like a nightmare.)
>
> Again, my goal here is to understand how to take what you've done and use it
> to solve similar problems for other languages.
>

> 7.  multibyte / wide character conversions
> ------------------------------------------
> I suspect that this stuff is likely to be the most controversial.  There are
> a number of places in the code where you've introduced locale-specific
> variants of UCS <--> char conversions via mbtowc() and wctomb().
>
>   mbtowc
>   ------
>   src/af/ev/unix/ev_UnixKeyboard.cpp
>
>   wctomb
>   ------
>   src/af/gr/unix/gr_UnixGraphics.cpp
>
>   UCS <--> char (via wc/mb)
>   -------------
>   src/af/util/Makefile
>   src/af/util/xp/Makefile
>   src/af/util/xp/hj.cpp
>   src/af/util/xp/hj.h
>   src/af/util/xp/hj_mbtowc.cpp
>   src/af/util/xp/hj_mbtowc.h
>   src/af/util/xp/hj_wctomb.cpp
>   src/af/util/xp/hj_wctomb.h
>
>   src/text/fmt/xp/fp_TextRun.cpp
>   src/wp/ap/unix/ap_UnixDialog_Replace.cpp
>   src/wp/ap/xp/ap_EditMethods.cpp
>
> To be honest, I'm not sure how this approach compares to the iconv-oriented
> stuff which Henrik and Vadim have been working on.  I'm sure you're each
> working on real problems, but I frankly don't understand enough about what
> any of you are doing to be able to judge the merits of each approach.
>
> Could the three of you start a discussion to help get ignorant Americans
> like me up to speed?  ;-)

mbtowc() and wctomb() do the same job as iconv, but they only convert between
Unicode and multibyte characters in the native (current locale's) encoding.
iconv can also convert between Unicode and multibyte characters in other
languages' encodings.
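To make the difference concrete, here is a sketch (not the patch's hj_mbtowc code) of both approaches decoding bytes into Unicode code points. mbtowc() takes no charset argument and interprets bytes according to the current LC_CTYPE locale, while iconv_open() names the source charset explicitly, e.g. "GB2312" or "UTF-8". It assumes a POSIX system whose wchar_t holds Unicode values (true for glibc) and whose iconv supports the "UCS-4LE" target; the function names are hypothetical.

```cpp
#include <iconv.h>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Locale-dependent: the same bytes decode differently under different
// LC_CTYPE settings. Assumes wchar_t values are Unicode code points.
std::u32string decodeWithLocale(const std::string& in) {
    std::mbtowc(nullptr, nullptr, 0);  // reset any shift state
    std::u32string out;
    const char* p = in.data();
    std::size_t left = in.size();
    while (left > 0) {
        wchar_t wc = 0;
        int n = std::mbtowc(&wc, p, left);
        if (n < 0) throw std::runtime_error("invalid in the current locale");
        if (n == 0) n = 1;  // embedded NUL byte: wc is 0, consume one byte
        out.push_back(static_cast<char32_t>(wc));
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    return out;
}

// Explicit charset: the caller says what the bytes are, independent of
// the user's locale.
std::u32string decodeWithIconv(const std::string& in, const char* fromCharset) {
    iconv_t cd = iconv_open("UCS-4LE", fromCharset);
    if (cd == (iconv_t)(-1)) throw std::runtime_error("unsupported charset");
    std::vector<char> inBuf(in.begin(), in.end());
    // Each input byte yields at most one code point (4 bytes of UCS-4).
    std::vector<char> outBuf(in.size() * 4 + 4);
    char* inPtr = inBuf.data();
    std::size_t inLeft = inBuf.size();
    char* outPtr = outBuf.data();
    std::size_t outLeft = outBuf.size();
    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or incomplete input");
    std::u32string out;
    std::size_t produced = outBuf.size() - outLeft;
    for (std::size_t i = 0; i + 3 < produced; i += 4) {
        const unsigned char* b =
            reinterpret_cast<const unsigned char*>(&outBuf[i]);
        // Reassemble one little-endian 32-bit code point.
        out.push_back(static_cast<char32_t>(
            std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
            (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24)));
    }
    return out;
}
```

Under mbtowc() the result for non-ASCII input depends on the user's locale; the iconv version decodes the same bytes identically everywhere, provided the caller passes the correct charset name.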

>
> 8.  should plain text be anything other than ASCII?
> ---------------------------------------------------
> On a similar note, it looks like you've extended a bunch of logic which
> currently reads Latin-1 files to also handle other encodings, albeit in a
> locale-specific way.
>
>   src/af/xap/xp/xap_Strings.cpp
>   src/wp/ap/xp/ap_Strings.cpp
>   src/wp/impexp/xp/ie_exp_Text.cpp
>   src/wp/impexp/xp/ie_imp_MsWord_97.cpp
>   src/wp/impexp/xp/ie_imp_Text.cpp
>

Text files contain MB characters, so their encoding depends on the locale.
All other files are Unicode, so they are portable.
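The portability point can be illustrated with a small sketch, assuming the document text is held as Unicode code points: an exporter that always writes UTF-8 produces the same bytes in every locale, whereas one built on wctomb() writes whatever multibyte encoding LC_CTYPE selects. The encodeUtf8 name is hypothetical; the byte layout follows the standard UTF-8 rules.

```cpp
#include <stdexcept>
#include <string>

// Encode Unicode code points as UTF-8. The output bytes do not depend on
// the locale, so a file written this way reads back identically anywhere.
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (c < 0x80) {                  // 1 byte:  0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                         // 4 bytes: 11110xxx + 3 continuation bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

One option would be to keep locale-dependent text as a separate, explicitly labeled export type, in the same spirit as the existing split between 7-bit and UTF8 text files.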

> This makes me kind of nervous, because it means that the actual contents of
> the files being read and written are interpreted as being in different
> charsets, depending on your locale settings at runtime.
>
> Up until now, we've been striving to create totally-portable files, which
> are always in the same encoding no matter where you read or write them.
> (Thus, for example, note how we've differentiated 7-bit text files from UTF8
> text files.)
>
> bottom line
> -----------
> You've obviously put a lot of hard work into this patch, and I really really
> want to be able to start bragging about the fact that we support Chinese on
> at least one platform.  That's *so* cool!
>
> To be honest, I'm not sure that all of the issues I've mentioned above are
> actually real.  However, at the moment, I don't know enough to be able to
> decide how much of this patch to integrate into the tree.

XIM support can be added to the tree now. The rest can wait.

>
> Could the various folks working on i18n issues help clear up some of my
> confusion here?
>
> Thanks,
> Paul