On Thu, Mar 01, 2001 at 08:27:07PM +0400, Vlad Harchev wrote:
> On Thu, 1 Mar 2001, Karl Ove Hufthammer wrote:
> 
> > Saving in 'UTF-8' or 'UTF-16' *is* much better than using other
> > charsets. Not because of AbiWord, but because *other* programs may be
> > reading AbiWord documents. People implementing XML parsers (which are used
> > by several programs, e.g. XSLT engines) don't want to implement hundreds
> > of character encodings, as this will 1) be much work, 2) increase the
> > size/bloat, and 3) be unnecessary.
> 
>  Hmm, if an XML parser supports only UTF-8 or UTF-16, it's broken. 

This is incorrect.  The XML Recommendation requires only that parsers
support UTF-8 and UTF-16; support for any other encoding is optional.  

> People should
> stick to libxml then. Also, a trivial sed script and iconv can be used to
> convert an XML file in any encoding to a valid UTF-8 encoded XML file.
> 
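
For reference, the conversion Vlad describes has two steps: transcode
the bytes (the iconv part) and rewrite the encoding name in the XML
declaration (the sed part).  Here is a rough sketch of the same idea in
Python.  The function name to_utf8_xml is made up for illustration, and
the caller is assumed to know the source encoding already; the pattern
only handles a declaration at the very start of the file.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the
# start of the document, e.g. <?xml version="1.0" encoding="KOI8-R"?>
_DECL_ENCODING = re.compile(
    r"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def to_utf8_xml(data, source_encoding):
    """Transcode XML bytes to UTF-8 and fix the declared encoding.

    Hypothetical helper: source_encoding must be supplied by the caller.
    """
    # Step 1 (the "iconv" part): decode from the legacy charset.
    text = data.decode(source_encoding)
    # Step 2 (the "sed" part): make the declaration say UTF-8.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")
```

A file with no encoding declaration passes through step 2 unchanged,
which is fine, since XML treats such a file as UTF-8 (or UTF-16 if it
starts with a byte order mark).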

Well, since we don't intend to give up expat, we have the following
options:  

1. Continue current practice.  This discussion suggests that current
practice is broken for non-Latin1 encodings. 

2. Encode in UTF-8.  Vlad suggests that this is bad for single-byte
encodings that are not Latin1.  

3. Provide a way for Expat to handle other encodings.  This requires
using the expat functions for unknown encodings.  See the expat.h
header file for more info.  
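
To give an idea of the coding option 3 involves: as far as I read
expat.h, the handler registered with XML_SetUnknownEncodingHandler
fills in an XML_Encoding structure whose map member has 256 entries,
one per byte value, giving the Unicode code point for that byte, with
-1 marking a byte that is not valid on its own.  Below is a minimal
sketch, in Python, of building such a table for a single-byte charset.
The name build_byte_map is made up for illustration, and Python's
codecs stand in for whatever conversion tables AbiWord would really
use; the actual handler would have to be written in C.

```python
def build_byte_map(encoding):
    """Return a 256-entry list mapping each byte value to a code point.

    Hypothetical helper mirroring the layout of expat's XML_Encoding.map
    for single-byte encodings only: entry b holds the Unicode code point
    for byte b, or -1 if the byte is not a valid character by itself.
    """
    table = []
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            # Undefined byte, or the lead byte of a multi-byte sequence.
            table.append(-1)
            continue
        # Only accept bytes that decode to exactly one character.
        table.append(ord(char) if len(char) == 1 else -1)
    return table
```

This sketch does not cover multi-byte encodings, where expat expects
negative lengths in the map plus a convert callback.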

Option 1 is broken.  Option 2 may be broken, but less so than 1.
Option 3 requires coding.  

Until someone does the coding required for 3, I think we should switch
to 2.  Of course, 3 is preferable.  

>  For non-Latin1 languages, e.g. Russian, conversion to UTF-8 doubles the file
> size and makes it uneditable in plain editors. There are many more
> editors that don't support UTF-8 than XML parsers that don't support all
> encodings understood by iconv(3).
> 
>  So, PLEASE don't stick to utf8 for all locales.
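
The size claim is easy to check.  In UTF-8 each Cyrillic letter takes
two bytes, against one in a single-byte charset such as KOI8-R, while
ASCII characters (including all the XML markup) stay at one byte.  A
small sketch, with a made-up helper name encoded_sizes and KOI8-R
assumed as the legacy charset:

```python
def encoded_sizes(text, legacy_encoding="koi8-r"):
    """Return (legacy_size, utf8_size) in bytes for the given text.

    Hypothetical helper; KOI8-R is only an assumed example of a
    single-byte Cyrillic charset.
    """
    legacy_size = len(text.encode(legacy_encoding))
    utf8_size = len(text.encode("utf-8"))
    return legacy_size, utf8_size


# Pure Cyrillic text doubles; ASCII markup does not grow at all.
cyrillic = encoded_sizes("привет")
markup = encoded_sizes('<p props="font-size:12pt">')
```

So a document only doubles if it is nearly all Cyrillic text; the more
markup it carries, the smaller the growth.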

           
        sam th               
        [EMAIL PROTECTED]
        http://www.abisource.com/~sam/
        GnuPG Key:  
        http://www.abisource.com/~sam/key
