[Evolution-hackers] Re: charset foo

Dan Winship Thu, 02 May 2002 07:16:36 -0700

[moving from evolution-patches to evolution-hackers]

Giving the user a choice could work. We can't *just* autodetect based on
the UTF8. In a string like "The character for the word 'one' is
<<U+4E00>>", the last character could be Japanese, Simplified Chinese,
or Traditional Chinese (or even Korean sometimes?).
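
To make the ambiguity concrete, here is a rough sketch in Python. Python's bundled codecs are used purely for illustration, not because Evolution uses them, and `encodable_charsets` and the candidate list are hypothetical names. It asks which legacy charsets can represent U+4E00 at all.

```python
# Candidate legacy charsets, by Python codec name. This is an
# illustrative selection, not Evolution's actual charset list.
CANDIDATES = ["iso2022_jp", "gb2312", "big5", "euc_kr"]


def encodable_charsets(text, candidates=CANDIDATES):
    """Return the candidates that can represent every character in text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        usable.append(name)
    return usable


# U+4E00 is the ideograph for "one".
print(encodable_charsets("\u4e00"))
```

More than one candidate should come back, which is the point: the code point alone does not say which language the text is in.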


Is there any way for the composer to know whether the user is using a
Japanese or Chinese input method? (And are there separate traditional
and simplified Chinese input methods?)

And what about cut+paste? If you paste characters from a Big5 web page,
does the composer know that or does it only get UTF8?

-- Dan

On Wed, 2002-05-01 at 21:28, Not Zed wrote:
> Yes we need this code, as we needed it when it was written.
> 
> If nothing else, we could potentially use it to offer the user a choice
> (as emacs does), or use it to determine if the user's locale charset is a
> valid option, or even for things like autodetecting unknown data (using
> locale as a hint).
> 
> The code is priority based at least.  So you just order the super-meta
> charsets last, so they won't be chosen for normal text, and maybe even
> special case them based on locale so utf8 is usually preferred.
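
A minimal sketch of that priority idea, assuming Python codecs as a stand-in for the real conversion code. `FALLBACK_ORDER`, `can_encode` and `pick_charset` are hypothetical names, and the order shown is only an example, not the order the patch uses.

```python
# Hypothetical fallback order: narrow single-script charsets first,
# the catch-all UTF-8 last so it is not picked for "normal" text.
FALLBACK_ORDER = ["iso-8859-1", "iso-8859-7", "koi8-r", "utf-8"]


def can_encode(text, charset):
    """True if every character of text is representable in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def pick_charset(text, locale_charset=None):
    """Return the first charset, in priority order, that can hold text."""
    candidates = ["us-ascii"]
    if locale_charset is not None:
        # Locale hint: try the user's own charset right after plain ASCII.
        candidates.append(locale_charset)
    candidates.extend(FALLBACK_ORDER)
    for charset in candidates:
        if can_encode(text, charset):
            return charset
    return "utf-8"  # only reached for text that no candidate could encode


japanese = "\u65e5\u672c\u8a9e"  # the word "nihongo"
print(pick_charset(japanese))
print(pick_charset(japanese, locale_charset="iso-2022-jp"))
```

In this sketch iso-2022-jp is only ever selected through the locale hint, so text mixing several scripts falls through to UTF-8 instead of landing in a multi-script charset by accident.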
> 
> On Wed, 2002-05-01 at 21:42, Dan Winship wrote:
> > > Order of preference seems to be iso-2022-jp, Shift-JIS, and then euc-jp
> > > but neither Shift-JIS nor euc-jp is liked very much. They seem to only
> > > be common in the US for example.
> > >
> > > Korean users tend to prefer euc-kr over iso-2022-kr.
> > 
> > Do the character sets actually contain vastly different data? Will 
> > Shift-JIS, euc-jp, or iso-2022-kr ever get chosen?
> > For that matter, will the Chinese charsets ever get autodetected or will 
> > it always use the Japanese ones instead (at least for messages 
> > containing only reasonably common characters)?
> > 
> > Also, does this patch address the issue that a message containing both 
> > Greek and Russian *can* be encoded in iso-2022, but *should* be encoded 
> > in UTF8?
> > 
> > What problem exactly is this supposed to be solving? If you want to 
> > autodetect Asian charsets for people who aren't replying to an 
> > Asian-language message and don't have an Asian locale, I don't think 
> > this will work.
> > 
> > Heuristics that might work are "if it contains Korean characters (which 
> > are all in a certain range in Unicode), try EUC-KR", "if it contains 
> > Japanese hiragana/katakana (likewise), try iso-2022-jp", and "if it 
> > contains unihan characters but not kana, it's probably Chinese". I don't 
> > think you can autoselect between traditional and simplified Chinese 
> > charsets based on a UTF8 input stream though.
> > 
> > -- Dan
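
The heuristics quoted just above can be sketched roughly as follows. This is an illustration only: the function name `guess_cjk_charsets` is made up, the ranges are the main Unicode blocks for each script (extension blocks are ignored), and the Chinese case returns both candidates because the text gives no way to choose between them.

```python
# Main Unicode blocks for each script (inclusive code point ranges).
HANGUL = [(0x1100, 0x11FF), (0xAC00, 0xD7A3)]  # Hangul Jamo, Hangul Syllables
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]    # Hiragana, Katakana
UNIHAN = [(0x4E00, 0x9FFF)]                    # CJK Unified Ideographs


def contains(text, ranges):
    """True if any character of text falls inside one of the ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


def guess_cjk_charsets(text):
    """Return a list of charsets worth trying, best guess first."""
    if contains(text, HANGUL):
        return ["euc-kr"]
    if contains(text, KANA):
        return ["iso-2022-jp"]
    if contains(text, UNIHAN):
        # Ideographs without kana: probably Chinese, but simplified
        # versus traditional cannot be decided from the code points.
        return ["gb2312", "big5"]
    return []


print(guess_cjk_charsets("\u3072\u3089\u304c\u306a"))  # hiragana sample
print(guess_cjk_charsets("\u4e2d\u6587"))              # ideographs only
```

The result is only a list of charsets to try. The caller still has to check that the whole message actually converts, and fall back to UTF-8 or ask the user when it does not.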
> > 
> > 
> > _______________________________________________
> > Evolution-patches maillist  -  [EMAIL PROTECTED]
> > http://lists.ximian.com/mailman/listinfo/evolution-patches
> 


