Re: [Evolution-hackers] Re: charset foo

Not Zed Thu, 02 May 2002 23:18:03 -0700


On Fri, 2002-05-03 at 00:01, Dan Winship wrote:
> [moving from evolution-patches to evolution-hackers]
> 
> Giving the user a choice could work. We can't *just* autodetect based on
> the UTF8. In a string like "The character for the word 'one' is
> <<U+4E00>>", the last character could be Japanese, Simplified Chinese,
> or Traditional Chinese (or even Korean sometimes?).


Duh, yeah no shit.  The letter 'a' can be in just about everything too.

> Is there any way for the composer to know whether the user is using a
> Japanese or Chinese input method? (And are there separate traditional
> and simplified chinese input methods?)
> 
> And what about cut+paste? If you paste characters from a Big5 web page,
> does the composer know that or does it only get UTF8?

You use utf8.

> -- Dan
> 
> On Wed, 2002-05-01 at 21:28, Not Zed wrote:
> > Yes we need this code, as we needed it when it was written.
> > 
> > If nothing else, we could potentially use it to offer the user a choice
> > (as emacs does), or use it to determine if the user's locale charset is a
> > valid option, or even for things like autodetecting unknown data (using
> > the locale as a hint).
> > 
> > The code is priority based at least.  So you just order the super-meta
> > charsets last, so they won't be chosen for normal text, and maybe even
> > special case them based on locale so utf8 is usually preferred.
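
To make the priority idea in the quoted paragraph concrete, here is a minimal
sketch in Python. It is not the Evolution/Camel code: the names CANDIDATES,
can_encode and pick_charset are hypothetical, the candidate order is only an
illustrative assumption, and treating the locale charset as the first thing
to try is one possible reading of "special case them based on locale".

```python
# Hypothetical priority list (an assumption, not taken from the patch):
# narrow charsets first, the catch-all "super-meta" charset (utf-8) last.
CANDIDATES = ["us-ascii", "iso-8859-1", "iso-2022-jp", "euc-kr", "big5", "utf-8"]


def can_encode(text, charset):
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def pick_charset(text, candidates=CANDIDATES, locale_charset=None):
    """Pick the first charset, in priority order, that can represent text.

    If a locale charset is given and it can represent the text, it wins
    outright (the locale is used as a hint). utf-8 is the final fallback.
    """
    if locale_charset is not None and can_encode(text, locale_charset):
        return locale_charset
    for charset in candidates:
        if can_encode(text, charset):
            return charset
    return "utf-8"
```

With this ordering, plain ASCII text never ends up in a multibyte charset,
and a user with a utf-8 locale gets utf-8 for anything the locale can hold,
e.g. pick_charset(text, locale_charset="utf-8").
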
> > 
> > On Wed, 2002-05-01 at 21:42, Dan Winship wrote:
> > > > Order of preference seems to be iso-2022-jp, Shift-JIS, and then euc-jp
> > > > but neither Shift-JIS nor euc-jp are liked very much. They seem to only
> > > > be common in the US for example.
> > > >
> > > > Korean users tend to prefer euc-kr over iso-2022-kr.
> > > 
> > > Do the character sets actually contain vastly different data? Will 
> > > Shift-JIS, euc-jp, or iso-2022-kr ever get chosen?
> > > For that matter, will the Chinese charsets ever get autodetected or will 
> > > it always use the Japanese ones instead (at least for messages 
> > > containing only reasonably common characters)?
> > > 
> > > Also, does this patch address the issue that a message containing both 
> > > Greek and Russian *can* be encoded in iso-2022, but *should* be encoded 
> > > in UTF8?
> > > 
> > > What problem exactly is this supposed to be solving? If you want to 
> > > autodetect Asian charsets for people who aren't replying to an 
> > > Asian-language message and don't have an Asian locale, I don't think 
> > > this will work.
> > > 
> > > Heuristics that might work are "if it contains Korean characters (which 
> > > are all in a certain range in Unicode), try EUC-KR", "if it contains 
> > > Japanese hiragana/katakana (likewise), try iso-2022-jp", and "if it 
> > > contains unihan characters but not kana, it's probably Chinese". I don't 
> > > think you can autoselect between traditional and simplified Chinese 
> > > charsets based on a UTF8 input stream though.
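
The heuristics in the quoted paragraph can be sketched roughly as follows.
This is an illustrative Python sketch only: the function names, the returned
labels and the exact Unicode ranges chosen (Hangul syllables and jamo,
hiragana/katakana, the main CJK unified ideograph blocks) are assumptions
made for the example, not something taken from the patch under discussion.

```python
# Unicode ranges used by the sketch (inclusive code point pairs).
HANGUL = [(0xAC00, 0xD7A3), (0x1100, 0x11FF), (0x3130, 0x318F)]
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]
HAN = [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)]


def contains(text, ranges):
    """Return True if any character of text falls inside one of ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


def guess_charset(text):
    """Return a (language, charset) guess for text.

    Korean characters suggest EUC-KR, kana suggests iso-2022-jp, and han
    characters without kana are "probably Chinese". In that last case the
    charset is None, because traditional versus simplified cannot be
    decided from the code points alone.
    """
    if contains(text, HANGUL):
        return ("korean", "euc-kr")
    if contains(text, KANA):
        return ("japanese", "iso-2022-jp")
    if contains(text, HAN):
        return ("chinese", None)
    return (None, None)
```

Note that a string containing only U+4E00 comes back as ("chinese", None)
here even if a Japanese user typed it, which is exactly the ambiguity raised
earlier in the thread.
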
> > > 
> > > -- Dan
> > > 
> > > 
> > > _______________________________________________
> > > Evolution-patches maillist  -  [EMAIL PROTECTED]
> > > http://lists.ximian.com/mailman/listinfo/evolution-patches
> > 
> 
> 
> _______________________________________________
> evolution-hackers maillist  -  [EMAIL PROTECTED]
> http://lists.ximian.com/mailman/listinfo/evolution-hackers


