In
<cae1xxdgwscr+ffp13_rperg4jmkferdgp4f6sxtz7v48o4g...@mail.gmail.com>,
on 01/10/2014
   at 01:28 PM, John Gilmore <[email protected]> said:

>Briefly, effective rules for encoding any 'character' recognized as a
>Unicode one as a 'longer' UTF-8 one do not in general exist.

What are you drinking? RFC 3629 spells them out in excruciating
detail. 
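
As a rough illustration of how mechanical those rules are, here is a
minimal Python sketch of the bit layout RFC 3629 describes. The
function name utf8_encode is made up for this example and is not
taken from the RFC; in practice you would just call str.encode.

```python
def utf8_encode(cp):
    """Encode one Unicode scalar value as UTF-8 (hypothetical helper)."""
    # RFC 3629 restricts UTF-8 to U+0000..U+10FFFF and excludes surrogates.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Every code point falls into exactly one of the four ranges, so the
result can be checked against the built-in codec for any character.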

>In dealing recently with a document containing mixed
>English, German, Korean and Japanese text I found that the UTF-8
>version was 23% longer than the UTF-16 version.

That's simply an efficiency issue; "you need UTF-16" is a much stronger
claim than "UTF-16 may be more efficient". Further, a sample size of
one is grossly inadequate for drawing statistical conclusions. Try
documents that are mostly English, French and German with a smattering
of CJK languages and you will get different results.
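
The effect is easy to see for yourself. The Python sketch below
compares encoded sizes; the helper name encoded_sizes and both sample
strings are invented for illustration and are not data from either
document under discussion. It uses utf-16-le so that no byte order
mark is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text (hypothetical helper)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Made-up samples: one mostly Latin with a little CJK, one mostly CJK.
mostly_latin = "The quick brown fox. Der schnelle braune Fuchs. 東京"
mostly_cjk = "東京は日本の首都です。서울은 한국의 수도입니다."

for label, sample in (("mostly Latin", mostly_latin),
                      ("mostly CJK", mostly_cjk)):
    u8, u16 = encoded_sizes(sample)
    print(f"{label}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

ASCII characters take one byte in UTF-8 and two in UTF-16, while most
CJK characters take three bytes in UTF-8 and two in UTF-16, so which
encoding comes out shorter depends entirely on the mix.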
 
-- 
     Shmuel (Seymour J.) Metz, SysProg and JOAT
     ISO position; see <http://patriot.net/~shmuel/resume/brief.html> 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
