Re: [GHC] #1079: refinement for GHC's support of UTF-8 encoding

GHC Tue, 03 Jul 2007 09:19:19 -0700

#1079: refinement for GHC's support of UTF-8 encoding
--------------------------------+-------------------------------------------
    Reporter:  [EMAIL PROTECTED]   |        Owner:         
        Type:  feature request  |       Status:  new    
    Priority:  normal           |    Milestone:  6.8    
   Component:  Compiler         |      Version:  6.6    
    Severity:  major            |   Resolution:         
    Keywords:                   |   Difficulty:  Unknown
          Os:  Unknown          |     Testcase:         
Architecture:  Unknown          |  
--------------------------------+-------------------------------------------
Changes (by Isaac Dupree):


  * cc:  [EMAIL PROTECTED] => [EMAIL PROTECTED],
         [EMAIL PROTECTED]

Old description:

> Since version 6.6, GHC has supported UTF-8 encoding in source programs.
> GHC can read UTF-8 files and convert them into Unicode characters.
> However, there is no support for reading or printing them.
>
> For example, we can compile the following program,
> {{{
> main = putStrLn "あ"
> }}}
> but the output is only `B', the low 8 bits of the character `あ' (U+3042).
> Because of this gap, Haskell code written in UTF-8 cannot print any
> non-ASCII characters without converting them explicitly.  Although such
> conversion functions are easy to write, the conversion should be
> supported by the compiler.
>
> IMHO, the desired approach is similar to Hugs.  When Hugs prints
> non-ASCII characters, it first converts them to UTF-8 octets and then
> prints them.  With a binary-mode Handle, however, it just prints the
> characters without conversion.  This behavior should be acceptable to
> many Haskell programmers.

New description:

 Since version 6.6, GHC has supported UTF-8 encoding in source programs.
 GHC can read UTF-8 files and convert them into Unicode characters.
 However, there is no support for reading or printing them.

 For example, we can compile the following program,
 {{{
 main = putStrLn "あ"
 }}}
 but the output is only `B`, the low 8 bits of the character `あ` (U+3042).
 Because of this gap, Haskell code written in UTF-8 cannot print any
 non-ASCII characters without converting them explicitly.  Although such
 conversion functions are easy to write, the conversion should be
 supported by the compiler.

 IMHO, the desired approach is similar to Hugs.  When Hugs prints
 non-ASCII characters, it first converts them to UTF-8 octets and then
 prints them.  With a binary-mode Handle, however, it just prints the
 characters without conversion.  This behavior should be acceptable to
 many Haskell programmers.
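
 The conversion the description calls "easy to write" might look roughly
 like the sketch below.  It is an illustration only, not code from GHC or
 Hugs; the names `encodeChar`, `encodeString` and `lowByte` are
 hypothetical.  `lowByte` mimics the truncation described above, and
 `encodeChar` produces the UTF-8 octets for one code point.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Keep only the low 8 bits of a character (hypothetical helper that
-- mimics the truncation described in the ticket).
-- lowByte '\x3042' == 'B', because 0x3042 .&. 0xFF == 0x42.
lowByte :: Char -> Char
lowByte = chr . (.&. 0xFF) . ord

-- | Encode one code point as UTF-8 octets (hypothetical helper).
-- Surrogate code points are not rejected here; a real encoder
-- would have to decide how to handle them.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, placed in the first octet.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation octet: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Encode a whole String, one code point at a time.
-- encodeString "\x3042" == [0xE3, 0x81, 0x82]
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar
```

 Writing the octets from `encodeString` to a binary-mode Handle would then
 give the Hugs-like behavior the description asks for.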

Comment:

 This reminds me of cases like "\213\23\231" (= '\213' : '\23' : '\231'
 : [] according to the Report), where GHC treated several of the escapes
 as one Unicode character.  We should probably say explicitly somewhere
 that a String is effectively UTF-32 (each Char in the list is one Unicode
 code point), and make that true for all the standard functions.

 Even if we assume that standard I/O uses UTF-8 (it has to, for ASCII
 compatibility), if String is in practice also used for binary data (is
 it?), the only compatible way might be to bring in a new I/O library, as
 Bulat says.  Personally, I would like the Prelude input and output
 functions to use UTF-8 as the external format.
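
 For comparison, GHC releases later than the ones discussed in this ticket
 let a program choose the external encoding per Handle through `System.IO`.
 The sketch below assumes such a release (base 4.2 or newer); `useUtf8` and
 `greet` are hypothetical names, while `hSetEncoding` and `utf8` are the
 library's own.

```haskell
import System.IO (hSetEncoding, stdin, stdout, utf8)

-- | Make the standard handles use UTF-8 as the external format,
-- regardless of the locale (hypothetical helper).
useUtf8 :: IO ()
useUtf8 = mapM_ (`hSetEncoding` utf8) [stdin, stdout]

-- | Print U+3042 after switching stdout to UTF-8; the Handle then
-- writes the three-octet UTF-8 form instead of a truncated byte.
greet :: IO ()
greet = useUtf8 >> putStrLn "\x3042"
```

 This is opt-in per Handle, so code that treats String as binary data can
 keep a binary-mode Handle instead.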

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1079>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
