#3341: encoding errors could be handled better
-------------------------------+--------------------------------------------
    Reporter:  judahj          |        Owner:         
        Type:  bug             |       Status:  new    
    Priority:  high            |    Milestone:  6.12.1 
   Component:  libraries/base  |      Version:  6.11   
    Severity:  normal          |   Resolution:         
    Keywords:                  |   Difficulty:  Unknown
    Testcase:                  |           Os:  MacOS X
Architecture:  x86             |  
-------------------------------+--------------------------------------------
Changes (by simonmar):

  * priority:  normal => high
  * difficulty:  => Unknown
  * milestone:  => 6.12.1

Comment:

 What do you mean by a "Latin-1 non-ASCII character"?  In UTF-8, a byte
 between 0x80 and 0xBF at the start of a sequence is a stray continuation
 byte and should elicit an error immediately, whereas a byte between 0xC0
 and 0xDF requires one more byte before we can tell whether there is a
 decoding error or not.  I do think there's a bug here though: if the bytes
 0xE0 0x00 are received, GHC waits for one more byte before raising an
 error, even though the sequence is already erroneous (0x00 cannot be a
 continuation byte).
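
 To make "already erroneous" concrete, here is a minimal sketch of how a
 decoder could decide, byte by byte, whether a prefix is complete, still
 needs input, or can be rejected at once.  This is not GHC's decoder; the
 names Verdict, leadInfo and classify are made up for illustration.  It
 uses the strict RFC 3629 ranges, so 0xC0 and 0xC1 are rejected immediately,
 which is stricter than the lead-byte range described above.

```haskell
import Data.Word (Word8)

-- | What can be concluded from the bytes seen so far (hypothetical type).
data Verdict
  = Complete  -- a whole, well-formed sequence has been read
  | NeedMore  -- valid so far, but more bytes are required
  | Invalid   -- already erroneous; no further input can repair it
  deriving (Eq, Show)

-- | For a lead byte: total sequence length and the range allowed for the
-- second byte.  Nothing means the byte can never start a sequence.
leadInfo :: Word8 -> Maybe (Int, Word8, Word8)
leadInfo b
  | b <= 0x7F              = Just (1, 0x80, 0xBF)  -- ASCII; range unused
  | b >= 0xC2 && b <= 0xDF = Just (2, 0x80, 0xBF)
  | b == 0xE0              = Just (3, 0xA0, 0xBF)  -- excludes overlong forms
  | b == 0xED              = Just (3, 0x80, 0x9F)  -- excludes surrogates
  | b >= 0xE1 && b <= 0xEF = Just (3, 0x80, 0xBF)
  | b == 0xF0              = Just (4, 0x90, 0xBF)
  | b >= 0xF1 && b <= 0xF3 = Just (4, 0x80, 0xBF)
  | b == 0xF4              = Just (4, 0x80, 0x8F)
  | otherwise              = Nothing               -- 0x80..0xC1, 0xF5..0xFF

-- | Classify the first sequence in a prefix of the input.
classify :: [Word8] -> Verdict
classify [] = NeedMore
classify (b : rest) =
  case leadInfo b of
    Nothing               -> Invalid
    Just (len, low, high) -> go (len - 1) (low, high) rest
  where
    go :: Int -> (Word8, Word8) -> [Word8] -> Verdict
    go 0 _ _ = Complete
    go _ _ [] = NeedMore
    go n (low, high) (c : cs)
      | c >= low && c <= high = go (n - 1) (0x80, 0xBF) cs
      | otherwise             = Invalid
```

 With this sketch, classify [0xE0] is NeedMore but classify [0xE0, 0x00] is
 already Invalid, so an error could be raised without waiting for a third
 byte.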

 In this example:

 {{{
 ghc -e "putStrLn \"\\249\"" | ./badchar
 }}}

 bear in mind that GHC uses the locale encoding to output '\249', and
 then decodes it again with that encoding on input.  I think you're seeing
 the correct result here.
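
 As an illustration of why the pipe carries a well-formed sequence, here is
 a small sketch of what the output side does to '\249', assuming the locale
 encoding is UTF-8 (if it is something else, the bytes differ).  The
 functions encodeUtf8 and encodeLatin1 are hand-written for this example and
 are not the encoders in GHC.IO.Encoding; encodeUtf8 does not reject
 surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one character as UTF-8 bytes (illustrative, hand-written).
encodeUtf8 :: Char -> [Word8]
encodeUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n :: Int
    n = ord c
    -- Payload bits that go into the lead byte.
    top :: Int -> Word8
    top s = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx with six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- | Encode one character as a single Latin-1 byte, if it fits.
encodeLatin1 :: Char -> Maybe Word8
encodeLatin1 c
  | ord c <= 0xFF = Just (fromIntegral (ord c))
  | otherwise     = Nothing
```

 Under these assumptions encodeUtf8 '\249' gives the two bytes 0xC3 0xB9,
 which a UTF-8 decoder accepts, whereas encodeLatin1 '\249' gives the single
 byte 0xF9, which is the kind of input that would be a decoding error.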

 If you feed a real incorrect sequence at the end of the input, GHC behaves
 correctly:

 {{{
 $ cat 3341.hs
 import System.IO
 import GHC.IO.Handle
 import GHC.IO.Encoding
 main = do
     hSetBuffering stdin NoBuffering
 --    hSetEncoding stdin utf8
     getChar >>= print
 $ hexdump -C char2
 00000000  c0 03                                             |..|
 00000002
 $ echo $LANG
 en_US.utf8
 $ ./3341 <char2
 3341: <stdin>: hGetChar: invalid argument (Invalid or incomplete multibyte or wide character)
 [2]    27598 exit 1     ./3341 < char2
 }}}

 and you get the same result using the built-in UTF-8 decoder (uncomment
 the hSetEncoding line).

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3341#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
