#3341: encoding errors could be handled better
-------------------------------+--------------------------------------------
    Reporter:  judahj          |        Owner:
        Type:  bug             |       Status:  new
    Priority:  high            |    Milestone:  6.12.1
   Component:  libraries/base  |      Version:  6.11
    Severity:  normal          |   Resolution:
    Keywords:                  |   Difficulty:  Unknown
    Testcase:                  |           Os:  MacOS X
Architecture:  x86             |
-------------------------------+--------------------------------------------
Changes (by simonmar):

  * priority:  normal => high
  * difficulty:  => Unknown
  * milestone:  => 6.12.1

Comment:

What do you mean by a "Latin-1 non-ASCII character"? In UTF-8, a byte
between 0x80 and 0xBF at the start of a sequence should elicit an error
immediately, whereas a byte between 0xC0 and 0xDF requires one extra byte
to determine whether there is a decoding error or not. I do think there's
a bug here though: if the bytes 0xE0 0x00 are received, then GHC will wait
for one more byte before raising an error, even though the sequence is
already erroneous.
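To make the byte ranges above concrete, here is a minimal sketch of the check a decoder could make on a partial sequence. It is not GHC's actual decoder: the names `Lead`, `classifyLead`, `isContinuation` and `alreadyInvalid` are hypothetical, and the sketch deliberately ignores overlong forms and surrogates, looking only at the lead-byte ranges and at whether the following bytes are continuation bytes.

```haskell
import Data.Word (Word8)

-- | What can be concluded from the first byte of a UTF-8 sequence.
-- (Hypothetical type, for illustration only.)
data Lead
  = Single        -- 0x00..0x7F: a complete ASCII character
  | NeedMore Int  -- lead byte: this many continuation bytes must follow
  | Invalid       -- cannot start a sequence: an error can be raised at once
  deriving (Eq, Show)

classifyLead :: Word8 -> Lead
classifyLead b
  | b <= 0x7F = Single
  | b <= 0xBF = Invalid     -- 0x80..0xBF: stray continuation byte
  | b <= 0xDF = NeedMore 1  -- 0xC0..0xDF
  | b <= 0xEF = NeedMore 2  -- 0xE0..0xEF
  | b <= 0xF7 = NeedMore 3  -- 0xF0..0xF7
  | otherwise = Invalid     -- 0xF8..0xFF

-- | Continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b >= 0x80 && b <= 0xBF

-- | True when the bytes seen so far can already be rejected,
-- without waiting for the rest of the sequence to arrive.
alreadyInvalid :: [Word8] -> Bool
alreadyInvalid []       = False
alreadyInvalid (b : bs) = case classifyLead b of
  Invalid    -> True
  Single     -> False
  NeedMore n -> any (not . isContinuation) (take n bs)
```

Under these assumptions `alreadyInvalid [0xE0, 0x00]` is `True`, because 0x00 is not a continuation byte, so no third byte is needed to reject the sequence; `alreadyInvalid [0xC3]` is `False`, because one more byte is required before anything can be decided.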
In this example:
{{{
ghc -e "putStrLn \"\\249\"" | ./badchar
}}}
bear in mind that ghc is using the locale encoding to output '\249', and
then decoding it again on input. I think you're seeing the correct result
here.
If you feed a real incorrect sequence at the end of the input, GHC behaves
correctly:
{{{
$ cat 3341.hs
import System.IO
import GHC.IO.Handle
import GHC.IO.Encoding
main = do
  hSetBuffering stdin NoBuffering
  -- hSetEncoding stdin utf8
  getChar >>= print
$ hexdump -C char2
00000000 c0 03 |..|
00000002
$ echo $LANG
en_US.utf8
$ ./3341 <char2
3341: <stdin>: hGetChar: invalid argument (Invalid or incomplete multibyte
or wide character)
[2] 27598 exit 1 ./3341 < char2
}}}
and you get the same result using the built-in UTF-8 decoder (uncomment
the hSetEncoding line).
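If a program wants to recover from this error rather than exit, one option is to catch the `IOException` around the read. The following is only a sketch, using the built-in UTF-8 decoder; `tryGetChar` is a hypothetical helper name, and note that end-of-file is also reported as an `IOException`, so a `Left` does not by itself mean a decoding error.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- | Read one character from a handle, returning the exception instead
-- of letting it terminate the program. (Hypothetical helper.)
tryGetChar :: Handle -> IO (Either IOException Char)
tryGetChar h = try (hGetChar h)

-- | Same setup as 3341.hs above, but reporting the failure on stderr.
readOneChar :: IO ()
readOneChar = do
  hSetBuffering stdin NoBuffering
  hSetEncoding stdin utf8
  r <- tryGetChar stdin
  case r of
    Left e  -> hPutStrLn stderr ("could not read a character: " ++ show e)
    Right c -> print c
```

Using `readOneChar` as `main` and running the program with `<char2` should then print the message on stderr instead of terminating with an uncaught exception.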
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3341#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs