> > > > What *does* matter to the programmer is what encodings putStr and
> > > > getLine use. AFAIK, they use "lower 8 bits of unicode code point" which
> > > > is almost functionally equivalent to latin-1.
> > >
> > > Which is terrible! You should have to be explicit about what encoding
> > > you expect. Python 3000 does it right.
> >
> > Presumably there wasn't a sufficiently good answer available in time for
> > haskell98.
>
> Will there be one for haskell prime ?

The I/O library needs an overhaul, but I'm not sure how to do this in a
backwards-compatible manner, which would probably be required for
inclusion in Haskell'. One could, like Python 3000, break backwards
compatibility; I'm not sure about the implications of doing this. Maybe
introducing a new System.IO.Unicode module would be an option.

If one wants to keep the interface but change the semantics slightly,
one could define e.g. getChar as:

    getChar :: IO Char
    getChar = getWord8 >>= decodeChar latin1

assuming latin-1 is what's used now. The benefit would be that if the
input is not in latin-1, an exception could be thrown rather than
returning a Char representing the wrong Unicode code point.

I recommend reading about the Python I/O system overhaul for Python
3000, which is outlined in PEP 3116:

http://www.python.org/dev/peps/pep-3116/

My proposal is for I/O functions to specify the encoding they use if
they accept or return Chars (and Strings). If they deal in terms of
bytes (e.g. socket functions), they should accept and return Word8s.
Optionally, text I/O functions could default to the system locale
setting.

-- Johan
_______________________________________________
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime
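
A rough sketch of the getChar idea from the message above, to make the "decode bytes explicitly, fail loudly" behaviour concrete. Everything named here (Encoding, DecodeError, decodeChar, decodeString, hGetWord8, hGetCharWith) is hypothetical and not part of any existing library; only the imported base functions are real. One caveat worth stating: in this sketch Latin-1 maps every byte 0x00-0xFF to the code point of the same value, so decoding as Latin-1 never fails. An ASCII encoding is included purely to show the failure path.

```haskell
import Control.Exception (Exception, throwIO)
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (Handle, hGetChar)

-- Hypothetical: the encodings this sketch knows about.
data Encoding = Latin1 | Ascii
  deriving (Show, Eq)

-- Hypothetical: raised when a byte is not valid in the requested encoding.
data DecodeError = InvalidByte Encoding Word8
  deriving (Show, Eq)

instance Exception DecodeError

-- Decode a single byte. Only single-byte encodings are handled here;
-- a multi-byte encoding such as UTF-8 would need a stateful decoder.
decodeChar :: Encoding -> Word8 -> Either DecodeError Char
decodeChar Latin1 w = Right (chr (fromIntegral w))
decodeChar Ascii w
  | w < 0x80  = Right (chr (fromIntegral w))
  | otherwise = Left (InvalidByte Ascii w)

-- Decode a whole byte sequence, stopping at the first invalid byte.
decodeString :: Encoding -> [Word8] -> Either DecodeError String
decodeString enc = traverse (decodeChar enc)

-- Read one byte. Assumes the caller has already put the handle in
-- binary mode, so that hGetChar yields a Char in the range 0..255.
hGetWord8 :: Handle -> IO Word8
hGetWord8 h = fromIntegral . ord <$> hGetChar h

-- The getChar variant from the message, with the encoding made explicit:
-- read a byte, decode it, and throw instead of returning a wrong Char.
hGetCharWith :: Encoding -> Handle -> IO Char
hGetCharWith enc h = hGetWord8 h >>= either throwIO return . decodeChar enc
```

With this, a caller would run hSetBinaryMode stdin True once and then use hGetCharWith Ascii stdin; a byte such as 0xE9 would raise InvalidByte Ascii 233, whereas hGetCharWith Latin1 stdin would return the Char with code point 233. Returning Either from the pure decoder and throwing only at the I/O boundary is one possible design; a real library might prefer to expose both.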