Re: [Haskell-cafe] Strings and utf-8

2007-11-29 Thread Reinier Lamers

Bulat Ziganshin wrote:

 Hello Andrew,

 Thursday, November 29, 2007, 1:11:38 AM, you wrote:

   IMHO, someone should make a full proposal by implementing an alternative
   System.IO library that deals with all these encoding issues and
   implements H98 IO in terms of that.

  We need two separate interfaces. One for text-mode I/O, one for raw
  binary I/O.

  When doing text-mode I/O, the programmer needs to be able to explicitly
  specify exactly which character encoding is required. (Presumably
  default to the current 8-bit truncation encoding?)

 http://haskell.org/haskellwiki/Library/Streams already exists


Which would mean that we have streams to do character I/O, ByteString to 
do binary I/O, and System.IO to do, eh, something in between.


That seems rather unfortunate to me. While the truncate-to-8-bits 
semantics may be nice for keeping old code working, it really isn't all 
that intuitive. When I do 'putStr "u\776"', I want a u with an umlaut to 
appear, not to get it printed as if it were "u\8".


The strange thing is that Hugs at the moment _does_ print a u-umlaut, 
while ghci prints "u\8", which is a u followed by a backspace, so I see 
nothing.
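
The arithmetic behind that output can be checked with a minimal sketch. It assumes the truncation simply keeps the low 8 bits of each code point; truncate8 is a made-up name for illustration, not a library function and not GHC's actual I/O code.

```haskell
import Data.Char (chr, ord)

-- Keep only the low 8 bits of every code point (assumed model of the
-- truncation discussed in this thread).
truncate8 :: String -> String
truncate8 = map (chr . (`mod` 256) . ord)

main :: IO ()
main = do
  -- U+0308 (combining diaeresis) is 776 in decimal, and 776 mod 256 = 8,
  -- which is the backspace character.
  print (776 `mod` 256 :: Int)   -- prints 8
  print (truncate8 "u\776")      -- prints "u\b"
```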


Reinier

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Strings and utf-8

2007-11-29 Thread Bulat Ziganshin
Hello Reinier,

Thursday, November 29, 2007, 1:13:24 PM, you wrote:

   IMHO, someone should make a full proposal by implementing an alternative
   System.IO library that deals with all these encoding issues and
   implements H98 IO in terms of that.

  http://haskell.org/haskellwiki/Library/Streams already exists

 Which would mean that we have streams to do character I/O, ByteString to
 do binary I/O, and System.IO to do, eh, something in between.

this only means that such a proposal exists. i've worked on adding
bytestream support too, but haven't finished the work. at least it's
possible. i hope the new i/o library will have a modular design like
this, so it will be easy to add new features as 3rd-party libs


-- 
Best regards,
 Bulat



Re: [Haskell-cafe] Strings and utf-8

2007-11-28 Thread Duncan Coutts
On Tue, 2007-11-27 at 18:38 +, Paul Johnson wrote:
 Brandon S. Allbery KF8NH wrote:
  However, the IO system truncates [characters] to 8 bits.

 Should this be considered a bug?

A design problem.

 I presume that it's because stdio.h was defined in the days of
 ASCII-only strings, and the functions in System.IO are defined in
 terms of stdio.h.  But does this need to be the case in the future?

When it's phrased as "truncates to 8 bits" it sounds so simple; surely
all we need to do is not truncate to 8 bits, right?

The problem is, what encoding should it pick? UTF-8, 16, 32, EBCDIC? How
would people specify that they really want to use a binary file?
Whatever we change it'll break programs that use the existing meanings.

One sensible suggestion many people have made is that H98 file IO should
use the locale encoding and do Unicode/String <-> locale conversion. So
that'd all be text files. Then openBinaryFile would be used for binary
files. Of course then we'd need control over setting the encoding and
what to do on encountering encoding errors.
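
As an illustration of that text/binary split: later GHC releases (6.12 and newer) provide hSetEncoding and withBinaryFile in System.IO. The sketch below uses them to write text with an explicitly chosen encoding and read the same file back as raw bytes. The helper name writeThenReadBytes and the file name are made up for the example.

```haskell
import System.IO

-- Write a string as UTF-8 text, then read the same file back in binary
-- mode, where every Char returned stands for exactly one byte (0-255).
writeThenReadBytes :: FilePath -> String -> IO [Int]
writeThenReadBytes path s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8          -- explicit choice of text encoding
    hPutStr h s
  withBinaryFile path ReadMode $ \h -> do
    raw <- hGetContents h
    let bytes = map fromEnum raw
    -- Force the whole list before the handle is closed.
    length bytes `seq` return bytes

main :: IO ()
main = do
  -- "utf8-demo.txt" is a hypothetical scratch file in the current directory.
  bytes <- writeThenReadBytes "utf8-demo.txt" "u\776"
  print bytes                    -- prints [117,204,136]
```

Passing localeEncoding instead of utf8 would give the locale-based behaviour suggested above.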

IMHO, someone should make a full proposal by implementing an alternative
System.IO library that deals with all these encoding issues and
implements H98 IO in terms of that.

It doesn't have to be fast initially, it just has to get the API right
and not design the API so as to exclude the possibility of a fast
implementation later.

Duncan



Re: [Haskell-cafe] Strings and utf-8

2007-11-28 Thread Andrew Coppin

Duncan Coutts wrote:

When it's phrased as "truncates to 8 bits" it sounds so simple; surely
all we need to do is not truncate to 8 bits, right?

The problem is, what encoding should it pick? UTF-8, 16, 32, EBCDIC? How
would people specify that they really want to use a binary file?
Whatever we change it'll break programs that use the existing meanings.

One sensible suggestion many people have made is that H98 file IO should
use the locale encoding and do Unicode/String <-> locale conversion. So
that'd all be text files. Then openBinaryFile would be used for binary
files. Of course then we'd need control over setting the encoding and
what to do on encountering encoding errors.

IMHO, someone should make a full proposal by implementing an alternative
System.IO library that deals with all these encoding issues and
implements H98 IO in terms of that.

It doesn't have to be fast initially, it just has to get the API right
and not design the API so as to exclude the possibility of a fast
implementation later.
  


In my humble opinion, what should happen is this:

We need two separate interfaces. One for text-mode I/O, one for raw 
binary I/O. ByteString provides some of the latter. [Can you use that on 
network sockets?] I guess what's needed is a good binary library to go 
with it. [I know there's been quite a few people who've had a go at this 
part...]


When doing text-mode I/O, the programmer needs to be able to explicitly 
specify exactly which character encoding is required. (Presumably 
default to the current 8-bit truncation encoding?) That way the 
programmer can decide exactly how to choose an encoding, rather than the 
library designer trying to guess what The Right Thing is for all 
possible application programs. And it needs to be possible to cleanly 
add new encodings too.
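
One way to picture such an interface is as a plain record of conversion functions, so that adding an encoding means defining one more value. This is only a sketch under that assumption: the Encoding type, its fields, and the two sample encodings are hypothetical, not an existing library API.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface: an encoding is a named pair of conversions,
-- each of which may fail with an error message.
data Encoding = Encoding
  { encName :: String
  , encode  :: String -> Either String [Word8]
  , decode  :: [Word8] -> Either String String
  }

-- ISO-8859-1: code points 0-255 map to single bytes; anything else is an error.
latin1 :: Encoding
latin1 = Encoding
  { encName = "ISO-8859-1"
  , encode  = mapM enc
  , decode  = Right . map (chr . fromIntegral)
  }
  where
    enc c
      | ord c < 256 = Right (fromIntegral (ord c))
      | otherwise   = Left ("cannot encode " ++ show c ++ " in ISO-8859-1")

-- The "8-bit truncation" behaviour, expressed as just another encoding:
-- fromIntegral to Word8 silently keeps the low 8 bits.
truncating :: Encoding
truncating = Encoding
  { encName = "truncate-8"
  , encode  = Right . map (fromIntegral . ord)
  , decode  = Right . map (chr . fromIntegral)
  }

main :: IO ()
main = do
  print (encode latin1 "abc")
  print (encode latin1 "u\776")      -- a Left: U+0308 does not fit
  print (encode truncating "u\776")  -- Right [117,8]
```

With this shape the caller picks the encoding value explicitly, and the error-handling policy lives in each encoding rather than in the I/O layer.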


I'd have a go at implementing all this myself, but I wouldn't know where 
to begin...




Re[2]: [Haskell-cafe] Strings and utf-8

2007-11-28 Thread Bulat Ziganshin
Hello Andrew,

Thursday, November 29, 2007, 1:11:38 AM, you wrote:

 IMHO, someone should make a full proposal by implementing an alternative
 System.IO library that deals with all these encoding issues and
 implements H98 IO in terms of that.

 We need two separate interfaces. One for text-mode I/O, one for raw
 binary I/O.

 When doing text-mode I/O, the programmer needs to be able to explicitly
 specify exactly which character encoding is required. (Presumably 
 default to the current 8-bit truncation encoding?)

http://haskell.org/haskellwiki/Library/Streams already exists


-- 
Best regards,
 Bulat



Re: [Haskell-cafe] Strings and utf-8

2007-11-27 Thread Paul Johnson

Brandon S. Allbery KF8NH wrote:

 However, the IO system truncates [characters] to 8 bits.

Should this be considered a bug?  I presume that it's because stdio.h 
was defined in the days of ASCII-only strings, and the functions in 
System.IO are defined in terms of stdio.h.  But does this need to be 
the case in the future?


Unfortunately I don't know enough about Unicode IO to judge.

Paul.




[Haskell-cafe] Strings and utf-8

2007-11-26 Thread Maurício

Hi,

Are 'String's in GHC 6.6.1 UTF-8?

Thanks,
Maurício



Re: [Haskell-cafe] Strings and utf-8

2007-11-26 Thread Brandon S. Allbery KF8NH


On Nov 26, 2007, at 19:23 , Maurício wrote:


Are 'String's in GHC 6.6.1 UTF-8?


No.

type String = [Char]

and Char stores Unicode codepoints.  However, the IO system truncates  
them to 8 bits.  I think there are UTF8 marshaling libraries on  
hackage these days, though.
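
To make the distinction concrete, here is a minimal hand-rolled sketch of marshaling a String (a list of code points) to UTF-8 bytes. The names encodeChar and encodeUtf8 are illustrative only and are not the API of any Hackage package; for real use, a library would be the better choice.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
-- Note: this sketch does not reject surrogate code points (U+D800-U+DFFF),
-- which a strict encoder would treat as errors.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading payload bits, shifted down into the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

main :: IO ()
main = do
  print (map ord "u\776")     -- code points: [117,776]
  print (encodeUtf8 "u\776")  -- UTF-8 bytes: [117,204,136]
```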


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: [Haskell-cafe] Strings and utf-8

2007-11-26 Thread Don Stewart
allbery:
 
 On Nov 26, 2007, at 19:23 , Maurício wrote:
 
 Are 'String's in GHC 6.6.1 UTF-8?
 
 No.
 
 type String = [Char]
 
 and Char stores Unicode codepoints.  However, the IO system truncates  
 them to 8 bits.  I think there are UTF8 marshaling libraries on  
 hackage these days, though.

Yep, utf8-string, in particular.

-- Don