Re[2]: [Haskell-cafe] How to use Unicode strings?

2008-11-23 Thread Bulat Ziganshin
Hello Alexey,

Sunday, November 23, 2008, 10:20:47 AM, you wrote:

> And this problem is related not only to IO. It arises whenever strings cross
> the border between the Haskell world and the outside world: opening files with
> Unicode names, exec'ing, etc.

This completely depends on the libraries, and the GHC-bundled I/O libs don't
support Unicode filenames. The FreeArc project contains its own simple I/O
library that doesn't have this problem (and also supports files larger than
4 GB on Windows). Unfortunately, this library doesn't include any buffering.

-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-23 Thread wren ng thornton

Alexey Khudyakov wrote:

> But this raises the question: what is the right thing? If the locale is UTF-8,
> or the system supports Unicode some other way, there is no problem: just
> encode the string properly. The problem is how to deal with untranslatable
> characters. Skip them? Replace them with question marks? Something else? We
> probably need to look at how this is solved in other languages. (Or not
> solved.)


Regarding untranslatable characters, I think the only correct thing to 
do is consider it exceptional behavior and have the conversion function 
accept a handler function which takes the character as input and 
produces a string for it. That way programs can define their own 
behavior, since this is something that doesn't have a right way to 
recover in all cases. Canonical handlers which skip, replace with 
question marks (or other arbitrary character), throw actual exceptions, 
etc could be provided for convenience.
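Here is a minimal sketch of this design in Haskell. It assumes Latin-1 as the target encoding and represents the encoded output as a String whose Chars are all below U+0100. The names encodeLatin1With, skipChar, replaceWith, throwOnChar and Untranslatable are hypothetical and do not come from any library.

```haskell
import Control.Exception (Exception, throw)
import Data.Char (ord)

-- What to emit for a character the target encoding cannot represent.
type Handler = Char -> String

-- Hypothetical example target: Latin-1 (code points 0..255). Representable
-- characters pass through unchanged; everything else goes to the handler.
encodeLatin1With :: Handler -> String -> String
encodeLatin1With handler = concatMap step
  where
    step c
      | ord c < 256 = [c]
      | otherwise   = handler c

-- Canonical handlers, provided for convenience.
skipChar :: Handler
skipChar _ = ""

replaceWith :: Char -> Handler
replaceWith r _ = [r]

-- A dedicated exception type, so callers can catch exactly this failure.
newtype Untranslatable = Untranslatable Char
  deriving Show

instance Exception Untranslatable

-- Throws when the offending part of the (lazy) result is forced.
throwOnChar :: Handler
throwOnChar c = throw (Untranslatable c)
```

For instance, `encodeLatin1With (replaceWith '?') "naïve €5"` evaluates to the string "naïve ?5": 'ï' is within Latin-1, and '€' is not.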


For stream-based strings à la ByteString, dealing with this sort of a 
handler in an efficient manner is fairly straightforward (though some 
CPS tricks may be needed to get rid of the Maybe in the result of the 
basic converter). For [Char] strings efficiency is harder, but the 
implementation should still be easy (given the basic converter).
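To illustrate the CPS remark, here is a hypothetical basic converter written two ways. The first returns Maybe. The second takes a success continuation and a failure continuation. Lists of Word8 stand in for a real byte stream, and all names are illustrative. Whether GHC actually avoids allocating the Maybe in the direct version depends on inlining, so read this as a sketch of the shape, not as a performance claim.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Direct style: failure is reported through Maybe, which the caller
-- then has to inspect for every character.
toLatin1Maybe :: Char -> Maybe Word8
toLatin1Maybe c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- CPS style: the caller supplies what to do on success and on failure,
-- and the converter calls the appropriate continuation directly.
toLatin1K :: Char -> (Word8 -> r) -> (Char -> r) -> r
toLatin1K c ok bad
  | ord c < 256 = ok (fromIntegral (ord c))
  | otherwise   = bad c

-- Whole-string encoder: the failure continuation asks the handler for
-- replacement bytes and carries on with the rest of the input.
encodeLatin1K :: (Char -> [Word8]) -> String -> [Word8]
encodeLatin1K handler = foldr step []
  where
    step c rest = toLatin1K c (: rest) (\u -> handler u ++ rest)
```

With `encodeLatin1K (const [63])`, every untranslatable character becomes the byte for '?'.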


Most extant languages I've seen tend to pick a single solution for all 
cases, but I don't think we should follow along that path.


--
Live well,
~wren


[Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Dmitri O.Kondratiev
Please advise how to write a Unicode string so that this example works:

main = do
  putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, \
           \circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la \
           \majuscule."

I get the following error:
hello.hs:4:68:
    lexical error in string/character literal (UTF-8 decoding error)
Failed, modules loaded: none.
Prelude>

Also, how do I read Unicode characters from standard input?

Thanks!

-- 
Dmitri O. Kondratiev
[EMAIL PROTECTED]
http://www.geocities.com/dkondr


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Luke Palmer
2008/11/22 Dmitri O.Kondratiev [EMAIL PROTECTED]:
> Please advise how to write a Unicode string so that this example works:
>
> main = do
>   putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, \
>            \circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la \
>            \majuscule."

That really ought to work.  Is the file encoded in UTF-8 (rather than,
e.g. Latin-1)?

Luke


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Austin Seipp
Excerpts from Dmitri O.Kondratiev's message of Sat Nov 22 05:40:41 -0600 2008:
> Please advise how to write a Unicode string so that this example works:
>
> main = do
>   putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, \
>            \circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la \
>            \majuscule."
>
> I get the following error:
> hello.hs:4:68:
>     lexical error in string/character literal (UTF-8 decoding error)
> Failed, modules loaded: none.
> Prelude>
>
> Also, how do I read Unicode characters from standard input?
>
> Thanks!
>

Hi,

Check out the utf8-string package on hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string

In particular, you probably want the System.IO.UTF8 functions, which
are identical to their non-UTF-8 counterparts in System.IO except that,
well, they handle Unicode properly.

More specifically, you will mainly want to look at
Codec.Binary.UTF8.String.encodeString and decodeString, for output and
input respectively (in fact, most of the System.IO.UTF8 functions are
defined in terms of these, e.g. 'putStrLn x = IO.putStrLn (encodeString x)'
and 'getLine = liftM decodeString IO.getLine').
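Here is a hypothetical, base-only re-implementation of the representation that encodeString produces: a String whose Chars are UTF-8 bytes. It makes the idea concrete without depending on utf8-string. It is a sketch, not the library's code, and it does not reject surrogate code points. On GHC 6.12 and later, handles encode text themselves. Such byte-chars therefore have to go through a handle in binary mode; otherwise they would be encoded a second time. The names encodeUtf8Chars and putStrLnUtf8 are illustrative.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (hPutStr, hSetBinaryMode, stdout)

-- Hypothetical stand-in for encodeString: every Char becomes one to four
-- Chars whose code points are the bytes of its UTF-8 encoding (all < 256).
encodeUtf8Chars :: String -> String
encodeUtf8Chars = concatMap (map chr . utf8Bytes . ord)
  where
    utf8Bytes n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. shiftR n 6, cont n]
      | n < 0x10000 = [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
      | otherwise   = [ 0xF0 .|. shiftR n 18, cont (shiftR n 12)
                      , cont (shiftR n 6), cont n ]
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont m = 0x80 .|. (m .&. 0x3F)

-- Hypothetical putStrLn in the style described above. Binary mode makes the
-- handle write each Char as one byte (note: it stays set on stdout afterwards).
putStrLnUtf8 :: String -> IO ()
putStrLnUtf8 s = do
  hSetBinaryMode stdout True
  hPutStr stdout (encodeUtf8Chars (s ++ "\n"))
```

For example, `encodeUtf8Chars "é"` yields "\195\169", the two bytes 0xC3 0xA9.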

Austin


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Janis Voigtlaender

Alexey Khudyakov wrote:

> putStrLn "Ну и где этот ваш хвалёный уникод?"


:-)

--
Dr. Janis Voigtlaender
http://wwwtcs.inf.tu-dresden.de/~voigt/
mailto:[EMAIL PROTECTED]



Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Don Stewart
alexey.skladnoy:
 
>> That really ought to work.  Is the file encoded in UTF-8 (rather than,
>> e.g. Latin-1)?
>
> It only pretends to work. The simple print functions garble Unicode characters.
> For example:
>
>   putStrLn "Ну и где этот ваш хвалёный уникод?"
>
> prints the following output:
>
> C 8 345 MB>B 20H E20;Q=K9 C=8:>4?
>
> Not pretty, is it? Dmitri's variant, though, seems to work fine.

Use the UTF8 printing functions,

import qualified System.IO.UTF8 as U

main = U.putStrLn "Ну и где этот ваш хвалёный уникод?"

Running this,

*Main> main
Ну и где этот ваш хвалёный уникод?

-- Don


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Duncan Coutts
On Sat, 2008-11-22 at 10:02 -0800, Don Stewart wrote:

> Use the UTF8 printing functions,
>
> import qualified System.IO.UTF8 as U
>
> main = U.putStrLn "Ну и где этот ваш хвалёный уникод?"
>
> Running this,
>
> *Main> main
> Ну и где этот ваш хвалёный уникод?


This upsets me. We need to get on with doing this properly. The
System.IO.UTF8 module is a useful interim workaround but we're not using
it properly most of the time.

It is right when you're working with a text file that you know to be in
the UTF-8 format. For example .cabal files are UTF-8, irrespective of
the platform or the system locale.

It is not right when working with the terminal. The encoding of the
terminal is given by the locale. We cannot statically declare that it is
UTF-8.

The right thing to do is to make Prelude.putStrLn do the right thing. We
had a long discussion on how to fix the H98 IO functions to do this
better. We just need to get on with it, or we'll end up with too many
cases of people using System.IO.UTF8 inappropriately.

For the case where System.IO.UTF8 is right we probably still want a more
general solution, like a handle setting for the encoding.
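GHC 6.12 and later provide this kind of handle setting. System.IO exports hSetEncoding, together with encodings such as utf8 and localeEncoding. New handles, including stdin and stdout, default to the locale's encoding. Below is a minimal sketch for the case where UTF-8 is right, assuming a modern base; writeUtf8File and readUtf8File are hypothetical helper names.

```haskell
import System.IO

-- Hypothetical helper for formats that are UTF-8 by definition (e.g. .cabal
-- files): the handle's encoding is fixed, whatever the system locale says.
writeUtf8File :: FilePath -> String -> IO ()
writeUtf8File path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Read back with the same fixed encoding. hGetContents is lazy, so force
-- the whole string before withFile closes the handle.
readUtf8File :: FilePath -> IO String
readUtf8File path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` return s
```

Terminal handles such as stdout are simply left alone in this sketch, so they keep the locale encoding.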

Duncan



Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Alexey Khudyakov

> This upsets me. We need to get on with doing this properly. The
> System.IO.UTF8 module is a useful interim workaround but we're not using
> it properly most of the time.

... skipped ...

> The right thing to do is to make Prelude.putStrLn do the right thing. We
> had a long discussion on how to fix the H98 IO functions to do this
> better. We just need to get on with it, or we'll end up with too many
> cases of people using System.IO.UTF8 inappropriately.

But this raises the question: what is the right thing? If the locale is UTF-8,
or the system supports Unicode some other way, there is no problem: just
encode the string properly. The problem is how to deal with untranslatable
characters. Skip them? Replace them with question marks? Something else? We
probably need to look at how this is solved in other languages. (Or not
solved.)

And this problem is related not only to IO. It arises whenever strings cross
the border between the Haskell world and the outside world: opening files with
Unicode names, exec'ing, etc.

For example:
Prelude> readFile "файл"
*** Exception: D09;: openFile: does not exist (No such file or directory)
Prelude> executeFile "echo" True ["Сейчас сломается"] Nothing
!59G0A A;><05BAO

Although it's possible to work around this using encodeString/decodeString
from Codec.Binary.UTF8.String, that won't work on non-UTF-8 systems. And it's
not only Neanderthal systems with one-byte locales: Windows, AFAIK, uses a
different Unicode encoding.
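The garbled strings in these examples are consistent with each Char being reduced to the low 8 bits of its code point on the way out. 'ф' (U+0444) becomes 'D' (0x44), and 'а' (U+0430) becomes '0' (0x30), so "файл" turns into the "D09;" seen in the exception. The sketch below is a hypothetical model of that behaviour, not GHC's actual library code; truncateToByte and garble are illustrative names.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical model of byte-oriented output: keep only the low 8 bits of
-- each code point, as if every Char were written out as one byte.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c .&. 0xFF)

garble :: String -> String
garble = map truncateToByte
```

Code points below U+0100, such as the accented letters in Dmitri's French example, pass through this truncation unchanged. That may be why that variant appeared to work.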


Re: [Haskell-cafe] How to use Unicode strings?

2008-11-22 Thread Don Stewart
alexey.skladnoy:
 
>> This upsets me. We need to get on with doing this properly. The
>> System.IO.UTF8 module is a useful interim workaround but we're not using
>> it properly most of the time.
>
> ... skipped ...
>
>> The right thing to do is to make Prelude.putStrLn do the right thing. We
>> had a long discussion on how to fix the H98 IO functions to do this
>> better. We just need to get on with it, or we'll end up with too many
>> cases of people using System.IO.UTF8 inappropriately.
>
> But this raises the question: what is the right thing? If the locale is UTF-8,
> or the system supports Unicode some other way, there is no problem: just
> encode the string properly. The problem is how to deal with untranslatable
> characters. Skip them? Replace them with question marks? Something else? We
> probably need to look at how this is solved in other languages. (Or not
> solved.)
>
> And this problem is related not only to IO. It arises whenever strings cross
> the border between the Haskell world and the outside world: opening files with
> Unicode names, exec'ing, etc.
>
> For example:
> Prelude> readFile "файл"
> *** Exception: D09;: openFile: does not exist (No such file or directory)
> Prelude> executeFile "echo" True ["Сейчас сломается"] Nothing
> !59G0A A;><05BAO
>
> Although it's possible to work around this using encodeString/decodeString
> from Codec.Binary.UTF8.String, that won't work on non-UTF-8 systems. And it's
> not only Neanderthal systems with one-byte locales: Windows, AFAIK, uses a
> different Unicode encoding.

For just decoding/encoding in other locales, there are codec
libraries. Hunt around on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Encode


-- Don