[EMAIL PROTECTED] (Marcin 'Qrczak' Kowalczyk) wrote,

> Sun, 07 Jan 2001 13:15:21 +1100, Manuel M. T. Chakravarty
> <[EMAIL PROTECTED]> writes:
>
> > > When someone really wants to use mallocCString and pokeCString
> > > now (knowing that there is little point in doing that in the
> > > case of conversions), he can use mallocArray0 and pokeArray0,
> > > after casting the characters of the string to [CChar].
> >
> > To be honest, I don't like this. It is nice having the interface
> > such that we can switch to using conversions at some point, but I
> > still want to be able to conveniently deal with 8bit characters
> > (because this is what many C libraries use). So, I want a fast
> > and convenient interface to 8bit strings *in addition* to the
> > interface that can deal with conversions. In particular, this
> > means that I don't want to have to deal with CChar in the Haskell
> > interface only to circumvent conversion.
>
> I understand everything except the last sentence. Why is it bad to
> deal with CChar in Haskell?
>
> It could be confusing if some String values represented texts in
> Unicode and others in the C encoding. (Especially if the programmer
> uses ISO-8859-1 as the C encoding and does not care about the
> difference, and then somebody using ISO-8859-7 tries to run his
> code!)
>
> IMHO most strings on which C functions work (those ending with
> '\0') are either in the default local encoding (if they are texts
> in a natural language, or filenames) or, more rarely, ASCII (if
> they are e.g. names of mail headers, identifiers in a C program, or
> command-line switches of some program). Sometimes the encoding is
> specified explicitly by the protocol, or is stored in the data
> itself.
>
> For ASCII, the default local encoding can be used too, with a speed
> penalty; practically all encodings in use are ASCII-compatible. You
> can explicitly specify fromLatin1 or toLatin1 if you really want C
> characters to map to Haskell's '\0'..'\255' - it should be faster
> (it does not call iconv or the like). You can also use CChar.

The speed penalty is exactly what I am worried about. What you are
proposing - if I understand you correctly - is to use Unicode (or
whatever encoding) on the Haskell side exclusively, so that every
String passing between Haskell and C has to go through a conversion.
Then, as you say, poke and some other routines make no sense on
Strings, because the length of a string varies between encodings.
OK, I got that. Now, what I am thinking is that this will make the
whole business even slower than it already is. So, an all-Unicode
Haskell will be even slower than it is now.

Strings are used for two purposes in programs: (1) to represent
natural language and (2) to represent unstructured program data. In
the first case, we have to accept the performance penalty if we want
the benefit of handling non-ASCII languages. In the second case,
however, I think we don't need it. Take for example (and it is not a
very good example) the Tk binding for Haskell. It accesses the Tk
widget set by constructing Tcl commands at runtime and sending them
to the Tcl interpreter via a pipe. That's already pretty
inefficient. When each of these commands additionally has to go
through a Unicode conversion, things will get even worse. Another
example is configuration management in libraries like the Gnome
library. A program can dump its session data into an ASCII file
using these libraries, so that it doesn't have to maintain its own
preferences and resource files. Do we really want all this stuff to
go through the converter?
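To make the overhead argument concrete, this is roughly the
no-conversion route that I would like the interface to keep
convenient. It is only a sketch: the module layout and the
castCharToCChar/castCCharToChar helpers are my assumptions about the
final shape of the marshalling library, not settled API.

  import Foreign.Ptr           (Ptr)
  import Foreign.C.Types       (CChar)
  import Foreign.C.String      (castCharToCChar, castCCharToChar)
  import Foreign.Marshal.Array (mallocArray0, pokeArray0, peekArray0)

  -- Write a String as a NUL-terminated 8bit array; every Char is
  -- truncated to its low 8 bits, no encoding conversion happens.
  newCString8 :: String -> IO (Ptr CChar)
  newCString8 s = do
    ptr <- mallocArray0 (length s)
    pokeArray0 0 ptr (map castCharToCChar s)
    return ptr

  -- Read a NUL-terminated 8bit array back, again without conversion.
  peekCString8 :: Ptr CChar -> IO String
  peekCString8 p = fmap (map castCCharToChar) (peekArray0 0 p)

This is fast and simple, but having to spell it out in every single
binding is exactly the inconvenience I would like CString to spare
us.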
Furthermore, to be honest, I am not really sure why we have to do
the conversion anyway. When I have a Haskell program like [1]

  main = putStrLn "今日は"

then there are two possibilities: either I have a system configured
with the locale ja_JP and I happen to run this Haskell program in
kterm or in a Mule/(X)Emacs subshell, or I will get mojibake[2]
anyway. No amount of conversion is going to change that. So, what
exactly do I get for the performance penalty that conversion incurs?

How about having an interface where the String marshalling functions
take an additional argument

  data CConv = NoCConv              -- handle as 8bit chars
             | StdCConv             -- standard conversion
             | CustomCConv String   -- special conversions

Then, it is up to the programmer to decide whether to use
conversion. The idea behind the last variant is that, in your
conversion library, I can give conversions a name and identify them
by that name. This way, CString would not depend on the exact
conversion interface, but would still be open to the addition of new
conversions. Routines like mallocCString and pokeCString would only
make sense for `NoCConv', then.
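To show what I mean, here is a sketch of such an interface. All the
names below, in particular withCStringConv and convertWith, are made
up for illustration; the real conversion machinery would of course
live in your conversion library.

  import Foreign.Ptr           (Ptr)
  import Foreign.C.Types       (CChar)
  import Foreign.C.String     (castCharToCChar)
  import Foreign.Marshal.Array (withArray0)

  -- The type proposed above, repeated to keep the sketch complete.
  data CConv = NoCConv              -- handle as 8bit chars
             | StdCConv             -- standard conversion
             | CustomCConv String   -- special conversions

  -- Hypothetical entry point: marshal a String under the given
  -- conversion and hand the resulting C string to an action.
  withCStringConv :: CConv -> String -> (Ptr CChar -> IO a) -> IO a
  withCStringConv conv s act = case conv of
    NoCConv       -> withArray0 0 (map castCharToCChar s) act
    StdCConv      -> convertWith "locale" s act
    CustomCConv n -> convertWith n s act

  -- Placeholder for the real conversion machinery (iconv or the
  -- like); it falls back to the 8bit path so the sketch compiles.
  convertWith :: String -> String -> (Ptr CChar -> IO a) -> IO a
  convertWith _name s act = withArray0 0 (map castCharToCChar s) act

Only the NoCConv case would be fixed by CString itself; the other
two dispatch into the conversion library through a stable interface,
so new conversions can be added without touching CString.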

Cheers,
Manuel

[1] I hope your mail reader can handle iso-2022-jp :-)

[2] Mojibake is the Japanese term for Japanese text displayed
    through software that cannot handle it. Mojibake is written as
    "文字化け" in Japanese, so if your mail reader can't handle
    Japanese, you'll see just that ;-)