I recently had my first exposure to Haskell's FFI when I was trying to compute MD5 and SHA1 hashes using the existing C implementations. In each case, the idea is to make the hash function available as a function

> md5 :: String -> String

However, the naive implementation

> md5_init md5_state
> n <- newCString str
> md5_append md5_state n (fromIntegral (length str))
> md5_finish md5_state md5_digest

does not scale to computing hashes of really long strings (50 MB, say, as arising from reading a moderately big file), since it first tries to create a CString of that size!

Avoiding the allocation of this giant CString requires splitting the original string into smaller parts and converting each part to a CString separately. Clearly, this task involves a lot of allocation; essentially, the input string needs to be copied part by part.

Hence, I was wondering why the FFI only provides functionality to convert an *entire* list of Char into a CString. For applications like this hash computation, it would be advantageous to be able to specify *how much* of the input string to marshal to the CString and have the conversion function return the rest of the input string along with the CString. That is, in addition to

> newCString :: String -> IO CString

there should be

> newCStringPart :: String -> Int -> IO (CStringLen, String)

or even

> toCStringPart :: String -> CStringLen -> IO (Int, String)

where CStringLen describes a target buffer into which the String argument is to be marshalled (and similarly for other list types).

Clearly, I can program this functionality by hand. But I have to resort to byte-wise processing using pokeByteOff, castCharToCChar, and so on. In addition, the optimizer does not seem to be very effective on such code, so it seems advantageous to provide it in the library already.

But perhaps I'm overlooking something, so I'm appending the code I was using below.

-Peter
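For illustration, a hand-written version of the two proposed functions might look roughly like the sketch below. This is not the attached code and not an existing library API: newCStringPart and toCStringPart are only the names proposed above, forEachChunk is a hypothetical helper added for the example, and the sketch uses pokeElemOff where the text mentions pokeByteOff. Note that castCharToCChar keeps only the low 8 bits of each Char.

```haskell
import Foreign.C.String (CStringLen, castCharToCChar, peekCAStringLen)
import Foreign.Marshal.Alloc (allocaBytes, mallocBytes)
import Foreign.Storable (pokeElemOff)

-- Hypothetical helper (not part of the FFI libraries): fill an existing
-- buffer with as many characters as fit. Returns the number of bytes
-- written and the unconsumed rest of the string.
toCStringPart :: String -> CStringLen -> IO (Int, String)
toCStringPart str (buf, cap) = go 0 str
  where
    go i (c:cs) | i < cap = pokeElemOff buf i (castCharToCChar c) >> go (i + 1) cs
    go i rest = return (i, rest)

-- Hypothetical helper: allocate a fresh buffer of n bytes and marshal at
-- most n characters into it. The caller must release the buffer with free.
-- No terminating NUL is written; the length is carried in the CStringLen.
newCStringPart :: String -> Int -> IO (CStringLen, String)
newCStringPart str n = do
  let cap = max 0 n
  buf <- mallocBytes cap
  (written, rest) <- toCStringPart str (buf, cap)
  return ((buf, written), rest)

-- Hypothetical helper: run an action on successive pieces of at most
-- size bytes, reusing one stack-allocated buffer for every piece.
forEachChunk :: Int -> String -> (CStringLen -> IO a) -> IO [a]
forEachChunk size str act
  | size <= 0 = ioError (userError "forEachChunk: chunk size must be positive")
  | otherwise = allocaBytes size (\buf -> loop buf str)
  where
    loop _ [] = return []
    loop buf s = do
      (n, rest) <- toCStringPart s (buf, size)
      r <- act (buf, n)
      rs <- loop buf rest
      return (r : rs)
```

Assuming an md5_append binding like the one in the naive implementation above, the hash could then be fed with something like forEachChunk 4096 str (\(p, n) -> md5_append md5_state p (fromIntegral n)), so that only one 4096-byte buffer is ever allocated. Passing peekCAStringLen as the action instead simply reads each piece back, which is a convenient way to check the chunking.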
MD5.hs
Description: three interfaces to MD5