Re: Binary files in Haskell

Timothy Robin BARBOUR Sun, 22 Feb 1998 05:01:26 GMT
>>>>> "Steve" == Steve Roggenkamp <[EMAIL PROTECTED]> writes:

    Steve> I would like to use Haskell for several larger scale
    Steve> projects, but I can't figure out how to read and write
    Steve> binary data.  It does not appear that the language supports
    Steve> binary files.  Am I missing something?

I have the same problem, but it is not intractable. 

I'd like some comments from Glasgow people on the last possibility
below.

There is the (automatically derivable) Binary class, defined (I
believe) in the Haskell Report. Unfortunately ghc does not support it
yet. However, ghc does provide the Native class (not automatically
derivable), with many pre-defined instances. Using Native should be
much like using C++ streams.
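
To make the idea concrete, here is a minimal sketch of what a flattening class could look like. The class name Flatten and its methods flatten/unflatten are hypothetical names chosen for illustration; this is not the Report's Binary class or ghc's Native class, only the general shape of such an interface.

```haskell
import Data.Word (Word8)

-- Hypothetical class for illustration only; NOT the actual Binary or
-- Native API. A value is flattened to bytes, and parsing returns the
-- value together with the unconsumed input.
class Flatten a where
  flatten   :: a -> [Word8]
  unflatten :: [Word8] -> Maybe (a, [Word8])

instance Flatten Word8 where
  flatten w = [w]
  unflatten (b:bs) = Just (b, bs)
  unflatten []     = Nothing

instance Flatten Bool where
  flatten False = [0]
  flatten True  = [1]
  unflatten (0:bs) = Just (False, bs)
  unflatten (1:bs) = Just (True, bs)
  unflatten _      = Nothing

-- A pair is flattened by concatenating its components in order.
instance (Flatten a, Flatten b) => Flatten (a, b) where
  flatten (x, y) = flatten x ++ flatten y
  unflatten bs = do
    (x, rest)  <- unflatten bs
    (y, rest') <- unflatten rest
    return ((x, y), rest')
```

Instances for larger types are built from smaller ones in the same way, which is exactly the boilerplate that a deriving mechanism or a pre-processor would generate.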

The function that flattens an arbitrary (mostly) type into bytes is
probably a polytypic function. There is a polytypic pre-processor for
Haskell called Polyp, but it has severe restrictions at present. It
was claimed some time ago that the next version of Polyp would remove
the restrictions. This would be rather useful. In the meantime there
is another pre-processor called Derive, which can be used to
automatically derive instances of non-standard classes (such as
Native). It lacks the elegance and generality of Polyp, but it is
usable now. See http://www.dcs.gla.ac.uk/~nww/derive.html .

Another thing you will need is some way to transport your binary data
between platforms, e.g. Intel -> Alpha. It would probably be
straightforward to make an endian-aware subclass of Native that keeps
the binary representation in network byte order. If no one else does
this, I will at some stage. Of course, transporting between platforms
may be quite rare, in which case ASCII (Show and Read) might
work. There have been claims that Read is sometimes very inefficient at
parsing large structures - it *might* need a million years for a large
file.
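
As a rough sketch of the network-byte-order idea, a 32-bit word can be written most significant byte first regardless of the host's endianness. The function names toNetwork32 and fromNetwork32 are made up for this example, and the code works on plain byte lists rather than on Native.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)

-- Encode a 32-bit word as four bytes, most significant byte first
-- (network byte order). fromIntegral truncates to the low 8 bits.
toNetwork32 :: Word32 -> [Word8]
toNetwork32 w = [ fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0] ]

-- Decode four bytes in network byte order, returning the word and the
-- remaining input, or Nothing if fewer than four bytes are available.
fromNetwork32 :: [Word8] -> Maybe (Word32, [Word8])
fromNetwork32 (a:b:c:d:rest) = Just (foldl step 0 [a, b, c, d], rest)
  where step acc byte = (acc `shiftL` 8) .|. fromIntegral byte
fromNetwork32 _ = Nothing
```

Because the byte order is fixed by the shifts rather than by the machine's memory layout, the same bytes are produced on Intel and on Alpha.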

There is another way one might proceed. Why not just use a
memory-mapped file (mmap) to make the data persistent in place? This
would be a way of getting efficient persistence of (almost) any
Haskell data structure without any code-writing and without any
flattening. There are a few difficulties here, but it may well be
feasible. The problems are:

(i) The data structure had better not contain any lazy closures, since
they are unlikely to be valid on a future run of the
program. Solution: restrict this technique to strict data structures
for now.

(ii) When the data structure is constructed, ghc will put it in bits of
memory from all over the heap. Solution: use the 2-space copying
garbage collector to gc the data structure (just give it the data
structure for its root set), with the to-space being the memory from
the mmapped file, e.g. obtained using mmalloc. This will eliminate
ordinary ghc heap from the data structure, so the file can be closed.
For a large file (e.g. several GB), closing would take quite a while
because of the gc. It would also need (temporarily) twice the storage
space of the file.
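
For problem (i), a "strict data structure" could be declared with strictness annotations on the constructor fields. The following is only a sketch; the type StrictList and the helper names are invented for illustration, and it says nothing about how the mmap step itself would be done.

```haskell
-- A list whose cells are strict in both the element and the tail.
-- Evaluating such a value to weak head normal form forces the whole
-- spine, and each element to weak head normal form, so no lazy
-- closure is left in the spine. Elements of a lazy type may still
-- contain closures internally.
data StrictList a = Nil | Cons !a !(StrictList a)

-- Build a strict list from an ordinary list.
fromList :: [a] -> StrictList a
fromList = foldr Cons Nil

-- Convert back to an ordinary (lazy) list.
toList :: StrictList a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs

-- Force the structure before using it, e.g. before handing it to
-- whatever mechanism makes it persistent.
forced :: StrictList a -> StrictList a
forced xs = xs `seq` xs
```

With an element type such as Int, a forced StrictList Int holds only evaluated data, which is the kind of structure the technique would be restricted to.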

Doing the above might just be a case of knowing how to call the
garbage collector appropriately. 

Any comments from Glasgow?

Tim
--


----------------------------------------------------------------------
T.R.BARBOUR                             Email : [EMAIL PROTECTED]
----------------------------------------------------------------------
Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052
Australia
----------------------------------------------------------------------