Thanks Malcolm, that is a great explanation.
In this code, in fact we
managed to keep the lookup function pure, because the Binary interface
allows referentially-transparent I/O (getFAt), which may not be
possible in other serialisation libraries. This facility however
depends on an invariant that the B-Tree can only be extended - there
is no operation to remove nodes or data.
That's perfect. The insertion functions clear up the IO part of the
equation immensely and since I will need to be able to delete nodes and
data it will allow me to implement search/delete myself and cement a
bit of this in my head.
Cheers, that was a great help.
On 07/11/2005, at 11:12 PM, Malcolm Wallace wrote:
Scott Weeks <[EMAIL PROTECTED]> writes:
For a project that I'm working on I need to implement a B-Tree or
similar disk based data structure for storing large amounts of data. I
feel like I'm getting it and then I get hung up on disk IO operations
and trying to figure out how they should fit into the mix.
Here is a first-cut specification of a B-Tree:
data BTree k d = BTree Int [(k,[d])] [BTree k d]
where k = key and d = stored data. Obviously this does not actually
store the B-Tree on file, it just gives a flavour of the intended
structure.
In order to store the data on file, we need to introduce an indirection
corresponding to a file pointer, on each of the child nodes.
data BTree k d = BTree Int [(k,[d])] [FilePtr (BTree k d)]
Then you need a serialisation routine to write any one node of the
tree out and read it back in. And your tree-traversal routines
(lookup, insert, etc) need to be embedded in the IO monad, so they
can do the necessary file access.
Attached is an implementation of B-Trees (by Colin Runciman), which
uses nhc98's Binary class
http://haskell.org/nhc98/libs/Binary.html
for serialisation. The type BinHandle corresponds to a file handle,
and Bin corresponds to a file pointer. In this code, in fact we
managed to keep the lookup function pure, because the Binary interface
allows referentially-transparent I/O (getFAt), which may not be
possible in other serialisation libraries. This facility however
depends on an invariant that the B-Tree can only be extended - there
is no operation to remove nodes or data. However the insertion
routine clearly must use I/O to extend the tree. I hope you can see
the pattern of how this I/O works.
Note also that the B-Tree structure is stored in BPages, separate
from the data values hanging off the tree, which are in DBlocks.
Hope this helps,
Regards,
Malcolm
<BTree.hs>_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe