There are two basic approaches of accessing a collection, be it a file, a database, or a generating function. One approach relies on enumerators (aka folds); the other is based on lazy lists (aka cursors, streams). Given a stream, we can always construct the corresponding enumerator. It's perhaps less appreciated that given a left fold enumerator, we can always construct the corresponding stream [1]: we can mechanically invert an enumerator inside out. It has been argued [2] that the enumerator approach is superior, especially from the point of view of managing scarce resources such as database connections. When we design the interface to a collection, it seems a good idea to choose enumerator as a primitive. We can always obtain a stream if we really need it.
The mechanical inversion procedure presented in [2] had a catch: it relies on shift/reset (or call/cc plus a mutable cell, which is the same thing). How can we do such an inversion in Haskell? Well, we can introduce a right fold enumerator, which is more amenable to such transformation. Or we can use a continuation monad and emulate shift/reset. The present article demonstrates a third approach: a half-baked, non-recursive left-fold. We argue that such a left fold is the best interface for a collection. Indeed, given the half-baked left-fold we can: - instantiate it into the ordinary left fold - instantiate in into a stream If we turn two enumerators into streams, we can *safely* interleave these streams. The rest of the article demonstrates the inversion procedure. We use a file as a sample "collection", a collection of characters. Haskell provides a stream interface to that collection: hGetChar. We implement a left fold enumerator. We then show how to turn that enumerator back to a stream: how to implement a function mygetchar only in terms of the left fold enumerator. The approach is general and uses no monadic heavy lifting. The following code is Haskell98. No unsafe operations are used. > import IO The desired left-fold enumerator over a file: > hfold_left:: FileName -> Title > -> Iteratee seed > -> seed -- the initial seed > -> IO seed > type FileName = String > type Title = String -- just an identifying string for debug printing > type Iteratee seed = seed -> Char -> Either seed seed The function (seed -> Char -> Either seed seed) is an iteratee: it takes the current seed and the current character and returns a new seed. To be more precise, if it returns Right seed', the iteration continues with seed' as the new seed. If the iteratee returns Left seed'', the enumerator immediately stops further iterations, frees all the resources, and returns seed'' as the final result. So our left fold enumerator supports premature termination of iterations, a quite useful feature. As me mentioned above, we actually need a half-baked, non-recursive enumerator hfold_left' > make_hfold_left' filename title = do > h <- openFile filename ReadMode > let hfold_left' self iteratee seed = do > mc <- try $ hGetChar h > case mc of > Left exc -> hClose h >> return seed > Right c -> do > putStrLn $ " reading from " ++ title ++ ". Got " ++ (show c) > case iteratee seed c of > Left seed -> hClose h >> return seed > Right seed -> self iteratee seed > return hfold_left' We can easily obtain the desired hfold_left as an instantiation of hfold_left': > hfold_left filename title iteratee seed = do > hfold_left' <- make_hfold_left' filename title > let g = hfold_left' g > hfold_left' g iteratee seed We should note that the Handle is an internal resource and is never leaked. The statement putStrLn $ " reading from " ++ title ... is for debugging. We want to be sure that we don't read from a stream too much. This will be important for interleaving. Let us create a test file > tf = "/tmp/a" > make_test_file fname = do > h <- openFile fname WriteMode > mapM_ ((hPutStr h) . show) [0..9] > putStrLn $ "Wrote " ++ fname > hClose h and test our left fold enumerator > test_read fname n = do > (count,acc) <- hfold_left fname "handle" reader (0,[]) > putStrLn $ "Have read: " ++ (show acc) > where reader (count,acc) c = > let count' = count + 1 > acc' = c:acc > in (if count' >= n then Left else Right) (count',acc') The first test checks the premature termination. We read only five characters, and then terminate the iteration and dispose of its resources, including the (invisible) handle. *Main> test_read tf 5 reading from handle. Got '0' reading from handle. Got '1' reading from handle. Got '2' reading from handle. Got '3' reading from handle. Got '4' Have read: "43210" The following tests the exhaustive iteration. *Main> test_read tf 20 reading from handle. Got '0' reading from handle. Got '1' reading from handle. Got '2' reading from handle. Got '3' reading from handle. Got '4' reading from handle. Got '5' reading from handle. Got '6' reading from handle. Got '7' reading from handle. Got '8' reading from handle. Got '9' Have read: "9876543210" Let us suppose that make_hfold_left' is the only way to access files. Can we recover ordinary i/o streams from it? The answer is yes. We can get the familiar open, getchar and eof functions from the half-baked enumerator. Furthermore, the derivation of these functions is *independent* of the precise nature of the enumerator. We will never need (nor be given) access to the handle. > data MyStream = MyNil (Maybe Char) | MyCons Char (IO MyStream) > > myopen:: FileName -> Title -> IO MyStream > > myopen filename title = do > hfold_left' <- make_hfold_left' filename title > let k fn (MyNil Nothing) = return $ MyNil Nothing > k fn (MyNil (Just c)) > = return $ MyCons c (hfold_left' k fn (MyNil Nothing)) > hfold_left' k (\_ c -> Right $ MyNil $ Just c) (MyNil Nothing) > mygetchar:: MyStream -> IO (Char,MyStream) > mygetchar (MyCons c k) = k >>= (return . ((,) c)) > > myiseof:: MyStream -> Bool > myiseof (MyNil Nothing) = True > myiseof _ = False We can now test reading from two streams with interleaving: > test_interleave fname = do > stream1 <- myopen fname "stream1" > stream2 <- myopen fname "stream2" > > putStrLn "\nReading 2 chars from one stream" > (c1,stream1) <- mygetchar stream1 > (c2,stream1) <- mygetchar stream1 > putStrLn $ "Read: " ++ (show [c1,c2]) > > putStrLn "\nReading 3 chars from the second stream" > (c1,stream2) <- mygetchar stream2 > (c2,stream2) <- mygetchar stream2 > (c3,stream2) <- mygetchar stream2 > putStrLn $ "Read: " ++ (show [c1,c2,c3]) > > putStrLn "\nReading again 2 chars from the first stream" > (c1,stream1) <- mygetchar stream1 > (c2,stream1) <- mygetchar stream1 > putStrLn $ "Read: " ++ (show [c1,c2]) *Main> test_interleave tf reading from stream1. Got '0' reading from stream2. Got '0' Reading 2 chars from one stream reading from stream1. Got '1' reading from stream1. Got '2' Read: "01" Reading 3 chars from the second stream reading from stream2. Got '1' reading from stream2. Got '2' reading from stream2. Got '3' Read: "012" Reading again 2 chars from the first stream reading from stream1. Got '3' reading from stream1. Got '4' Read: "23" Note that we read ahead by exactly one character. If we want to detect EOF, that is inevitable (we need to attempt reading if we want to detect EOF). We should point out the relationship between hfold_left, hfold_left' and myopen: hfold_left is the fixpoint of hfold_left'. OTH, myopen captures the continuation of hfold_left' in an IO action. This relationship once again illustrates that call/cc and Y are indeed two sides of the same coin [3]. [1] General ways to traverse collections http://pobox.com/~oleg/ftp/Scheme/enumerators-callcc.html [2] An argument against cursors http://srfi.schemers.org/srfi-44/mail-archive/maillist.html [3] Self-application as the fixpoint of call/cc http://google.com/groups?selm=7eb8ac3e.0309182239.5a64b3b1%40posting.google.com _______________________________________________ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell