On Fri, Nov 27, 2009 at 10:36 PM, Mark Lentczner <[email protected]> wrote:
> I'm in the same quandary:
>
> Data.Binary from the binary package has no error handling
> Data.Serialize from the cereal package uses only strict ByteString
>
> I was going to add error handling to Binary as a weekend project (it isn't
> that hard), but when I contacted the developers of binary, I was pointed at
> cereal. But as my project can parse multi-megabyte objects off the wire, I
> really want lazy ByteString support.
>
> I understand from the cereal creators that lazy ByteStrings are not in the
> future of cereal, since they got a big speed boost by going to strict
> ByteStrings only.
>
> I understand that Bryan O'Sullivan might have done work on adding errors to
> Binary... Bryan? If that's available, can we get it? If not, shall I do the
> work to add error handling? It's a long weekend... I've got time!
>
As an experiment I ported error handling from cereal to binary. It
does work, passes all tests shipped
with binary. Perfomance became worse. According to benchmarks shipped
with binary I observed ~25%
perfomance drop. However when I ported my program I have 40% speedup.
I think it's because I removed
code which work around lack error handling and replaced with error
handling in Get monad
As for strictess concerns. Patch doesn't seem to change strictess from
current state. At least my program
which uses very big inputs works without any changes.
Comments and suggestions are welcome
diff -r 8cb04f000736 src/Data/Binary/Get.hs
--- a/src/Data/Binary/Get.hs Thu Dec 03 00:40:24 2009 +0300
+++ b/src/Data/Binary/Get.hs Fri Dec 04 01:24:19 2009 +0300
@@ -1,5 +1,6 @@
{-# LANGUAGE CPP #-}
{-# OPTIONS_GHC -fglasgow-exts #-}
+{-# LANGUAGE BangPatterns #-}
-- for unboxed shifts
-----------------------------------------------------------------------------
@@ -26,6 +27,7 @@
-- * The Get type
Get
, runGet
+ , runGetE
, runGetState
-- * Parsing
@@ -103,15 +105,23 @@
data S = S {-# UNPACK #-} !B.ByteString -- current chunk
L.ByteString -- the rest of the input
{-# UNPACK #-} !Int64 -- bytes read
+
+type Failure r = String -> Either String (r, S)
+type Success a r = S -> a -> Either String (r, S)
--- | The Get monad is just a State monad carrying around the input ByteString
--- We treat it as a strict state monad.
-newtype Get a = Get { unGet :: S -> (# a, S #) }
+-- | The Get monad is an Exception and State monad.
+newtype Get a = Get { unGet :: forall r. S
+ -> Failure r
+ -> Success a r
+ -> Either String (r, S) }
instance Functor Get where
- fmap f m = Get (\s -> case unGet m s of
- (# a, s' #) -> (# f a, s' #))
- {-# INLINE fmap #-}
+ fmap p m = Get (\s0 f k -> unGet m s0 f (\s a -> k s (p a)))
+
+instance Monad Get where
+ return a = Get (\s0 _ k -> k s0 a)
+ m >>= g = Get (\s0 f k -> unGet m s0 f (\s a -> unGet (g a) s f k))
+ fail = failDesc
#ifdef APPLICATIVE_IN_BASE
instance Applicative Get where
@@ -119,29 +129,18 @@
(<*>) = ap
#endif
--- Definition directly from Control.Monad.State.Strict
-instance Monad Get where
- return a = Get $ \s -> (# a, s #)
- {-# INLINE return #-}
-
- m >>= k = Get $ \s -> case unGet m s of
- (# a, s' #) -> unGet (k a) s'
- {-# INLINE (>>=) #-}
-
- fail = failDesc
-
-instance MonadFix Get where
- mfix f = Get $ \s -> let (a,s') = case unGet (f a) s of
- (# a', s'' #) -> (a',s'')
- in (# a,s' #)
+--instance MonadFix Get where
+-- mfix f = Get $ \s -> let (a,s') = case unGet (f a) s of
+-- (# a', s'' #) -> (a',s'')
+-- in (# a,s' #)
------------------------------------------------------------------------
get :: Get S
-get = Get $ \s -> (# s, s #)
+get = Get (\s0 _ k -> k s0 s0)
put :: S -> Get ()
-put s = Get $ \_ -> (# (), s #)
+put s = Get (\_ _ k -> k s ())
------------------------------------------------------------------------
--
@@ -176,24 +175,40 @@
(x:xs') -> S x (B.LPS xs')
#endif
+finalK :: Success a a
+finalK s a = Right (a, s)
+
+failK :: Failure a
+failK s = Left s
+
-- | Run the Get monad applies a 'get'-based parser on the input ByteString
runGet :: Get a -> L.ByteString -> a
-runGet m str = case unGet m (initState str) of (# a, _ #) -> a
+runGet m str = case unGet m (initState str) failK finalK of
+ Right (a, _) -> a
+ Left message -> error message
-- | Run the Get monad applies a 'get'-based parser on the input
-- ByteString. Additional to the result of get it returns the number of
-- consumed bytes and the rest of the input.
runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
runGetState m str off =
- case unGet m (mkState str off) of
- (# a, ~(S s ss newOff) #) -> (a, s `join` ss, newOff)
+ case unGet m (mkState str off) failK finalK of
+ Right (a, ~(S s ss newOff)) -> (a, s `join` ss, newOff)
+ Left message -> error message
+
+-- | Run the Get monad applies a 'get'-based parser on the input ByteString
+runGetE :: Get a -> L.ByteString -> Either String a
+runGetE m str = case unGet m (initState str) failK finalK of
+ Right (a, _) -> Right a
+ Left message -> Left message
+
------------------------------------------------------------------------
failDesc :: String -> Get a
failDesc err = do
- S _ _ bytes <- get
- Get (error (err ++ ". Failed reading at byte position " ++ show bytes))
+ S _ _ bytes <- get
+ Get (\_ f _ -> f (err ++ ". Failed reading at byte position " ++ show bytes))
-- | Skip ahead @n@ bytes. Fails if fewer than @n@ bytes are available.
skip :: Int -> Get ()
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe