Conrad Parker <con...@metadecks.org> writes:

> Hi,
>
> I am reading data from a file as strict bytestrings and processing
> them in an iteratee. As the parsing code uses Data.Binary, the
> strict bytestrings are then converted to lazy bytestrings (using
> fromWrap which Gregory Collins posted here in January:
>
>     -- | wrapped bytestring -> lazy bytestring
>     fromWrap :: I.WrappedByteString Word8 -> L.ByteString
>     fromWrap = L.fromChunks . (:[]) . I.unWrap

This just makes a 1-chunk lazy bytestring:

    (L.fromChunks . (:[])) :: S.ByteString -> L.ByteString

> ). The parsing is then done with the library function
> Data.Binary.Get.runGetState:
>
>     -- | Run the Get monad applies a 'get'-based parser on the input
>     -- ByteString. Additional to the result of get it returns the number of
>     -- consumed bytes and the rest of the input.
>     runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
>
> The issue I am seeing is that runGetState consumes more bytes than the
> length of the input bytestring, while reporting an
> apparently successful get (ie. it does not call error/fail). I was
> able to work around this by checking if the bytes consumed > input
> length, and if so to ignore the result of get and simply prepend the
> input bytestring to the next chunk in the continuation.

Something smells fishy here. I have a hard time believing that binary is
reading more input than is available. Could you post more code, please?

> However I am curious as to why this apparent lack of bounds checking
> happens. My guess is that Get does not check the length of the input
> bytestring, perhaps to avoid forcing lazy bytestring inputs; does that
> make sense?
>
> Would a better long-term solution be to use a strict-bytestring binary
> parser (like cereal)? So far I've avoided that as there is
> not yet a corresponding ieee754 parser.

If you're using iteratees, you could try attoparsec + attoparsec-iteratee,
which would be a more natural way to bolt parsers together. The
attoparsec-iteratee package exports:

    parserToIteratee :: (Monad m) =>
                        Parser a
                     -> IterateeG WrappedByteString Word8 m a

Attoparsec is an incremental parser, so this technique allows you to parse
a stream in constant space (i.e. without necessarily having to retain all
of the input). It also hides the details of the annoying
buffering/bytestring twiddling you would be forced to do otherwise.
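
To make the incremental idea concrete without pulling in attoparsec, here is a rough sketch that uses binary's own incremental decoder instead. It assumes a newer binary than the one discussed in this thread (one that provides runGetOrFail and runGetIncremental). The names toLazy, pair, parseChunk and parseChunks are made up for illustration, and the 8-byte record format is an assumption, not the format in your file.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get
  ( Decoder (..), Get, getWord32be, pushChunk, pushEndOfInput
  , runGetIncremental, runGetOrFail )
import Data.Int (Int64)
import Data.Word (Word32)

-- | Strict -> lazy as a single chunk (the same idea as fromWrap,
-- minus the WrappedByteString wrapper).
toLazy :: S.ByteString -> L.ByteString
toLazy = L.fromChunks . (:[])

-- | Hypothetical record parser: two big-endian 32-bit words (8 bytes).
pair :: Get (Word32, Word32)
pair = (,) <$> getWord32be <*> getWord32be

-- | Parse one self-contained chunk. Short input comes back as Left
-- instead of a result, so no manual "consumed > length" check is needed.
parseChunk :: S.ByteString -> Either String ((Word32, Word32), Int64)
parseChunk bs =
  case runGetOrFail pair (toLazy bs) of
    Left (_, _, msg)       -> Left msg
    Right (_, consumed, x) -> Right (x, consumed)

-- | Feed strict chunks to the decoder one at a time. The decoder does the
-- buffering across chunk boundaries and hands back the unconsumed input.
parseChunks :: [S.ByteString] -> Maybe ((Word32, Word32), S.ByteString)
parseChunks chunks =
  case pushEndOfInput (foldl pushChunk (runGetIncremental pair) chunks) of
    Done rest _ x -> Just (x, rest)
    _             -> Nothing
```

With this shape, a record split across two chunks is handled by the decoder itself, and whatever is left over can be passed on to the next parse.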

Cheers,
G
--
Gregory Collins <g...@gregorycollins.net>
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe