You're definitely on the right track. The type I would aim for would be
something like this:

    example :: Handle
            -> Producer MDHAndScanLine IO (Either DecodingError (Producer ByteString IO ()))

Notice that this slightly differs from your type; I'm merging the outer
`IO (Either DecodingError ...)` into the first `Producer` to simplify
the type.

The implementation for that type would be very similar to the one you
wrote in your second e-mail:

    example :: Handle
            -> Producer MDHAndScanLine IO (Either DecodingError (Producer ByteString IO ()))
    example handle = do
        let p = Pipes.ByteString.fromHandle handle
        -- Decode the leading little-endian Word32, which holds the offset
        x <- lift (evalStateT (decodeGet getWord32le) p)
        case x of
            Left err  -> return (Left err)
            Right len -> do
                -- Seek to that offset, then decode the rest of the file
                lift (hSeek handle AbsoluteSeek (fromIntegral len))
                view decoded p

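One detail to watch when compiling the example above: as far as I can tell from the pipes-binary documentation, `view decoded p` returns `Either (DecodingError, Producer ByteString IO ()) ()` rather than `Either DecodingError (Producer ByteString IO ())`, so the two branches of the `case` have to be brought to a common type. Here is a minimal, self-contained sketch of one way to do that. It keeps only the error and drops the leftover producer. The name `decodeFromOffset`, the scratch file name, and the choice to be polymorphic in any `Binary a` (instead of `MDHAndScanLine`) are assumptions made for illustration; `view` is taken from `Lens.Family` in lens-family-core.

```haskell
import qualified Data.ByteString as BS
import Data.Binary (Binary)
import Data.Binary.Get (getWord32le)
import Data.Word (Word8)
import Lens.Family (view)
import Pipes
import Pipes.Binary (DecodingError, decodeGet, decoded)
import qualified Pipes.ByteString as PB
import Pipes.Parse (evalStateT)
import qualified Pipes.Prelude as P
import System.IO

-- Hypothetical name; polymorphic in the element type for illustration.
decodeFromOffset :: Binary a => Handle -> Producer a IO (Either DecodingError ())
decodeFromOffset handle = do
    -- Read the 4-byte little-endian offset from the start of the file.
    x <- lift (evalStateT (decodeGet getWord32le) (PB.fromHandle handle))
    case x of
        Left err  -> return (Left err)
        Right len -> do
            lift (hSeek handle AbsoluteSeek (fromIntegral len))
            r <- view decoded (PB.fromHandle handle)
            -- Keep only the error; the leftover producer is discarded.
            return (either (Left . fst) Right r)

main :: IO ()
main = do
    let path = "offset-demo.bin"  -- scratch file, an assumption for the demo
    -- The header says the data starts at byte 6; bytes 4-5 are skipped.
    BS.writeFile path (BS.pack [6, 0, 0, 0, 0xAA, 0xBB, 1, 2, 3])
    (xs, r) <- withBinaryFile path ReadMode $ \h ->
        P.toListM' (decodeFromOffset h)
    print (xs :: [Word8])
    putStrLn (either (const "decoding error") (const "ok") r)
```

The demo collects the elements into a list only to keep it short. For a large file you would consume the producer incrementally (for example with `for` or `>->`) so that memory use stays constant.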
That will definitely run in constant memory, meaning that it won't ever
load more than one chunk of bytes at a time (where a chunk is something
like 32 kB, I think). You can profile the heap if you want to verify
this by following these instructions:

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/prof-heap.html

Also, to answer your other question, `pipes-attoparsec` runs in constant
memory. The difference between `pipes-attoparsec` and `attoparsec` is
that `pipes-attoparsec` runs a separate parser for each element in the
stream, which is equivalent to "committing" after each parsed element.
That means that it can backtrack only within the element it is currently
parsing, and no further back. This is why `pipes-attoparsec` runs in
constant space over a large file and `attoparsec` does not: `attoparsec`
can backtrack indefinitely, so it has to hold on to the input it has
already consumed.

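To make the "commit after each element" behaviour concrete, here is a small sketch using `Pipes.Attoparsec.parsed`, which reruns an attoparsec parser once per element of the stream. The `element` parser and the in-memory `chunks` producer are made up for illustration; in your case the source would be `Pipes.ByteString.fromHandle` and the parser would be whatever describes one of your elements.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.ByteString (ByteString)
import Pipes
import qualified Pipes.Attoparsec as PA
import qualified Pipes.Prelude as P

-- One stream element: a decimal number terminated by a newline.
element :: A.Parser Int
element = A.decimal <* A.endOfLine

-- Input deliberately split in the middle of elements, to show that
-- chunk boundaries do not have to line up with element boundaries.
chunks :: Producer ByteString IO ()
chunks = each ["12\n3", "4\n5", "6\n"]

main :: IO ()
main = do
    -- Each parsed element is printed as soon as it is available.
    r <- runEffect (PA.parsed element chunks >-> P.print)
    case r of
        Left (err, _leftovers) -> putStrLn ("parse error: " ++ show err)
        Right ()               -> putStrLn "end of input"
```

Each number is handed downstream as soon as its parser succeeds, so input consumed by earlier elements does not need to be retained.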
On 9/21/15 12:10 PM, Dylan Tisdall wrote:
Following up on my last question, my next issue is also probably a
very straight ahead example of pipes, but I've managed to get tangled
up going back and forth in the packages' documentation.

I've got a file whose first 4 bytes give the offset into the file of a
series of binary data elements (called MDHs in my case). Given a
Handle to the start of such a file, I want to:

 1. read the first Word32 in the file, to retrieve the offset;
 2. skip the Handle to that offset; and
 3. turn the rest of the file into a Producer MDH IO ()

Given that the file I'm reading may be large, I want to make sure this
process is going to run in constant memory. I thought I could use
pipes-attoparsec, but I couldn't get straight whether it would need to
read the whole file before it could produce anything (as I understand
is normally the case with attoparsec).

So far I have the following, which isn't complete, but at least does
the skip and converts the remaining file to a ByteString producer.

    handleToMDHs :: Handle -> IO (Either P.DecodingError (P.Producer P.ByteString IO ()))
    handleToMDHs h = do
        hLen <- P.evalStateT (P.decodeGet getWord32le) (PB.fromHandle h)
        case (hLen :: Either P.DecodingError Word32) of
            Left err  -> return $ Left err
            Right len -> fmap Right (skipAndProceed h len)
      where
        skipAndProceed :: Handle -> Word32 -> IO (P.Producer P.ByteString IO ())
        skipAndProceed handle l = do
            (hSeek handle AbsoluteSeek) (fromIntegral l)
            return $ PB.fromHandle handle

My MDH type is an instance of Binary, so there is a get method
available. I'm wondering:

a) What's the right way to turn this into a Producer of MDHs instead
of a Producer of ByteStrings while operating in constant memory?

b) Is there a more elegant way to deal with error handling here? I'm
not even dealing with possible failure in hSeek, and I already think
this looks pretty messy. I'm not wedded to my function type being

    handleToMDHs :: Handle -> IO (Either P.DecodingError (P.Producer MDH IO ()))

I just am not sure how else to express the possibility of failure in
this kind of operation.

Thanks,
Dylan
--
You received this message because you are subscribed to the Google
Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to haskell-pipes+unsubscr...@googlegroups.com
<mailto:haskell-pipes+unsubscr...@googlegroups.com>.
To post to this group, send email to haskell-pipes@googlegroups.com
<mailto:haskell-pipes@googlegroups.com>.