Comments inline below:
On 07/03/2014 09:34 PM, Christian Marie wrote:
I recently had reason to buffer up chunks of streamed data and coalesce them
into strict ByteStrings that are at least a given buffer size. Splitting up
chunks of data is not allowed.
The data flow looks like this (reading top to bottom):
Big File (Lazy ByteString)
▼
Parsed Chunks (Int, Builder)
▼
ByteStrings (Strict ByteString)
▼
Network (sent via ZeroMQ)
Which led to this function:
-- Take a producer of (Int, Builder), where Int is the number of bytes in the
-- builder and produce chunks of n bytes.
chunkBuilder :: Monad m
             => Producer (Int, Builder) m r
             -> Producer S.ByteString m r
Which uses, internally:
builderChunks :: Monad m
              => Int
              -- ^ The size to split a stream of builders at
              -> Producer (Int, Builder) m r
              -- ^ The input producer
              -> FreeT (Producer Builder m) m r
              -- ^ The FreeT delimited chunks of that producer, split into
              --   the desired chunk length
So it just splits the Producer into groups and folds them back together with
mappend. This works exactly as expected, which is lovely; however, it's quite
verbose and possibly over-complicated?
I think this is as good as it is going to get. My rule of thumb for
these things is "would the equivalent code on ordinary lists be
simpler?" In other words, would it be easier to implement the following
type:
builderChunks :: Int -> [(Int, Builder)] -> [[Builder]]
The solution for lists would probably look very similar, recursing over
the list by hand, accumulating the length of bytes seen so far. It
might be smaller by a constant factor (since pattern matching on lists
is syntactically cheaper than "pattern matching" on producers using
`next`), but the overall algorithm would be roughly the same.
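As a rough illustration of that list version, here is a minimal sketch. The
name `chunksAtLeast` is hypothetical, the element type is left polymorphic
instead of fixing it to `Builder`, and it assumes each group should close as
soon as the accumulated size reaches the threshold, without ever splitting an
element:

```haskell
-- Hypothetical list analogue of builderChunks (illustrative name).
-- Each input element carries its size in bytes. Elements are grouped until
-- the accumulated size reaches at least n; no element is ever split.
chunksAtLeast :: Int -> [(Int, a)] -> [[a]]
chunksAtLeast n = go 0 []
  where
    -- seen: bytes accumulated in the current group
    -- acc:  elements of the current group, in reverse order
    go _ acc [] = [reverse acc | not (null acc)]
    go seen acc ((len, x) : rest)
      | seen' >= n = reverse (x : acc) : go 0 [] rest
      | otherwise  = go seen' (x : acc) rest
      where
        seen' = seen + len
```

The final group may be smaller than n, since it holds whatever is left when
the input runs out.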
The actual implementation can be found here:
https://github.com/anchor/marquise/blob/master/lib/Marquise/Server.hs
Side note: your code is very readable!
By the way, can anyone tell me how to get the type annotation on "go" to
typecheck?
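One common cause, assuming the annotation on `go` mentions type variables
(such as `m`) from the enclosing signature: in standard Haskell those names
are fresh inside the local signature, so `go` loses the outer `Monad m`
constraint. Enabling `ScopedTypeVariables` and adding an explicit `forall` to
the outer signature usually fixes it. The function below is a hypothetical
stand-in, not the code from Marquise:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

-- Hypothetical example: repeatedly run `step` and collect elements until at
-- least n bytes have been seen or `step` reports the end of input.
-- The explicit `forall m a.` brings m and a into scope for local signatures.
collect :: forall m a. Monad m => Int -> m (Maybe (Int, a)) -> m [a]
collect n step = go 0
  where
    -- This signature refers to the outer m and a. Without the forall above
    -- (and the extension), m here would be a fresh variable with no Monad
    -- instance, and the annotation would be rejected.
    go :: Int -> m [a]
    go seen
      | seen >= n = return []
      | otherwise = do
          next <- step
          case next of
            Nothing       -> return []
            Just (len, x) -> fmap (x :) (go (seen + len))
```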
My question: It seems like builderChunks could be re-usable. Possibly
implemented as a lens similar to chunksOf?
-- | someAwesomeName is a lens that splits a Producer into a FreeT group of
-- Producers of at least the minimum size provided.
someAwesomeName :: Monad m
                => Int
                -- ^ Minimum size
                -> Lens' (Producer (Int, a) m x) (FreeT (Producer a m) m x)
I think that is a little too specialized. However, there may be a more
useful generalization of that idea that would be sort of like a
`takeWhile` lens.
Another question: Would it be reasonable to expose the splitting part of the
lenses in your pipes-parse into separate functions? I.e.
chunksOf
    :: Monad m => Int -> Lens' (Producer a m x) (FreeT (Producer a m) m x)
chunksOf' :: Monad m => Int -> Producer a m x -> FreeT (Producer a m) m x
This would be convenient for users wanting to avoid lens as a dependency.
You can depend on `lens-family-core`, which is a really tiny dependency,
instead of `lens`.
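For instance, the plain-function version can be recovered from the lens with
`view`. This is only a sketch: it assumes `chunksOf` and `concats` are
imported from `Pipes.Group` (the `pipes-group` package) and `view` from
`Lens.Family` (`lens-family-core`); `roundTrip` is a hypothetical helper added
just to exercise the definition:

```haskell
import Control.Monad.Trans.Free (FreeT)
import Lens.Family (view)
import Pipes (Producer, each)
import Pipes.Group (chunksOf, concats)
import qualified Pipes.Prelude as P

-- The splitting half of the lens, as an ordinary function.
chunksOf' :: Monad m => Int -> Producer a m x -> FreeT (Producer a m) m x
chunksOf' n = view (chunksOf n)

-- Hypothetical helper: split into chunks of 3, then join them back together.
roundTrip :: [Int] -> [Int]
roundTrip = P.toList . concats . chunksOf' 3 . each
```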
The main reason I use lenses is to keep the API small. Each lens
actually packs a lot of features into a single term. For example,
`chunksOf` can be used to do things like:
-- These all work with `lens-family-core-1.1.0` now
-- Take the first four chunks of 3
over (chunksOf 3) (takes 4)
-- Replace every chunk with [1, 2, 3]
set (chunksOf 3 . individually) (each [1, 2, 3])
-- Prepend a `0` before every chunk
over (chunksOf 3 . individually) (yield 0 >>)
Using the `lens` library you can do much more than that.
Also, I particularly try to encourage the use of the `over` idiom
instead of explicit splitting and joining when possible. Any time you
find yourself doing this:
concats . f . view (chunksOf n)
... you can use `over` instead:
over (chunksOf n) f
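Here is a small sketch comparing the two forms side by side, assuming
`chunksOf`, `concats`, and `takes` come from `Pipes.Group` and `over`/`view`
from `Lens.Family`. The names `explicit`, `viaOver`, and `demo` are made up
for this example:

```haskell
import Lens.Family (over, view)
import Pipes (Producer, each)
import Pipes.Group (chunksOf, concats, takes)
import qualified Pipes.Prelude as P

-- Explicit split, transform, join: keep only the first two chunks of 3.
explicit :: Monad m => Producer Int m () -> Producer Int m ()
explicit = concats . takes 2 . view (chunksOf 3)

-- The same transformation written with `over`.
viaOver :: Monad m => Producer Int m () -> Producer Int m ()
viaOver = over (chunksOf 3) (takes 2)

-- Run both on the same input so the results can be compared.
demo :: ([Int], [Int])
demo =
  ( P.toList (explicit (each [1 .. 10]))
  , P.toList (viaOver (each [1 .. 10]))
  )
```

Both components of `demo` should come out as `[1,2,3,4,5,6]`, i.e. the first
two chunks of three elements each.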