Re: [haskell-pipes] Generalizing a "counting" chunksOf

Gabriel Gonzalez Fri, 04 Jul 2014 09:36:42 -0700

Comments inline below:

On 07/03/2014 09:34 PM, Christian Marie wrote:

I recently had reason to buffer up chunks of streamed data and coalesce them
into strict ByteStrings that are at least a given buffer size. Splitting up
chunks of data is not allowed.


The data flow looks like this (reading top to bottom):

    Big File      (Lazy ByteString)
                ▼
    Parsed Chunks (Int, Builder)
                ▼
    ByteStrings   (Strict ByteString)
                ▼
    Network       (sent via ZeroMQ)

Which led to this function:

   -- Take a producer of (Int, Builder), where Int is the number of bytes in the
   -- builder and produce chunks of n bytes.

   chunkBuilder :: Monad m
                => Producer (Int, Builder) m r
                -> Producer S.ByteString m r

Which uses, internally:

   builderChunks :: Monad m
                 => Int
                 -- ^ The size to split a stream of builders at
                 -> Producer (Int, Builder) m r
                 -- ^ The input producer
                 -> FreeT (Producer Builder m) m r
                 -- ^ The FreeT delimited chunks of that producer, split into
                 -- the desired chunk length

So it just splits the Producer into groups, and folds them back together with
mappend. This works exactly as expected, lovely, however it's quite verbose and
possibly over-complicated?

I think this is as good as it is going to get. My rule of thumb for these things is "would the equivalent code on ordinary lists be simpler?" In other words, would it be easier to implement the following type:


    builderChunks :: Int -> [(Int, Builder)] -> [[Builder]]

The solution for lists would probably look very similar, recursing over the list by hand, accumulating the length of bytes seen so far. It might be smaller by a constant factor (since pattern matching on lists is syntactically cheaper than "pattern matching" on producers using `next`), but the overall algorithm would be roughly the same.
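
For illustration, here is a minimal sketch of what that list version might look like. It is not taken from the linked implementation. The name `chunksAtLeast` is hypothetical, and the element type is generalized from `Builder` to any `a` so the example depends only on `base`. It assumes a group is closed as soon as its accumulated size reaches the threshold, so only the final group may fall short:

```haskell
module Main where

-- | Group sized items so that every group, except possibly the last, has a
-- total size of at least @n@. Items are never split across groups.
-- (Hypothetical helper, generalized from @Builder@ to any @a@.)
chunksAtLeast :: Int -> [(Int, a)] -> [[a]]
chunksAtLeast n = go 0 []
  where
    -- seen: total size of the current group so far
    -- acc:  items of the current group, in reverse order
    go _ [] [] = []                -- input exhausted, nothing pending
    go _ acc [] = [reverse acc]    -- input exhausted, flush the short tail
    go seen acc ((size, x) : rest)
      | seen + size >= n = reverse (x : acc) : go 0 [] rest
      | otherwise        = go (seen + size) (x : acc) rest
```

For example, `chunksAtLeast 5 [(2,'a'),(2,'b'),(3,'c'),(6,'d'),(1,'e')]` evaluates to `["abc","d","e"]`: the first group closes once it reaches 7 bytes, the 6-byte item forms a group by itself, and the trailing 1-byte item is flushed as a short final group.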

The actual implementation can be found here:
https://github.com/anchor/marquise/blob/master/lib/Marquise/Server.hs


Side note: your code is very readable!

By the way, can anyone tell me how to get the type annotation on "go" to
typecheck?

My question: It seems like builderChunks could be re-usable. Possibly
              implemented as a lens similar to chunksOf?

   -- | someAwesomeName is a lens that splits a Producer into a FreeT group of
   --  Producers of at least the minimum size provided.
   someAwesomeName :: Monad m
                  => Int
                  -- ^ Minimum size
                  -> Lens' (Producer (Int, a) m x) (FreeT (Producer a m) m x)

I think that is a little too specialized. However, there may be a more useful generalization of that idea that would be sort of like a `takewhile` lens.

Another question: Would it be reasonable to expose the splitting part of the
lenses in your pipes-parse into separate functions? I.e.

   chunksOf
       :: Monad m => Int -> Lens' (Producer a m x) (FreeT (Producer a m) m x)

   chunksOf' :: Monad m => Int -> Producer a m x -> FreeT (Producer a m) m x

This would be convenient for users wanting to avoid lens as a dependency.

You can depend on `lens-family-core`, which is a really tiny dependency, instead of `lens`.

The main reason I use lenses is to keep the API small. Each lens actually packs a lot of features into a single term. For example, `chunksOf` can be used to do things like:


    -- These all work with `lens-family-core-1.1.0` now

    -- Take the first four chunks of 3
    over (chunksOf 3) (takes 4)

    -- Replace every chunk with [1, 2, 3]
    set (chunksOf 3 . individually) (each [1, 2, 3])

    -- Prepend a `0` before every chunk
    over (chunksOf 3 . individually) (yield 0 >>)

Using the `lens` library you can do much more than that.

Also, I particularly try to encourage the use of the `over` idiom instead of explicit splitting and joining when possible. Anytime you find yourself doing this


    concats . f . view (chunksOf n)

... you can use `over` instead:

    over (chunksOf n) f
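
As a concrete, hedged illustration of that equivalence, the sketch below writes the same transformation both ways with `f = takes 2` and `n = 3`. It assumes the `pipes`, `pipes-group` and `lens-family-core` packages are installed; the names `firstTwoChunksExplicit`, `firstTwoChunksOver` and `demo` are made up for this example:

```haskell
import Lens.Family (over, view)
import Pipes (Producer, each)
import Pipes.Group (chunksOf, concats, takes)
import qualified Pipes.Prelude as P

-- Explicit version: split into chunks of 3, keep the first two chunks,
-- then join the remaining chunks back into a single producer.
firstTwoChunksExplicit :: Monad m => Producer a m () -> Producer a m ()
firstTwoChunksExplicit = concats . takes 2 . view (chunksOf 3)

-- The same transformation expressed through the lens with `over`.
firstTwoChunksOver :: Monad m => Producer a m () -> Producer a m ()
firstTwoChunksOver = over (chunksOf 3) (takes 2)

-- Run both purely; P.toList fixes the base monad to Identity.
demo :: ([Int], [Int])
demo =
  ( P.toList (firstTwoChunksExplicit (each [1 .. 10]))
  , P.toList (firstTwoChunksOver (each [1 .. 10]))
  )
```

Both components of `demo` should be `[1,2,3,4,5,6]`, that is, the first two chunks of three elements each.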
