Re: [haskell-pipes] Some code review for a pipes tutorial?

Justin Le Thu, 19 Jun 2014 10:07:17 -0700

Hi,

It's been about a month exactly since your last reply, but now that school 
is over I've found time to finish the post; if you had time, would you mind 
helping me make sure it is correct and useful for people trying to learn 
lens?  it's online at 
http://blog.jle.im/entry/pipes-streaming-huffman-compression-in-haskell-part-3
 
  (it only slightly builds on some non-pipes material from the previous two 
parts of the tutorial.)  I'd greatly appreciate any help; my goal is to 
hopefully help with the availability of basic pipes projects available 
online and I'd hate to be spreading misinformation or bad intuitions.


Thanks again :)
Justin

On Tuesday, May 20, 2014 6:44:29 PM UTC-7, Gabriel Gonzalez wrote:
>
>  `pack` is defined like this:
>
>     pack = dimap to (fmap from)
>
> ... which is equivalent to:
>
>     pack = iso to from
>
> When you apply `view` to an isomorphism, it gives you the forward 
> direction:
>
>     view (iso to from) = to
>
> That means you can narrow the problem down to understanding how `to` 
> works, which is defined in this case as:
>
>     to p = Pipes.Group.folds step id done (p ^. Pipes.Group.chunksOf 
> defaultChunkSize)
>  
>     step diffAs w8 = diffAs . (w8:)
>
>     done diffAs = BS.pack (diffAs [])
>
> This is the same thing as:
>
>     purely folds (fmap BS.pack list) (p ^. Pipes.Group.chunksOf 
> defaultChunkSize)
>
> In other words, it's splitting up the stream of `Word8`s into groups of 
> `defaultChunkSize`, folding each group into a list, and then `pack`ing that 
> list into a `ByteString`.
>
> To learn more about how `chunksOf` and `folds` work, you can read the 
> `pipes-group` tutorial here:
>
>
> http://hackage.haskell.org/package/pipes-group-1.0.0/docs/Pipes-Group-Tutorial.html
>
> You might also find this blog post useful, which is on a similar subject:
>
>
> http://www.haskellforall.com/2013/09/perfect-streaming-using-pipes-bytestring.html
>
> On 5/20/14, 11:22 AM, Justin Le wrote:
>  
>  Ah, that makes sense.
>
>  You're right, I was removing the wrong runEffect...and I see why it was 
> unnecessary :)
>
>  One last thing I am stuck on; what exactly does `view` do in this 
> context?  And how does it know when to properly chunk up ByteStrings, as to 
> keep constant space?
>
> On Saturday, May 10, 2014 7:40:52 AM UTC-7, Gabriel Gonzalez wrote: 
>>
>>  
>> On 05/09/2014 09:21 PM, Justin Le wrote:
>>  
>> Thanks Gabriel!  This was very helpful :) 
>>
>>  I've updated the repo with my modifications, and I had a couple of 
>> comments.
>>
>>  1. Deleting `runEffect` appears to bring about a type error, trying to 
>> unify the Proxy type with IO ().  Did I do something wrong here?
>>  
>>
>> Make sure you are deleting the correct `runEffect` (from encode.hs).  The 
>> result of `freqs` is not an `Effect`, so you don't need to run it.
>>
>>  2. Is there anywhere I can read up on Consumer' and (>~)?  I sort of 
>> have been avoided using them because I don't fully understand the 
>> differences between the (>->) category and the (>~) category, actually.  
>>  
>>
>> The English explanation of what `(>~)` does is that `p' >~ p` replaces 
>> every `await` in `p` with `p'`.  So, for example, when you write:
>>
>>     p' >~ cat
>>
>> That's equivalent to:
>>
>>     forever $ do
>>         a <- p'
>>         yield a
>>
>> In other words, `p'` gets reused in its entirety every time the 
>> downstream pipe `await`s.
>>
>> This behavior can also be specified more formally using these laws:
>>
>>     -- (p' >~) is a monad morphism
>>     p' >~ (do
>>         x <- m
>>         f x )
>>     = do
>>         x <- p' >~ m
>>         p' >~ f x
>>
>>     p' >~ return r = return 
>>
>>     -- `await` is the right-identity of `(>~)`
>>     p' >~ await = p'
>>
>>     -- The next two equations are free theorems
>>     p' >~ yield x = yield x
>>
>>     p' >~ lift m = lift m
>>
>> So, using the example of `p' >~ cat`, you can evaluate it like so:
>>
>>     p' >~ cat
>>
>>     -- Definition of `cat`
>>     = p' >~ (forever $ do
>>         a <- await
>>         yield a )
>>
>>     -- Monad morphisms distribute over `forever`
>>     = forever $ p' >~ (do
>>         a <- await
>>         yield a )
>>
>>     -- Monad morphism distributes over bind
>>     = forever $ do
>>         a <- p' >~ await
>>         p' >~ yield a
>>
>>     -- p' >~ await = p'
>>     = forever $ do
>>         a <- p'
>>         p' >~ yield a
>>
>>     -- p' >~ yield a = yield a
>>     = forever $ do
>>         a <- p'
>>         yield a
>>
>> It may also help to read the "Consumers" section of the `pipes` tutorial:
>>
>>
>> http://hackage.haskell.org/package/pipes-4.1.1/docs/Pipes-Tutorial.html#g:4
>>
>>
>>  Consumer' is just Consumer, but with the output not technically 
>> "closed" off for good (just effectively), right?  And how does (>~ cat) 
>> turn it into a Pipe?
>>
>>  Thank you again!
>>
>>  Justin
>>
>> On Friday, May 2, 2014 4:06:07 PM UTC-7, Gabriel Gonzalez wrote: 
>>>
>>>  This is the perfect kind of question to post to the mailing list!
>>>
>>> I will go down the two programs and make minor comments and then review 
>>> their overall structure.
>>>
>>> -- encode.hs
>>>
>>> * Delete `runEffect`.  It's not doing anything.  The reason that it 
>>> still type-checked was because your base monad was polymorphic over 
>>> `MonadIO`, so it let you accidentally insert an additional `Pipe` layer 
>>> (which was not doing anything).  As a side note, I think I made a mistake 
>>> by parametrizing the `Pipes.Prelude` utilities over `MonadIO` (I prefer 
>>> using `hoist` now), but I don't want to make a breaking change to fix it.
>>>
>>> * Good use of `withFile` instead of `pipes-safe`.  I feel like too many 
>>> people unnecessarily use `pipes-safe` when `withFile` suffices.
>>>
>>> * Use `view Pipes.ByteString.pack p` instead of `p >-> PP.map 
>>> B.singleton`.  It will group your Word8's into a more efficient chunk 
>>> size.  Your current formulation will call a separate write command for 
>>> every single byte, which is very inefficient.
>>>
>>> * For the reverse direction (i.e. `bytes`), you can either:
>>>
>>> A) Use `view (from Pipes.ByteString.pack)`, but that requires a `lens` 
>>> dependency (which I think is not good).  I plan to fix that by providing an 
>>> `unpack` lens in an upcoming `pipes-bytestring` release.  I created an 
>>> issue for this:
>>>
>>> https://github.com/Gabriel439/Haskell-Pipes-ByteString-Library/issues/36
>>>
>>> B) Use `mapFoldable`:
>>>
>>>     bytes = Pipes.Prelude.mapFoldable BS.unpack
>>>
>>> That's much more efficient.  The problem with your `bytes` function is 
>>> that it uses `foldl`, which triggers a bunch of left-associated binds, 
>>> generating quadratic time complexity in the number of bytes:
>>>
>>>     ((((return ()) >> yield byte1) >> yield byte2) >> yield byte3
>>>
>>> `mapFoldable`, on the other hand, is implemented in terms of `each`, 
>>> which uses a right-fold like this:
>>>
>>>     each = Data.Foldable.foldr (\a p -> yield a >> p) (return ())
>>>
>>> ... which triggers build/fold fusion and also gives linear time 
>>> complexity:
>>>
>>>     yield byte1 >> (yield byte2 >> (yield byte3 >> return ()))
>>>
>>> * If you're willing to skip the error message, you can shorten 
>>> `encodeByte` to:
>>>
>>>     encodeByte t = for cat $ \b -> each (b `M.lookup` t)
>>>
>>> ... which is the same thing as:
>>>
>>>     encodeByte t = Pipes.Prelude.mapFoldable (`M.lookup` t)
>>>
>>> * I should probably provide a function that transforms `Parser`s to 
>>> functions between `Producer`s to simplify your `dirsBytes` code.  I also 
>>> find myself writing that same pattern way too many times.  I just created 
>>> an issue to remind myself to do this:
>>>
>>> https://github.com/Gabriel439/Haskell-Pipes-Parse-Library/issues/28
>>>
>>> -- decode.hs
>>>
>>> * Is there any reason why you `drain` unused input using `limit` instead 
>>> of just using `take` by itself?
>>>
>>> * Same thing as `encode`.hs: try using `Pipes.ByteString.pack` and 
>>> `Pipes.Prelude.mapFoldable Data.ByteString.unpack` for much greater 
>>> efficiency translating between `Word8`s and `ByteString`s
>>>
>>> * You can make the code for `searchPT` more reusable by first defining a 
>>> `Consumer'` (note the prime!) that produces a single `Direction`, like this:
>>>
>>>      searchPT :: forall m. Monad m => PreTree Word8 -> Consumer' 
>>> Direction m Word8
>>>     searchPT pt0 = go pt0
>>>       where
>>>         go :: PreTree Word8 -> Consumer Direction m Word8
>>>         go (PTLeaf x      ) = return x
>>>         go (PTNode pt1 pt2) = do
>>>             dir <- await
>>>             go $ case dir of
>>>                 DLeft  -> pt1
>>>                 DRight -> pt2
>>>
>>> ... and then you can optionally upgrade that to a `Pipe` like this:
>>>
>>>     searchPT pt >~ cat :: Pipe Direction Word8 m r
>>>
>>> That decouples the logic for parsing one direction from the logic for 
>>> looping.
>>>
>>> * Also, there's nothing `Word8`-specific about your `searchPT` 
>>> function.  Consider generalizing the type to any value.
>>>
>>> * You can simplify the implementation of `dirs` using `mapFoldable`:
>>>
>>>     dirs = Pipes.Prelude.mapFoldable byteToDirs
>>>
>>> Overall the architecture of your program looks correct.  I don't see any 
>>> obvious non-idiomatic things that you are doing.
>>>
>>> On 5/2/14, 2:43 AM, Justin Le wrote:
>>>  
>>> Hi pipes people; 
>>>
>>>  I really don't know too much about pipes, but an entire section in a 
>>> project tutorial I am writing is going to be dedicated to hooking up all of 
>>> the pipes plumbing together.  Seeing as this might also be possibly used as 
>>> a pipes tutorial, I just wanted to make sure that my pipes code is 
>>> idiomatic/not awful/not going to set back your progress by generations. 
>>>  Does anyone mind maybe giving it a quick look over? :)  I would really 
>>> appreciate it, and credit will be given where deserved :)  I hope it is not 
>>> too imposing for me to ask!
>>>
>>>  It's actually a pair of programs --- a Huffman compression encoder and 
>>> decoder.
>>>
>>>  The encoder: 
>>> https://github.com/mstksg/inCode/blob/master/code-samples/huffman/encode.hs
>>> The decoer: 
>>> https://github.com/mstksg/inCode/blob/master/code-samples/huffman/decode.hs
>>>
>>>  I tried my best to abstract away the actual mechanisms of the huffman 
>>> logic where I could; it does peak in at some times, but the comments should 
>>> give you a general high-level idea of what each function is trying to do. 
>>>  For reference, the series itself explaining the logic is hosted at 
>>> http://blog.jle.im/entries/series/+huffman-compression
>>>
>>>  I am pretty sure that the code gives away my unfamiliarity :)
>>>
>>>  Thank you all!
>>> Justin
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Haskell Pipes" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>>
>>>
>>>   -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Haskell Pipes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>>
>>
>>   -- 
> You received this message because you are subscribed to the Google Groups 
> "Haskell Pipes" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <javascript:>.
> To post to this group, send email to [email protected] 
> <javascript:>.
>
>
>  

-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].

Re: [haskell-pipes] Some code review for a pipes tutorial?

Reply via email to