Re: [haskell-pipes] Digesting streams: pipes-cryptohash

Nicolas Trangez Thu, 03 Jul 2014 02:56:36 -0700

On Wed, 2014-07-02 at 17:39 -0700, Gabriel Gonzalez wrote:
> This can be solved really elegantly by adding this function to 
> `Pipes.Prelude`:
> 
>      -- I need a better name for this
>      andFold :: Monad m => (x -> a -> x) -> x -> (x -> b) -> Producer a 
> m r -> Producer a m (r, b)


Yes please!

> 
> Then you can make your code much more reusable by writing it as a `Fold` 
> from `foldl`:
> 
>      hash :: HashAlgorithm a => Fold ByteString (Digest a)
> 
> Then you would apply `hash` to a `Producer` using:
> 
>      purely andFold hash :: HashAlgorithm a => Producer ByteString m r 
> -> Producer ByteSTring m (r, Digest a)
> 
> The benefit of factoring out the `hash` logic into a `Fold` is that:
> 
> * You can then use it to fold other containers, like lists or vectors:
> 
>      Control.Foldl.fold hash :: Foldable f => f ByteString -> Digest a
> 
> `conduit` also supports folding using the `Fold` type in 
> `conduit-extras` so your solution then generalizes to conduits, too.
> 
> * You can apply other folds at the same time using the `Applicative` 
> instance for `Fold`:
> 
>      purely andFold ((,) <$> hash <*> Control.Foldl.ByteString.length)
>          :: Producer ByteString m r -> Producer ByteString m (r, (Digest 
> a, Int64))
> 
> The above code computes the hash and the length in bytes simultaneously 
> in a single pass over the data
> 
> * People can use your `hash` `Fold` without a `pipes` dependency.  I 
> design all my libraries to minimize coupling like this.

I'm familiar with the `foldl` library (even created a sort-of port to
OCaml), and it's really nice! If the `andFold` would be in `pipes`, I'd
certainly change my code to use a `Fold` instead. Might even try to
convince to get such `Fold` in `cryptohash` upstream, so my package
becomes obsolete.

Nicolas

> 
> On 7/2/14, 3:59 PM, Nicolas Trangez wrote:
> > On Wed, 2014-07-02 at 15:31 -0700, Gabriel Gonzalez wrote:
> >> I think the interface I would recommend would be:
> >>
> >>       hash :: (HashAlgorithm a, Monad m) => Producer ByteString m r ->
> >> Producer ByteString m (Digest, r)
> >>
> >> I noticed your version has two `Producer` layers.  What's the role of
> >> the second `Producer`?
> > Good question :-) I expected to be able to write the function according
> > to the type you point out above, but when I wrote the thing, GHC
> > insisted the base `Producer` was required, and I tend to trust GHC when
> > it tells me such things.
> >
> > Now, since you made the same remark, and I tend to trust you as well
> > (;-)), I went back to the code and figured out all I need is a `lift`...
> >
> > So, first API change:
> >
> >      hash :: (HashAlgorithm a, Monad m)
> >           => Producer ByteString m r
> >           -> Producer ByteString m (r, Digest a)
> >
> > Much better!
> >
> > Now I'm wondering whether I'm overlooking any function in the Pipes
> > library which does what my internal function does:
> >
> >      hashInternal :: Monad m
> >                   => (a -> b -> a)
> >                   -> a
> >                   -> Producer b m r
> >                   -> Producer b m (r, a)
> >      hashInternal f = loop
> >        where
> >          loop !ctx p =
> >              lift (next p) >>=
> >              either
> >                  (\r -> return (r, ctx))
> >                  (\(b, p') -> yield b >> loop (f ctx b) p')
> >
> > This seems like a fairly reasonable/common thing to do, no (modulo the
> > strictness, could be pushed into `f`)?
> >
> > Thanks,
> >
> > Nicolas
> >
> >> On 7/2/14, 2:49 PM, Nicolas Trangez wrote:
> >>> All,
> >>>
> >>> Recently I was looking for a way to calculate the SHA256 hash of a
> >>> stream of ByteStrings, but not only calculate this hash: also pass on
> >>> the stream so it could be stored to a file (use-case: receiving data
> >>> from a socket, which needs to be saved to a file, named after a hash of
> >>> the content, then returning this 'name' to the client).
> >>>
> >>> I couldn't find any such library of-the-shelve in the Pipes ecosystem,
> >>> so I tried to implement something myself (with some help from IRC), and
> >>> just now pushed a first version of the code to GitHub.
> >>>
> >>> So, consider this a 'request for feedback/review' :-)
> >>>
> >>> The hashing itself is based on the 'cryptohash' library.
> >>>
> >>> There are 2 interfaces:
> >>> - One where a (strict or lazy) ByteString Producer can be wrapped,
> >>> resulting in a new Producer which will yield the exact same values as
> >>> the original one, but tupling the result of the original Producer with a
> >>> digest. E.g.
> >>>
> >>>       hash :: (HashAlgorithm a, Monad m)
> >>>            => Producer ByteString (Producer ByteString m) r
> >>>            -> Producer ByteString m (r, Digest a)
> >>>
> >>> - One which acts as a real 'Pipe' which passes along every (strict or
> >>> lazy) ByteString it receives, and updates a hashing Context kept in a
> >>> State layer of the monad stack. E.g.
> >>>
> >>>       hashPipe :: (HashAlgorithm a, MonadState (Context a) m)
> >>>                => Pipe ByteString ByteString m r
> >>>
> >>>
> >>> These have some variants: working over streams of strict or lazy
> >>> bytestrings, requiring type inference to determine the hash type, or an
> >>> explicit value, or allowing to pass in an existing context (e.g.
> >>> generated by another stream before).
> >>>
> >>> Code @ https://github.com/NicolasT/pipes-cryptohash/
> >>>
> >>> Please let me know if you think the API could be enhanced, I'm doing
> >>> something utterly wrong,...
> >>>
> >>> Thanks!
> >>>
> >>> Nicolas
> >>>
> >
> 


-- 
You received this message because you are subscribed to the Google Groups 
"Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].

Re: [haskell-pipes] Digesting streams: pipes-cryptohash

Reply via email to