Re: [haskell-pipes] Leftovers

Gabriel Gonzalez Sat, 05 Mar 2016 08:41:45 -0800

Yeah, leftovers handling is a little bit more subtle.  First, read this post if 
you haven’t already, particularly the section titled “Leftovers”:

http://www.haskellforall.com/2014/02/pipes-parse-30-lens-based-parsing.html 

I can also add a more sophisticated example in addition to the one given in the 
above post.  Suppose that you have a pipe named `encoder` that encodes `Text` 
into `ByteString` and has this rough type:

    encoder :: Monad m => Pipe Text ByteString m ()

… which you can freely modify to add in any leftover functionality you want.

Now suppose we hook that up in this configuration:

    consumesBytes :: Monad m => Pipe ByteString Out m ()
    consumesText :: Monad m => Pipe Text Out m ()

    example :: Monad m => Pipe Text Out m ()
    example = do
        encoder >-> consumesBytes
        consumesText

Now think about what should happen if `consumesBytes` terminates while 
`encoder` is holding onto a non-empty queue of leftovers that `consumesBytes` 
has not used up.

Well, first off, what type of leftovers would `encoder` be holding onto?  In 
this case the type of leftovers will be `ByteString`s that were returned to 
`encoder` by the `consumesBytes` `Pipe`.  There are two possible things that 
`encoder` could do with those leftovers upon termination:

* (A) Discard the leftovers.  However, that means that `consumesText` will 
begin at the wrong position in the stream
* (B) Transform the `ByteString` leftovers into `Text` leftovers and push those 
further upstream before terminating
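
As a rough illustration of what goes wrong with option (A), here is a small 
sketch that uses `String` chunks instead of `Text` and `ByteString` and skips 
the encoding step entirely.  The names `takeOneChar` and `passRest` are made up 
for this example and only stand in for `consumesBytes` and `consumesText`; the 
sketch assumes nothing beyond the `pipes` package:

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for `consumesBytes`: it only wants one character,
-- but its input arrives in chunks, so it has to pull a whole chunk.
takeOneChar :: Monad m => Pipe String String m ()
takeOneChar = do
    chunk <- await
    yield (take 1 chunk)  -- the unused remainder of `chunk` is discarded

-- Hypothetical stand-in for `consumesText`: forwards whatever comes next.
passRest :: Monad m => Pipe String String m ()
passRest = cat

-- Same shape as `example` above: run one stage, then continue with another.
example :: Monad m => Pipe String String m ()
example = do
    takeOneChar
    passRest

-- Feed two chunks through and collect everything that comes out.
result :: [String]
result = P.toList (each ["ab", "cd"] >-> example)
```

Evaluating `result` should give `["a","cd"]`: the `'b'` that `takeOneChar` 
pulled but never used is gone, so `passRest` starts one character too late.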

Option (B) sounds reasonable at first except that there might not be a way to 
transform the `ByteString` leftovers into `Text` that can be pushed further 
upstream, for a couple of reasons:

* The encoding might not round-trip
* Even if the encoding *did* round-trip (like UTF-8), nothing requires 
`consumesBytes` to consume bytes only along Unicode character boundaries

To elaborate on the latter case, assume that `encoder` received a text chunk 
containing a single character: “⌘”.  If you UTF-8-encode that you get three 
bytes: “e2 8c 98”.  If `consumesBytes` only consumes the first byte (i.e. “e2”) 
then `encoder` is now holding onto two bytes in its leftovers queue, “8c 98”, 
and there’s no longer a way to push those two bytes further upstream as `Text` 
since they cannot be (correctly) decoded as `Text`.  Now, if `consumesBytes` 
terminates there is no legitimate way for `consumesText` to begin where 
`consumesBytes` left off.
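
The byte-level claim above can be checked with a short sketch, assuming the 
`text` and `bytestring` packages.  The names `encoded`, `leftover`, and 
`recovered` are hypothetical and exist only for this illustration:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)
import Data.Text.Encoding.Error (UnicodeException)

-- UTF-8 encoding of the single character U+2318 (the "⌘" sign).
encoded :: B.ByteString
encoded = encodeUtf8 (T.singleton '\x2318')

-- What `encoder` would still be holding if downstream had consumed
-- only the first byte.
leftover :: B.ByteString
leftover = B.drop 1 encoded

-- Attempt to turn the leftover bytes back into Text.
recovered :: Either UnicodeException T.Text
recovered = decodeUtf8' leftover
```

In GHCi, `B.unpack encoded` should show `[226,140,152]` (that is, e2 8c 98), 
and `recovered` should be a `Left`, because “8c 98” are continuation bytes with 
no lead byte in front of them.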

> On Mar 4, 2016, at 10:44 AM, Tom Ellis 
> <tom-lists-haskell-pipes-2...@jaguarpaw.co.uk> wrote:
> 
> I never really grasped what leftovers are.  I don't understand why it
> wouldn't suffice to have a "pushback pipe"
> 
>    pushback :: Proxy a b (Either a b) b m r
> 
> that allows you to push "unused" 'b's back into it, to be stored in a queue. 
> The next 'b's then extracted from the pushback pipe will be the ones most
> recently pushed in.  If the queue is empty then we request a 'b' from the
> other end.
> 
> Does this make no sense?  Are leftovers much more subtle than I am
> realising?
> 
> Thanks,
> 
> Tom
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Haskell Pipes" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to haskell-pipes+unsubscr...@googlegroups.com.
> To post to this group, send email to haskell-pipes@googlegroups.com.

