Re: [haskell-pipes] What is the ideomatic way to combine pipes-binary, pipes-bytestring, pipes-parse?

Gabriel Gonzalez Tue, 20 May 2014 19:49:24 -0700

Returning the unused input on error is the idiomatic way for a lens to handle errors. The parser won't keep going after the error because the `Producer` containing any unused input is stashed inside the return value of the outer `Producer`, so the unused input is totally inaccessible to the `Parser`. The `Parser` type enforces this behavior:


    type Parser a m r = forall x . StateT (Producer a m x) m r

The `forall x` enforces in the types that the `Parser` cannot use whatever is stored in the `x` in any meaningful way. Since the unused input is stored in that `x`, the `Parser` can't access it.
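
To make the role of the `forall x` concrete, here is a minimal sketch that uses only `transformers` instead of `pipes`. `Stream`, `ToyParser`, `drawToy` and `demo` are hypothetical names made up for this illustration; `Stream` is just a list-based stand-in for `Producer`, not how `pipes` represents it.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad.Trans.State.Strict (StateT, runStateT, state)
import Data.Functor.Identity (runIdentity)

-- Hypothetical stand-in for `Producer a m x`: some elements followed by
-- a final return value of type `x`.
data Stream a x = Stream [a] x

-- Same shape as the `Parser` synonym: the parser must work for every `x`.
type ToyParser a m r = forall x. StateT (Stream a x) m r

-- Draw one element. The `x` field can only be passed along untouched,
-- because nothing is known about its type inside the parser.
drawToy :: Monad m => ToyParser a m (Maybe a)
drawToy = state $ \s@(Stream items x) -> case items of
    []         -> (Nothing, s)
    (a : rest) -> (Just a, Stream rest x)

-- Two draws over a stream whose return value plays the role of the
-- stashed unused input. It comes back exactly as it went in.
demo :: ((Maybe Char, Maybe Char), String, String)
demo =
    let input              = Stream "hi!" "unused input"
        (r, Stream rest x) =
            runIdentity (runStateT ((,) <$> drawToy <*> drawToy) input)
    in  (r, rest, x)
```

Evaluating `demo` in GHCi gives `((Just 'h',Just 'i'),"!","unused input")`. Any attempt to inspect `x` inside `drawToy` (for example treating it as a `String`) is rejected by the type checker, which is the guarantee described above.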


On 05/16/2014 02:31 AM, Torgeir Strand Henriksen wrote:

I can see that it would be more elegant to zoom rather than use StateT, but what options are there for error handling inside an encode/decode lens? Wrapping the Text and ByteString chunks in Either sounds like a mess, and returning the unused bytes on error like decodeIso8859_1 means the zoom has to be run with runStateT in isolation to prevent the parser from carrying on after the error. Throwing an exception is possible of course, but would be nice to avoid.


On Tuesday, 13 May 2014 at 19:18:51 UTC+2, Gabriel Gonzalez wrote:

    It is perfectly acceptable to poke around in the underlying
    `StateT`. Generally, it is more idiomatic to encode your
    error-handling logic into the lens itself, but manual state
    passing is definitely an approved thing to do if you are more
    comfortable with it.  It really comes down to whatever is more
    readable for you.

    One of the reasons that I chose `StateT` as the substrate for
    `pipes-parse` rather than an opaque `Parser` type is that I wanted
    people to reuse their existing knowledge for how `StateT` works so
    that they could do things like what you are doing.
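
    As a rough illustration of manual state passing, the sketch below
    uses plain `State` from `transformers` with a list of bytes standing
    in for the `Producer`. `ToyParser` and `fileNameToy` are hypothetical
    names, and the Latin-1 decoding is approximated with `chr`; this is
    not the `pipes-parse` API, only the same `get`/`put` idea.

    ```haskell
    import Control.Monad.Trans.State.Strict (State, get, put, runState)
    import Data.Bits (rotateR)
    import Data.Char (chr)
    import Data.Word (Word8)

    -- Hypothetical list-based stand-in for a parser over a ByteString producer.
    type ToyParser = State [Word8]

    -- Read a 0-terminated, bit-rotated file name by inspecting the state directly.
    fileNameToy :: ToyParser String
    fileNameToy = do
        input <- get
        let (nameBytes, rest) = span (/= 0) input
        put (drop 1 rest)  -- drop the terminating 0, if there is one
        return (map (chr . fromIntegral . (`rotateR` 3)) nameBytes)

    -- The bytes 11 and 19 are 'a' and 'b' rotated left by 3 bits.
    demo :: (String, [Word8])
    demo = runState fileNameToy [11, 19, 0, 42]
    ```

    Here `demo` evaluates to `("ab",[42])`: the name is decoded and the
    bytes after the terminator stay in the state for the next parser.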

    On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:

    Great! I'm starting to get a firmer understanding of parsers. I
    ended up with this:

    decodeFilename = StateT $ \p -> do
        (fileName, p') <- runStateT drawAll
            . view ( PB.span (/= 0)
                   . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
                   . from PT.packChars )
            $ p
        Left p'' <-  next p'
        return (fileName, PB.drop 1 <-< join p'')

    entryParser tableStart = do
        fileName <- decodeFilename
        P.decodeGet $ (,,,) fileName
            <$> fmap (tableStart +) getInt32
            <*> getInt32
            <*> getInt32

    Using next instead of drain, decode errors can be handled
    (pattern match failure for now). Because of drawAll, p'' (result
    of span) is empty when decode succeeds, so it can simply be
    joined, and then the terminating 0 dropped. Ignoring that the
    composition chains are a bit on the lengthy side, do you consider
    it "good style" to poke around in Parser's underlying StateT like
    that, or is it going against how the libraries are meant to be used?

    On Tuesday, 13 May 2014 at 03:14:37 UTC+2, Gabriel Gonzalez wrote:


        On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:

        Thanks for the reply! The rotated lens is no problem
        (rotateR is from Data.Bits), but I'm afraid the data won't
        decode as UTF-8. Just to make sure I understand correctly:
        When you talk about re-encoding unused values, do you mean
        the values that would be left if the parser zoomed into was
        a different one than drawAll and didn't consume all the data
        provided by the span lens?


        Yes, that's correct.  If you write:

            example = do
                a <- zoom someLens parser1
                parser2

        ... then `someLens` needs to know how to re-encode leftovers
        from `parser1` in the format that `parser2` understands.

        I understand why it would be a problem if those leftovers
        weren't propagated back, but I'm not sure I understand why
        that decision can't be made before the data is rotated and
        decoded as text. Does it have to do with the data being
        bytestrings that get transformed in blocks rather than per byte?


        Remember that the parser is totally oblivious about where the
        `Text` came from.  It doesn't know that the text originated
        from bytes or rotated data.  All it understands is "I am
        undrawing some text" and if you want it to undraw bytes then
        you need to translate the "undraw text" command to an "undraw
        bytes" command.  That's what the lens is doing.
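
        The translation can be sketched without `pipes` at all. In the
        toy model below a parser is just a function from input to a
        result plus leftovers, and `Iso` stands in for the lens handed
        to `zoom`. All of the names (`P`, `Iso`, `zoomToy`, `latin1Toy`,
        `parser1`, `parser2`, `example`) are hypothetical and exist only
        for this illustration.

        ```haskell
        import Data.Char (chr, ord)
        import Data.Word (Word8)

        -- Hypothetical toy parser: takes input, returns a result plus leftovers.
        type P s r = s -> (r, s)

        -- Hypothetical stand-in for the lens given to `zoom`.
        data Iso s t = Iso { forward :: s -> t, backward :: t -> s }

        -- Run a parser on the decoded view, then re-encode whatever it left
        -- behind so the next parser sees it in the original format.
        zoomToy :: Iso s t -> P t r -> P s r
        zoomToy iso p s =
            let (r, leftover) = p (forward iso s)
            in  (r, backward iso leftover)

        -- Bytes <-> characters. Only meaningful for code points up to 255;
        -- anything larger is truncated by `fromIntegral` in this toy.
        latin1Toy :: Iso [Word8] String
        latin1Toy = Iso (map (chr . fromIntegral)) (map (fromIntegral . ord))

        -- parser1 works on text, parser2 works on bytes.
        parser1 :: P String String
        parser1 = splitAt 2

        parser2 :: P [Word8] (Maybe Word8)
        parser2 []       = (Nothing, [])
        parser2 (b : bs) = (Just b, bs)

        example :: P [Word8] (String, Maybe Word8)
        example bytes0 =
            let (a, bytes1) = zoomToy latin1Toy parser1 bytes0
                (b, bytes2) = parser2 bytes1
            in  ((a, b), bytes2)
        ```

        With this, `example [104, 105, 33, 10]` evaluates to
        `(("hi",Just 33),[10])`: the text left over by `parser1` is
        turned back into bytes before `parser2` runs.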

        Note that you can still get a lens if you specify a way to
        handle errors.  Right now the `pipes-text` package provides a
        one-way decoding function for latin1 of type:

            decodeIso8859_1
                :: Monad m
                => Producer ByteString m r
                -> Producer Text m (Producer ByteString m r)

        If you supplement that with a reverse function of type:

            encoder
                :: Monad m
                => Producer Text m (Producer ByteString m r)
                -> Producer ByteString m r

        ... then you can create a latin1 lens that you can pass to
        `zoom`:

            latin1
                :: Monad m
                => Lens' (Producer ByteString m r)
                         (Producer Text m (Producer ByteString m r))
            -- I might have these arguments backwards; I didn't type-check this
            latin1 = iso decodeIso8859_1 encoder

        The reason that `pipes-text` doesn't already do this for you
        is because Latin1 does not specify how to encode multibyte
        characters.  In other words, you need to figure out how to
        convert these exotic characters to bytes, even if that means
        just discarding them (i.e. not undrawing the character at all).

        So if you really want to use latin1 as a lens, you definitely
        can!  It just requires that you decide how you want to encode
        multibyte characters, since there's no obvious right way to do
        that.  If you don't expect your input to have multibyte
        characters then you can just slightly modify
        `encodeIso8859_1` to do what you want:

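            encoder pText = do
                pBytes <- encodeIso8859_1 pText
                -- Drain the text that could not be encoded, then resume with
                -- the ByteString producer it returns.  Untested; needs `join`
                -- from Control.Monad and `lift` in scope.
                join (lift (runEffect (pBytes >-> drain)))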

        That basically keeps encoding until it hits a character that
        `encodeIso8859_1` does not know how to encode, then gives up
        and drains the rest of the stream.


        Anyway I'll have to go with your second option. Instead of
        breaking the parser into multiple code blocks (that have to
        be runStateTed individually) in order to get at the
        bytestring producer, is it reasonable to use get and put
        from Control.Monad.State? That way I can keep everything a
        single Parser, view the bytestring producer from "get"
        through the PB.span lens composed with the transformations,
        and "put" back the producer returned by span.

        Bonus question: If the rotated lens was simply Bits a => Int
        -> Lens' a a, could it be mapped/zoomed/something over a
        ByteString producer instead of including PB.map in the lens?
        That way rotated would be more reusable.

        On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez
        wrote:

            This works much better if you can make two small changes.

            First, I'm guessing that your `rotateR` function has
            some sort of inverse named `rotateL`.  If it does, then
            you can make a rotation lens:

                rotated
                    :: Int
                    -> Lens' (Producer ByteString m x) (Producer ByteString m x)
                rotated n = iso (PB.map (`rotateR` n)) (PB.map (`rotateL` n))

            Second, if you can use utf8 instead of latin1, then you
            can just write:

                decodeFileName :: Monad m => Parser ByteString m String
                decodeFileName = zoom
                    (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars)
                    PP.drawAll

            The reason this works is that `rotated` and `utf8`
            contain extra information for how to propagate unused
            bytes back to the original input source.  In the case of
            `rotated` it reverses the original rotation and in the
            case of `utf8` it re-encodes them.
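
            The round trip that `rotated` relies on can be checked
            with `Data.Bits` alone. This is only a per-byte sketch, not
            the lens itself; `decodeByte`, `encodeByte` and `roundTrips`
            are names invented for the example.

            ```haskell
            import Data.Bits (rotateL, rotateR)
            import Data.Word (Word8)

            -- Forward direction of the lens, applied to a single byte.
            decodeByte :: Word8 -> Word8
            decodeByte b = b `rotateR` 3

            -- Backward direction, used when unused bytes are pushed back.
            encodeByte :: Word8 -> Word8
            encodeByte b = b `rotateL` 3

            -- Rotating right and then left by the same amount restores
            -- every possible byte value.
            roundTrips :: Bool
            roundTrips =
                all (\b -> encodeByte (decodeByte b) == b) [minBound .. maxBound]
            ```

            `roundTrips` evaluates to `True`, so leftovers that pass
            back through the rotation reach the original source
            unchanged.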

            If you don't have information for how to re-encode
            unused values, then you must apply the rotation and
            encoding to the producer before feeding it to the parser:

                yourProducer :: Producer ByteString IO ()

                runStateT PP.drawAll
                    (yourProducer ^. PB.span (/= 0)
                                  ^. to (PB.map (`rotateR` n))
                                  ^. PT.utf8
                                  ^. from PT.packChars)
                    :: IO (String, Producer String IO (... {- more nested producers -}))

            `pipes-parse` doesn't let you merge logic into the
            parser unless you also include logic for how to
            propagate unused bytes to the input source.  Without
            that guarantee you get bugs related to silently dropping
            input values.

            On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:

            While working with a binary file format, I started out
            with this naive code:

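            import qualified Pipes.Parse as P
            import qualified Pipes.Binary as P
            import qualified Pipes.ByteString as PB
            import qualified Data.Text as T
            import qualified Data.ByteString as BS
            import Data.Binary.Get (getWord8, getWord32le)
            import Data.Bits (rotateR)
            import Data.Text.Encoding (decodeLatin1)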

            entryParser tableStart = P.decodeGet $ (,,,)
                <$> decodeFilename
                <*> fmap (tableStart +) getWord32le
                <*> getWord32le
                <*> getWord32le

            decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go
              where
                go = do
                    c <- (`rotateR` 3) <$> getWord8
                    -- terminate on (and consume the) 0
                    if c /= 0 then (c :) <$> go else pure []

            While it does work, I'm unhappy with decodeFilename as
            it basically implements a combination of map and
            span/fold with explicit recursion. But the underlying
            ByteString isn't available inside the Get monad without
            consuming it, so using e.g. BS.span seems out of the
            question. Let's see if lenses can come to the rescue:

            entryParser tableStart = do
                nameChunks <- zoom (PB.span (/= 0)) P.drawAll
                PB.drawByte -- draw the terminating 0
                let fileName = T.unpack . decodeLatin1
                             . BS.map (flip rotateR 3) . BS.concat $ nameChunks
                P.decodeGet $ (,,,) fileName
                    <$> fmap (tableStart +) getWord32le
                    <*> getWord32le
                    <*> getWord32le

            I like this better - map and span aren't implemented
            manually anymore - but at the same time I was hoping
            for more. It doesn't seem right to work directly on
            ByteStrings (i.e. BS.map instead of PB.map, and text
            instead of pipes-text), and the combination of drawAll
            and concat is a bit awkward, especially since drawAll
            is only for testing (even though all the tutorials use
            it :) ). The latter point might be addressed by giving
            pipes-bytestring a folding function similar to
            P.foldAll, but even so I wonder if there's a more
            idiomatic way to do this?

            --
            You received this message because you are subscribed to
            the Google Groups "Haskell Pipes" group.
            To unsubscribe from this group and stop receiving
            emails from it, send an email to
            [email protected].
            To post to this group, send email to
            [email protected].

