Re: [haskell-pipes] What is the idiomatic way to combine pipes-binary, pipes-bytestring, pipes-parse?

Gabriel Gonzalez Fri, 23 May 2014 16:44:47 -0700

The simplest solution is to use `Pipes.ByteString.isEndOfBytes` after the `zoom` to check whether it failed. If there are residual bytes, then the parse failed.
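
A minimal sketch of this check, under some assumptions: to stay self-contained it works on a `Producer Char` with generic `pipes-parse` (plus `zoom` from `lens-family-core`) instead of bytes and `pipes-text`. The hypothetical `digits` lens plays the role of `PT.utf8` (it stops decoding at the first non-digit and returns the rest), `PP.isEndOfInput` plays the role of `PB.isEndOfBytes`, and `field` is an illustrative name.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (join)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Char (digitToInt, intToDigit, isDigit)
import Data.Functor.Identity (runIdentity)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.Parse as PP

-- A van Laarhoven lens, the same shape as the lenses in pipes-parse
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Hypothetical decoding lens standing in for PT.utf8: decoding yields digit
-- values and stops at the first non-digit, returning the undecoded input;
-- encoding turns undrawn digits back into Chars ahead of that input.
digits :: Monad m => Lens' (Producer Char m x) (Producer Int m (Producer Char m x))
digits k p = fmap encode (k (decode p))
  where
    decode p0 = do
      e <- lift (next p0)
      case e of
        Left r -> return (return r)
        Right (c, p1)
          | isDigit c -> yield (digitToInt c) >> decode p1
          | otherwise -> return (yield c >> p1)
    encode pInts = join (for pInts (\n -> yield (intToDigit n)))

-- Parse one ','-terminated field of digits. Input still left inside the
-- field after the inner zoom means the decoder stopped early.
field :: Monad m => PP.Parser Char m (Either String [Int])
field = do
  result <- zoom (PP.span (/= ',')) $ do
    ds <- zoom digits PP.drawAll
    clean <- PP.isEndOfInput
    return (if clean then Right ds else Left "field contains a non-digit")
  _ <- PP.draw -- consume the ',' terminator
  return result
```

With this sketch, `field` on `"123,9"` should give `Right [1,2,3]`, and on `"1x3,9"` it should give `Left "field contains a non-digit"`, because the rejected `x3` is still sitting inside the `span` when the inner `zoom` returns.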

Another solution is to apply the lens on the `Producer` end, using `view`. This ensures that no information is lost.
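
A sketch of the `view` approach, with similar caveats: it assumes `pipes`, `pipes-parse`, `lens-family-core` (for `view`) and `transformers`; `decodeDigits` is a hypothetical one-way decoder standing in for a function like `decodeIso8859_1`, and `splitField` is an illustrative name. Because the lens is applied to the `Producer` before any parsing, every leftover comes back to the caller as an ordinary return value instead of being undrawn.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (runStateT)
import Data.Char (digitToInt, isDigit)
import Data.Functor.Identity (runIdentity)
import Lens.Family (view)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

-- Hypothetical one-way decoder (no encoder): yields digit values and returns
-- the undecoded input, starting at the first non-digit.
decodeDigits :: Monad m => Producer Char m r -> Producer Int m (Producer Char m r)
decodeDigits p = do
  e <- lift (next p)
  case e of
    Left r -> return (return r)
    Right (c, p')
      | isDigit c -> yield (digitToInt c) >> decodeDigits p'
      | otherwise -> return (yield c >> p')

-- Decode the ','-terminated field on the Producer side. Nothing is thrown
-- away: we get the digits, the residue the decoder rejected, and the input
-- after the field (starting at the ',').
splitField :: Monad m => Producer Char m r -> m ([Int], String, Producer Char m r)
splitField p = do
  (ds, p1) <- runStateT PP.drawAll (decodeDigits (view (PP.span (/= ',')) p))
  -- drawAll exhausted p1, so draining it just returns the decoder's leftover
  leftover <- runEffect (p1 >-> P.drain)
  (residue, p2) <- runStateT PP.drawAll leftover
  -- p2 is exhausted as well; its return value is the input after the field
  rest <- runEffect (p2 >-> P.drain)
  return (ds, residue, rest)
```

For the input `"12a3,45"` this should return the digits `[1,2]`, the residue `"a3"` (non-empty, so decoding failed), and a producer that still yields `",45"`.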


On 5/22/14, 11:03 AM, Torgeir Strand Henriksen wrote:

Let me explain what I mean by the parser continuing after the error:

parser :: Monad m => Parser ByteString m (String, Maybe Word8)
parser = do
    str <- zoom (PB.span (/= 0) . PT.utf8 . from PT.packChars) drawAll
    a <- PB.drawByte -- for simplicity; it would be a more complicated parser in actual code
    return (str, a)

test :: Monad m => [Word8] -> m ((String, Maybe Word8), Producer P.ByteString m ())
test = runStateT parser . yield . BS.pack

> fst <$> test [65,66,67,0]
("ABC",Just 0)

> fst <$> test [65,255,66,67,0] -- invalid utf8
("A",Just 255)

As you can see, the parser function keeps going with PB.drawByte after PT.utf8 fails. Unless I misunderstand, zoom even undraws the leftovers returned by PT.utf8, so I don't see a way to detect the error and report it to the user. Hopefully I'm missing something. :)


At 04:48:26 UTC+2 on Wednesday, 21 May 2014, Gabriel Gonzalez wrote:

    Returning the unused input on error is the idiomatic way for a
    lens to handle errors.  The parser won't keep going on after the
    error because the `Producer` containing any unused input is
    stashed inside the return value of the outer `Producer`, so the
    unused input is totally inaccessible to the `Parser`. The `Parser`
    type enforces this behavior:

        type Parser a m r = forall x . StateT (Producer a m x) m r

    The `forall x` enforces in the types that the `Parser` cannot use
    whatever is stored in the `x` in any meaningful way.  Since the
    unused input is stored in that `x`, the `Parser` can't access it.
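
A small illustration of that encapsulation, assuming `pipes-parse` and `zoom` from `lens-family-core` (`firstTwoThenRest` is a made-up name): inside `zoom (PP.splitAt 2)`, `drawAll` stops after two elements, because the rest of the input lives in the inner producer's return value, which the inner parser cannot touch.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Functor.Identity (runIdentity)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.Parse as PP

-- Inside the zoom the parser sees a Producer whose return value (the 'x')
-- holds the rest of the input; drawAll cannot reach past it.
firstTwoThenRest :: Monad m => PP.Parser Int m ([Int], [Int])
firstTwoThenRest = do
  firstTwo <- zoom (PP.splitAt 2) PP.drawAll
  rest <- PP.drawAll -- back outside the zoom, the remaining input is visible again
  return (firstTwo, rest)
```

On `each [1..5]` it should return `([1,2],[3,4,5])`.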

    On 05/16/2014 02:31 AM, Torgeir Strand Henriksen wrote:

    I can see that it would be more elegant to zoom rather than use
    StateT, but what options are there for error handling inside an
    encode/decode lens? Wrapping the Text and ByteString chunks in
    Either sounds like a mess, and returning the unused bytes on
    error like decodeIso8859_1 does means the zoom has to be run
    with runStateT in isolation to prevent the parser from continuing
    after the error.
    Throwing an exception is possible of course, but would be nice to
    avoid.

    At 19:18:51 UTC+2 on Tuesday, 13 May 2014, Gabriel Gonzalez
    wrote:

        It is perfectly acceptable to poke around in the underlying
        `StateT`.  Generally, it is more idiomatic to encode your
        error-handling logic into the lens itself, but manual state
        passing is definitely an approved thing to do if you are more
        comfortable with it.  It really comes down to whatever is
        more readable for you.

        One of the reasons that I chose `StateT` as the substrate for
        `pipes-parse` rather than an opaque `Parser` type is that I
        wanted people to reuse their existing knowledge for how
        `StateT` works so that they could do things like what you are
        doing.

        On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:

        Great! I'm starting to get a firmer understanding of parsers.
        I ended up with this:

        decodeFilename = StateT $ \p -> do
            (fileName, p') <- runStateT drawAll
                . view ( PB.span (/= 0)
                       . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
                       . from PT.packChars )
                $ p
            Left p'' <- next p'
            return (fileName, PB.drop 1 <-< join p'')

        entryParser tableStart = do
            fileName <- decodeFilename
            P.decodeGet $ (,,,) fileName
                <$> fmap (tableStart +) getInt32
                <*> getInt32
                <*> getInt32

        Using next instead of drain, decode errors can be handled
        (pattern match failure for now). Because of drawAll, p''
        (result of span) is empty when decode succeeds, so it can
        simply be joined, and then the terminating 0 dropped.
        Ignoring that the composition chains are a bit on the
        lengthy side, do you consider it "good style" to poke around
        in Parser's underlying StateT like that, or is it going
        against how the libraries are meant to be used?

        At 03:14:37 UTC+2 on Tuesday, 13 May 2014, Gabriel
        Gonzalez wrote:


            On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:

            Thanks for the reply! The rotated lens is no problem
            (rotateR is from Data.Bits), but I'm afraid the data
            won't decode as UTF-8. Just to make sure I understand
            correctly: When you talk about re-encoding unused
            values, do you mean the values that would be left if
            the parser zoomed into was a different one than drawAll
            and didn't consume all the data provided by the span lens?


            Yes, that's correct.  If you write:

                example = do
                    a <- zoom someLens parser1
                    parser2

            ... then `someLens` needs to know how to re-encode
            leftovers from `parser1` in the format that `parser2`
            understands.

            I understand why it would be a problem if those
            leftovers weren't propagated back, but I'm not sure I
            understand why that decision can't be made before the
            data is rotated and decoded as text. Does it have to do
            with the data being bytestrings that get transformed in
            blocks rather than per byte?


            Remember that the parser is totally oblivious about
            where the `Text` came from. It doesn't know that the
            text originated from bytes or rotated data.  All it
            understands is "I am undrawing some text" and if you
            want it to undraw bytes then you need to translate the
            "undraw text" command to an "undraw bytes" command.
            That's what the lens is doing.
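
A small sketch of this translation, assuming `pipes`, `pipes-parse` and `lens-family-core`, and using a hypothetical `codePoints` lens between `Char` and `Int` producers in place of a bytes/text lens (`iso` and `demo` are also illustrative helpers, not library functions). The parser inside the `zoom` only ever draws and undraws `Int`s; the lens's backward direction is what turns the pushed-back `Int` into a `Char` for the outer parser.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Char (chr, ord)
import Data.Functor.Identity (runIdentity)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

-- A van Laarhoven lens, the same shape as the lenses in pipes-parse
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Build a lens from a conversion and its inverse
iso :: (s -> a) -> (a -> s) -> Lens' s a
iso forward backward k = fmap backward . k . forward

-- Hypothetical lens: view a Char stream as a stream of code points.
-- The backward direction turns "undraw an Int" into "undraw a Char".
codePoints :: Monad m => Lens' (Producer Char m x) (Producer Int m x)
codePoints k = iso (\p -> p >-> P.map ord) (\p -> p >-> P.map chr) k

demo :: Monad m => PP.Parser Char m (Maybe Int, String)
demo = do
  firstCode <- zoom codePoints $ do
    n <- PP.draw -- draws an Int, the code point of the first Char
    PP.unDraw 33 -- pushes an Int back (33 is the code point of '!')
    return n
  rest <- PP.drawAll -- outside the zoom, the pushed-back 33 reads as '!'
  return (firstCode, rest)
```

On the input `"Hi"`, `demo` should return `(Just 72, "!i")`.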


            Note that you can still get a lens if you specify a way
            to handle errors.  Right now the `pipes-text` package
            provides a one-way decoding function for latin1 of type:

                decodeIso8859_1
                  :: Monad m
                  => Producer ByteString m r -> Producer Text m (Producer ByteString m r)

            If you supplement that with a reverse function of type:

                encoder
                  :: Monad m
                  => Producer Text m (Producer ByteString m r) -> Producer ByteString m r

            ... then you can create a latin1 lens that you can pass
            to `zoom`:

                latin1
                  :: Monad m
                  => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
                -- I might have these arguments backwards; I didn't type-check this
                latin1 = iso decodeIso8859_1 encoder

            The reason that `pipes-text` doesn't already do this for
            you is because Latin1 does not specify how to encode
            multibyte characters. In other words, you need to figure
            out how to convert these exotic characters to bytes,
            even if that means just discarding them (i.e. not
            undrawing the character at all).

            So if you really want to use latin1 as a lens, you
            definitely can!  It just requires that you decide how you
            want to encode multibyte characters, since there's no
            obvious right way to do that.  If you don't expect your
            input to have multibyte characters then you can just
            slightly modify `encodeIso8859_1` to do what you want:

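                encoder pText = do
                    -- encode characters until the first one latin1 cannot represent
                    pText' <- encodeIso8859_1 pText
                    -- then discard the remaining text and the remaining bytes
                    pBytes <- lift (runEffect (pText' >-> drain))
                    lift (runEffect (pBytes >-> drain))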

            That basically keeps encoding until it hits a character
            that `encodeIso8859_1` does not know how to encode, then
            gives up and drains the rest of the stream.
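
The same trade-off can be sketched at a smaller scale with a hypothetical `latin1ish` lens between `Word8` and `Char` producers; `decodeBytes`, `encodeChars`, `latin1ish` and `pushBack` are illustrative names, not `pipes-text` functions, and the sketch assumes `pipes`, `pipes-parse` and `lens-family-core`. Decoding cannot fail, so all of the policy lives in the encoder, which, like the encoder described in the message, gives up at the first unencodable character and drains the rest.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Char (chr, ord)
import Data.Functor.Identity (runIdentity)
import Data.Word (Word8)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Decoding: every byte is a latin1 character, so this never stops early.
decodeBytes :: Monad m => Producer Word8 m r -> Producer Char m (Producer Word8 m r)
decodeBytes p = do
  r <- p >-> P.map (chr . fromIntegral)
  return (return r)

-- Encoding needs a policy: characters above U+00FF have no latin1 byte.
-- This one gives up at the first such character and drains the rest of
-- both the text and the undecoded bytes.
encodeChars :: Monad m => Producer Char m (Producer Word8 m r) -> Producer Word8 m r
encodeChars p = do
  e <- lift (next p)
  case e of
    Left bytes -> bytes
    Right (c, p')
      | ord c < 256 -> yield (fromIntegral (ord c)) >> encodeChars p'
      | otherwise -> do
          bytes <- lift (runEffect (p' >-> P.drain))
          lift (runEffect (bytes >-> P.drain))

-- Hypothetical lens built from the two directions
latin1ish :: Monad m => Lens' (Producer Word8 m r) (Producer Char m (Producer Word8 m r))
latin1ish k p = fmap encodeChars (k (decodeBytes p))

-- Push a Char back through the lens, then read the raw bytes
pushBack :: Monad m => Char -> PP.Parser Word8 m [Word8]
pushBack c = do
  zoom latin1ish (PP.unDraw c)
  PP.drawAll
```

Starting from the bytes `[72,105]`, `pushBack 'A'` should leave `[65,72,105]`, whereas `pushBack '\x20AC'` (the euro sign, which has no latin1 byte) drains everything and leaves `[]`. That empty result is the cost of the discard policy.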


            Anyway I'll have to go with your second option. Instead
            of breaking the parser into multiple code blocks (that
            have to be runStateTed individually) in order to get at
            the bytestring producer, is it reasonable to use get
            and put from Control.Monad.State? That way I can keep
            everything a single Parser, view the bytestring
            producer from "get" through the PB.span lens composed
            with the transformations, and "put" back the producer
            returned by span.
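
A minimal sketch of the `get`/`put` approach, assuming `pipes-parse`, `lens-family-core` (for `view`) and `transformers`; `upperField` is a hypothetical parser, and `P.map toUpper` stands in for a one-way transformation such as decoding without an encoder. The transformed stream is only ever seen by `drawAll`, and what gets `put` back is the untransformed input after the field, so no re-encoding is needed.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (join)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (evalStateT, get, put, runStateT)
import Data.Char (toUpper)
import Data.Functor.Identity (runIdentity)
import Lens.Family (view)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

-- Hypothetical example: read a ','-terminated field through a one-way
-- transformation (no inverse is used), staying inside one Parser.
upperField :: Monad m => PP.Parser Char m String
upperField = do
  p <- get
  -- view the field through the span lens, then transform it one way
  (field, p1) <- lift (runStateT PP.drawAll (view (PP.span (/= ',')) p >-> P.map toUpper))
  -- the field is exhausted, so join leaves exactly the input after it
  put (join p1)
  _ <- PP.draw -- consume the ',' terminator
  return field
```

Run on `"ab,cd"` and followed by `PP.drawAll`, it should give `("AB","cd")`.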

            Bonus question: If the rotated lens was simply Bits a
            => Int -> Lens' a a, could it be
            mapped/zoomed/something over a ByteString producer
            instead of including PB.map in the lens? That way
            rotated would be more reusable.

            On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel
            Gonzalez wrote:

                This works much better if you can make two small
                changes.

                First, I'm guessing that your `rotateR` function
                has some sort of inverse named `rotateL`.  If it
                does, then you can make a rotation lens:

                    rotated :: Monad m => Int -> Lens' (Producer ByteString m x) (Producer ByteString m x)
                    rotated n = iso (PB.map (`rotateR` n) <-<) (PB.map (`rotateL` n) <-<)

                Second, if you can use utf8 instead of latin1, then
                you can just write:

                    decodeFileName :: Monad m => Parser ByteString m String
                    decodeFileName =
                        zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars) PP.drawAll

                The reason this works is that `rotated` and `utf8`
                contain extra information for how to propagate
                unused bytes back to the original input source.  In
                the case of `rotated` it reverses the original
                rotation and in the case of `utf8` it re-encodes them.
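
A sketch of `rotated` under slightly different assumptions: it works on a `Producer` of individual elements with `Pipes.Prelude.map` rather than on `ByteString` chunks with `PB.map`, which also lets it be generic over any `Bits` instance (close to the reusable version asked about in the bonus question). It assumes `pipes`, `pipes-parse` and `lens-family-core`; `entry` is a hypothetical parser for a rotated, zero-terminated name. `PP.span (/= 0)` tests the raw bytes before rotation, which is harmless here because rotating never turns a non-zero byte into zero.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Bits (Bits, rotateL, rotateR)
import Data.Functor.Identity (runIdentity)
import Data.Word (Word8)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Rotation is invertible, so anything left undrawn inside the zoom is
-- rotated back before it reaches the outer parser.
rotated :: (Monad m, Bits a) => Int -> Lens' (Producer a m x) (Producer a m x)
rotated n k p =
  fmap (\q -> q >-> P.map (`rotateL` n)) (k (p >-> P.map (`rotateR` n)))

-- Hypothetical entry: a rotated, zero-terminated name, then more input
entry :: Monad m => PP.Parser Word8 m ([Word8], Maybe Word8, [Word8])
entry = do
  name <- zoom (PP.span (/= 0) . rotated 3) PP.drawAll
  terminator <- PP.draw
  rest <- PP.drawAll
  return (name, terminator, rest)
```

For a name `[72,105]` stored rotated left by 3 and followed by `[0,7]`, `entry` should return `([72,105], Just 0, [7])`.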

                If you don't have information for how to re-encode
                unused values, then you must apply the rotation and
                encoding to the producer before feeding it to the
                parser:

                    yourProducer :: Producer ByteString IO ()

                    runStateT PP.drawAll
                        ( yourProducer
                            ^. PB.span (/= 0)
                            ^. to (PB.map (`rotateR` n) <-<)
                            ^. PT.utf8
                            ^. from PT.packChars )
                        :: IO (String, Producer Char IO (... {- more nested producers -}))

                `pipes-parse` doesn't let you merge logic into the
                parser unless you also include logic for how to
                propagate unused bytes to the input source.
                Without that guarantee you get bugs related to
                silently dropping input values.

                On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:

                While working with a binary file format, I started
                out with this naive code:

                import qualified Pipes.Parse as P
                import qualified Pipes.Binary as P
                import qualified Pipes.ByteString as PB
                import qualified Data.Text as T
                import qualified Data.ByteString as BS
                import Data.Text.Encoding (decodeLatin1)
                import Data.Binary.Get (getWord8, getWord32le)
                import Data.Bits (rotateR)

                entryParser tableStart = P.decodeGet $ (,,,)
                    <$> decodeFilename
                    <*> fmap (tableStart +) getWord32le
                    <*> getWord32le
                    <*> getWord32le

                decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go
                  where
                    go = do
                        c <- (`rotateR` 3) <$> getWord8
                        -- terminate on (and consume) the 0
                        if c /= 0 then (c :) <$> go else pure []

                While it does work, I'm unhappy with
                decodeFilename as it basically implements a
                combination of map and span/fold with explicit
                recursion. But the underlying ByteString isn't
                available inside the Get monad without consuming
                it, so using e.g. BS.span seems out of the
                question. Let's see if lenses can come to the rescue:

                entryParser tableStart = do
                    nameChunks <- zoom (PB.span (/= 0)) P.drawAll
                    PB.drawByte -- draw the terminating 0
                    let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3) . BS.concat $ nameChunks
                    P.decodeGet $ (,,,) fileName
                        <$> fmap (tableStart +) getWord32le
                        <*> getWord32le
                        <*> getWord32le

                I like this better - map and span aren't
                implemented manually anymore - but at the same
                time I was hoping for more. It doesn't seem right
                to work directly on ByteStrings (i.e. BS.map
                instead of PB.map, and text instead of
                pipes-text), and the combination of drawAll and
                concat is a bit awkward, especially since drawAll
                is only for testing (even though all the tutorials
                use it :) ). The latter point might be addressed
                by giving pipes-bytestring a folding function
                similar to P.foldAll, but even so I wonder if
                there's a more idiomatic way to do this?
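
One way to avoid `drawAll` plus `concat` is `PP.foldAll` inside the zoom. A minimal sketch, assuming `pipes-parse`, `pipes-bytestring`, `lens-family-core` and the `bytestring` `Builder`; `zeroTerminated` is a made-up name, and the rotation and `Text` decoding are left out to keep the focus on the fold.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Functor.Identity (runIdentity)
import Lens.Family.State.Strict (zoom)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Parse as PP

-- Fold the chunks inside the span into one strict ByteString,
-- instead of drawAll followed by BS.concat.
zeroTerminated :: Monad m => PP.Parser ByteString m ByteString
zeroTerminated = do
  name <- zoom (PB.span (/= 0)) (PP.foldAll step mempty done)
  _ <- PB.drawByte -- consume the terminating 0
  return name
  where
    step acc chunk = acc <> BB.byteString chunk
    done = BL.toStrict . BB.toLazyByteString
```

For the chunks `[72,105]` and `[33,0,9]` it should return the bytes `[72,105,33]`, leaving `[9]` for the next parser.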

--
                You received this message because you are
                subscribed to the Google Groups "Haskell Pipes" group.
                To unsubscribe from this group and stop receiving
                emails from it, send an email to
                [email protected].
                To post to this group, send email to
                [email protected].
