Re: [haskell-pipes] What is the ideomatic way to combine pipes-binary, pipes-bytestring, pipes-parse?

Gabriel Gonzalez Tue, 13 May 2014 10:19:44 -0700

It is perfectly acceptable to poke around in the underlying `StateT`.
Generally, it is more idiomatic to encode your error-handling logic into
the lens itself, but manual state passing is definitely an approved
thing to do if you are more comfortable with it. It really comes down
to whatever is more readable for you.

One of the reasons that I chose `StateT` as the substrate for
`pipes-parse` rather than an opaque `Parser` type is that I wanted
people to reuse their existing knowledge for how `StateT` works so that
they could do things like what you are doing.


On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:

Great! I'm starting to get a firmer understanding of parsers. I ended
up with this:


decodeFilename = StateT $ \p -> do
    (fileName, p') <- runStateT drawAll
        . view (PB.span (/= 0)
               . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
               . from PT.packChars)
        $ p
    Left p'' <- next p'
    return (fileName, PB.drop 1 <-< join p'')

entryParser tableStart = do
    fileName <- decodeFilename
    P.decodeGet $ (,,,) fileName
        <$> fmap (tableStart +) getInt32
        <*> getInt32
        <*> getInt32

Using next instead of drain, decode errors can be handled (pattern
match failure for now). Because of drawAll, p'' (result of span) is
empty when decode succeeds, so it can simply be joined, and then the
terminating 0 dropped. Ignoring that the composition chains are a bit
on the lengthy side, do you consider it "good style" to poke around in
Parser's underlying StateT like that, or is it going against how the
libraries are meant to be used?


On Tuesday, 13 May 2014 at 03:14:37 UTC+2, Gabriel Gonzalez wrote:


    On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:

    Thanks for the reply! The rotated lens is no problem (rotateR is
    from Data.Bits), but I'm afraid the data won't decode as UTF-8.
    Just to make sure I understand correctly: when you talk about
    re-encoding unused values, do you mean the values that would be
    left over if the zoomed parser were something other than drawAll
    and didn't consume all the data provided by the span lens?


    Yes, that's correct.  If you write:

        example = do
            a <- zoom someLens parser1
            parser2

    ... then `someLens` needs to know how to re-encode leftovers from
    `parser1` in the format that `parser2` understands.
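
The leftover handling described above can be sketched with a toy model.
This is not the `pipes-parse` API: plain lists stand in for Producers,
`Parser` here is a local type synonym over `State`, and `zoomVia` is a
hypothetical helper that takes a decode/encode pair where the real
`zoom` would take a lens. It only illustrates why the way back (the
encode direction) is needed.

```haskell
import Control.Monad.Trans.State.Strict (State, get, put, runState, state)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy stand-in for a pipes-parse Parser: the state is the unconsumed input.
type Parser a r = State [a] r

-- Take one element off the front of the input, if there is one.
draw :: Parser a (Maybe a)
draw = state $ \input -> case input of
    []     -> (Nothing, [])
    x : xs -> (Just x, xs)

-- Hypothetical helper: run a parser over a decoded view of the input,
-- then re-encode whatever it left unconsumed so that the next parser
-- sees those leftovers in the outer format again.
zoomVia :: (a -> b) -> (b -> a) -> Parser b r -> Parser a r
zoomVia decode encode inner = do
    input <- get
    let (r, leftovers) = runState inner (map decode input)
    put (map encode leftovers)
    return r

example :: Parser Word8 (Maybe Char, Maybe Word8)
example = do
    a <- zoomVia (chr . fromIntegral) (fromIntegral . ord) draw  -- sees Chars
    b <- draw                                                     -- sees bytes
    return (a, b)

main :: IO ()
main = print (runState example [104, 105, 33])
-- prints ((Just 'h',Just 105),[33])
```

Without the `encode` argument there would be no way to hand the
undrawn `Char`s back to the second `draw`, which expects bytes.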

    I understand why it would be a problem if those leftovers weren't
    propagated back, but I'm not sure I understand why that decision
    can't be made before the data is rotated and decoded as text.
    Does it have to do with the data being bytestrings that get
    transformed in blocks rather than per byte?


    Remember that the parser is totally oblivious about where the
    `Text` came from.  It doesn't know that the text originated from
    bytes or rotated data.  All it understands is "I am undrawing some
    text" and if you want it to undraw bytes then you need to
    translate the "undraw text" command to an "undraw bytes" command.
    That's what the lens is doing.


    Note that you can still get a lens if you specify a way to handle
    errors.  Right now the `pipes-text` package provides a one-way
    decoding function for latin1 of type:

        decodeIso8859_1
            :: Monad m
            => Producer ByteString m r
            -> Producer Text m (Producer ByteString m r)

    If you supplement that with a reverse function of type:

        encoder
            :: Monad m
            => Producer Text m (Producer ByteString m r)
            -> Producer ByteString m r

    ... then you can create a latin1 lens that you can pass to `zoom`:

        latin1
            :: Monad m
            => Lens' (Producer ByteString m r)
                     (Producer Text m (Producer ByteString m r))
        -- I might have these arguments backwards; I didn't type-check this
        latin1 = iso decodeIso8859_1 encoder

    The reason that `pipes-text` doesn't already do this for you is
    because Latin1 does not specify how to encode multibyte
    characters.  In other words, you need to figure out how to convert
    these exotic characters to bytes, even if that means just
    discarding them (i.e. not undrawing the character at all).

    So if you really want to use latin1 as a lens, you definitely
    can!  It just requires that you decide how you want to encode
    multibyte characters since there's no obvious right way to do
    that.  If you don't expect your input to have multibyte characters
    then you can just slightly modify `encodeIso8859_1` to do what you
    want:

        encoder pText = do
            pBytes <- encodeIso8859_1 pText
            runEffect (runEffect (pBytes >-> drain) >-> drain)

    That basically keeps encoding until it hits a character that
    `encodeIso8859_1` does not know how to encode, then gives up and
    drains the rest of the stream.
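
To make the "give up at the first unencodable character" policy
concrete, here is a pure sketch that does not use pipes at all. It
encodes a `String` to Latin-1 bytes and stops at the first code point
above 0xFF. `encodeLatin1Prefix` is a hypothetical name, not something
`pipes-text` provides. Unlike the streaming `encoder` above, it returns
the abandoned tail instead of draining it, so the caller decides
whether to discard it.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode the longest prefix that fits in Latin-1
-- (code points 0x00..0xFF) and return the characters that were given up on.
encodeLatin1Prefix :: String -> ([Word8], String)
encodeLatin1Prefix s = (map (fromIntegral . ord) ok, rest)
  where
    (ok, rest) = span ((<= 0xFF) . ord) s

main :: IO ()
main = print (encodeLatin1Prefix "caf\233 \955x")
-- prints ([99,97,102,233,32],"\955x")
```

Ignoring the second component of the result corresponds to the "just
discard them" option mentioned above.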


    Anyway I'll have to go with your second option. Instead of
    breaking the parser into multiple code blocks (that have to be
    runStateTed individually) in order to get at the bytestring
    producer, is it reasonable to use get and put from
    Control.Monad.State? That way I can keep everything a single
    Parser, view the bytestring producer from "get" through the
    PB.span lens composed with the transformations, and "put" back
    the producer returned by span.

    Bonus question: If the rotated lens was simply Bits a => Int ->
    Lens' a a, could it be mapped/zoomed/something over a ByteString
    producer instead of including PB.map in the lens? That way
    rotated would be more reusable.

    On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez wrote:

        This works much better if you can make two small changes.

        First, I'm guessing that your `rotateR` function has some
        sort of inverse named `rotateL`.  If it does, then you can
        make a rotation lens:

            rotated :: Monad m => Int -> Lens' (Producer ByteString m x)
                                                (Producer ByteString m x)
            rotated n = iso (>-> PB.map (`rotateR` n)) (>-> PB.map (`rotateL` n))

        Second, if you can use utf8 instead of latin1, then you can
        just write:

            decodeFileName :: Monad m => Parser ByteString m String
            decodeFileName =
                zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars)
                     PP.drawAll

        The reason this works is that `rotated` and `utf8` contain
        extra information for how to propagate unused bytes back to
        the original input source.  In the case of `rotated` it
        reverses the original rotation and in the case of `utf8` it
        re-encodes them.
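
The round-trip property that `rotated` depends on can be checked
directly with `Data.Bits` from base. This is only a sanity check on
single bytes, not on Producers, and `roundTrips` is a name made up for
this sketch.

```haskell
import Data.Bits (rotateL, rotateR)
import Data.Word (Word8)

-- Hypothetical helper: does rotating right by n and then left by n
-- give back the original byte?
roundTrips :: Int -> Word8 -> Bool
roundTrips n w = rotateL (rotateR w n) n == w

main :: IO ()
main = print (all (roundTrips 3) [minBound .. maxBound])
-- prints True
```

Because the check holds for every `Word8`, mapping `rotateL` over
leftover bytes restores exactly what the input source originally held.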

        If you don't have information for how to re-encode unused
        values, then you must apply the rotation and encoding to the
        producer before feeding it to the parser:

            yourProducer :: Producer ByteString IO ()

            runStateT PP.drawAll
                (yourProducer ^. PB.span (/= 0)
                              ^. to (>-> PB.map (`rotateR` n))
                              ^. PT.utf8
                              ^. from PT.packChars)
                :: IO (String, Producer String IO (... {- more nested producers -}))

        `pipes-parse` doesn't let you merge logic into the parser
        unless you also include logic for how to propagate unused
        bytes to the input source.  Without that guarantee you get
        bugs related to silently dropping input values.

        On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:

        While working with a binary file format, I started out with
        this naive code:

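        import qualified Pipes.Parse as P
        import qualified Pipes.Binary as P
        import qualified Pipes.ByteString as PB
        import qualified Data.Text as T
        import qualified Data.ByteString as BS
        import Data.Binary.Get (getWord8, getWord32le)
        import Data.Bits (rotateR)
        import Data.Text.Encoding (decodeLatin1)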

        entryParser tableStart = P.decodeGet $ (,,,)
            <$> decodeFilename
            <*> fmap (tableStart +) getWord32le
            <*> getWord32le
            <*> getWord32le

        decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go where
            go = do
                c <- (`rotateR` 3) <$> getWord8
                -- terminate on (and consume) the 0
                if c /= 0 then (c :) <$> go else pure []

        While it does work, I'm unhappy with decodeFilename as it
        basically implements a combination of map and span/fold with
        explicit recursion. But the underlying ByteString isn't
        available inside the Get monad without consuming it, so
        using e.g. BS.span seems out of the question. Let's see if
        lenses can come to the rescue:

        entryParser tableStart = do
            nameChunks <- zoom (PB.span (/= 0)) P.drawAll
            PB.drawByte -- draw the terminating 0
            let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3)
                         . BS.concat $ nameChunks
            P.decodeGet $ (,,,) fileName
                <$> fmap (tableStart +) getWord32le
                <*> getWord32le
                <*> getWord32le

        I like this better - map and span aren't implemented
        manually anymore - but at the same time I was hoping for
        more. It doesn't seem right to work directly on ByteStrings
        (i.e. BS.map instead of PB.map, and text instead of
        pipes-text), and the combination of drawAll and concat is a
        bit awkward, especially since drawAll is only for testing
        (even though all the tutorials use it :) ). The latter point
        might be addressed by giving pipes-bytestring a folding
        function similar to P.foldAll, but even so I wonder if
        there's a more idiomatic way to do this?

        --
        You received this message because you are subscribed to the
        Google Groups "Haskell Pipes" group.
        To unsubscribe from this group and stop receiving emails
        from it, send an email to [email protected].
        To post to this group, send email to [email protected].
