Re: [haskell-pipes] What is the ideomatic way to combine pipes-binary, pipes-bytestring, pipes-parse?

Gabriel Gonzalez Mon, 12 May 2014 18:15:07 -0700


On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:

Thanks for the reply! The rotated lens is no problem (rotateR is from Data.Bits), but I'm afraid the data won't decode as UTF-8. Just to make sure I understand correctly: When you talk about re-encoding unused values, do you mean the values that would be left if the parser zoomed into was a different one than drawAll and didn't consume all the data provided by the span lens?


Yes, that's correct.  If you write:

    example = do
        a <- zoom someLens parser1
        parser2

... then `someLens` needs to know how to re-encode leftovers from `parser1` in the format that `parser2` understands.

I understand why it would be a problem if those leftovers weren't propagated back, but I'm not sure I understand why that decision can't be made before the data is rotated and decoded as text. Does it have to do with the data being bytestrings that get transformed in blocks rather than per byte?

Remember that the parser is totally oblivious about where the `Text` came from. It doesn't know that the text originated from bytes or rotated data. All it understands is "I am undrawing some text" and if you want it to undraw bytes then you need to translate the "undraw text" command to an "undraw bytes" command. That's what the lens is doing.
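
To make the "translate the undraw" idea concrete, here is a minimal toy model that uses plain lists instead of producers. None of these names (`ToyParser`, `Codec`, `zoomToy`, `drawOne`, `latin1Codec`) come from `pipes-parse`; they are hypothetical stand-ins, and the real `zoom` works on lenses over producers rather than on a decoder/encoder record:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy parser: takes the remaining input, returns a result plus leftovers.
newtype ToyParser a r = ToyParser { runToyParser :: [a] -> (r, [a]) }

-- Hypothetical stand-in for a lens: a decoder paired with its re-encoder.
data Codec a b = Codec { decode :: [a] -> [b], encode :: [b] -> [a] }

-- Run a parser on the decoded view, then re-encode whatever it left behind.
zoomToy :: Codec a b -> ToyParser b r -> ToyParser a r
zoomToy codec (ToyParser p) = ToyParser $ \input ->
    let (result, leftovers) = p (decode codec input)
    in  (result, encode codec leftovers)

-- Draw a single element, leaving the rest as leftovers.
drawOne :: ToyParser a (Maybe a)
drawOne = ToyParser $ \input -> case input of
    []     -> (Nothing, [])
    x : xs -> (Just x, xs)

-- Latin1 stand-in: one byte per character (only valid for code points < 256).
latin1Codec :: Codec Word8 Char
latin1Codec = Codec (map (chr . fromIntegral)) (map (fromIntegral . ord))

main :: IO ()
main = print (runToyParser (zoomToy latin1Codec drawOne) [72, 105, 33])
-- prints (Just 'H',[105,33])
```

The inner parser only ever sees characters, yet the leftovers come back out as bytes, because `zoomToy` applies the re-encoder on the way out.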

Note that you can still get a lens if you specify a way to handle errors. Right now the `pipes-text` package provides a one-way decoding function for latin1 of type:

    decodeIso8859_1 :: Monad m => Producer ByteString m r -> Producer Text m (Producer ByteString m r)


If you supplement that with a reverse function of type:

    encoder :: Monad m => Producer Text m (Producer ByteString m r) -> Producer ByteString m r


... then you can create a latin1 lens that you can pass to `zoom`:

    latin1 :: Monad m => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
    latin1 = iso decodeIso8859_1 encoder
    -- I might have these arguments backwards; I didn't type-check this
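
On the argument order: as far as I know, `iso` from the `lens` package takes the forward function first and the backward function second. Here is a small sketch using only `base` that rebuilds that shape by hand, with lists of bytes and `String` standing in for the producers. `isoToy`, `view`, `over` and `latin1Toy` are defined locally for illustration and are not the library versions:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Char (chr, ord)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Word (Word8)

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Forward function (s -> a) first, backward function (a -> s) second.
isoToy :: (s -> a) -> (a -> s) -> Lens' s a
isoToy forward backward f s = fmap backward (f (forward s))

-- Read the focused value through the lens.
view :: Lens' s a -> s -> a
view l = getConst . l Const

-- Modify the focused value and map the result back.
over :: Lens' s a -> (a -> a) -> s -> s
over l g = runIdentity . l (Identity . g)

-- Bytes on the outside, characters on the inside (code points < 256 only).
latin1Toy :: Lens' [Word8] String
latin1Toy = isoToy (map (chr . fromIntegral)) (map (fromIntegral . ord))

main :: IO ()
main = do
    putStrLn (view latin1Toy [72, 105])       -- prints Hi
    print (over latin1Toy reverse [72, 105])  -- prints [105,72]
```

Under that reading, the decoder goes first and the encoder second.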

The reason that `pipes-text` doesn't already do this for you is because Latin1 does not specify how to encode multibyte characters. In other words, you need to figure out how to convert these exotic characters to bytes, even if that means just discarding them (i.e. not undrawing the character at all).

So if you really want to use latin1 as a lens, you definitely can! It just requires that you decide how you want to encode multibyte characters, since there's no obvious right way to do that. If you don't expect your input to have multibyte characters then you can just slightly modify `encodeIso8859_1` to do what you want:


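    encoder pText = do
        -- encode until the first character that cannot be encoded
        pLeftoverText <- encodeIso8859_1 pText
        -- discard the unencodable remainder, then resume the original bytes
        pBytes <- lift (runEffect (pLeftoverText >-> drain))
        pBytes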

That basically keeps encoding until it hits a character that `encodeIso8859_1` does not know how to encode, then gives up and drains the rest of the stream.
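
The same "encode until the first failure, then discard the rest" policy can be sketched on plain strings. `encodeUntilFailure` is a hypothetical helper written for this illustration, not part of `pipes-text`, and it treats any code point below 256 as encodable:

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Encode the longest Latin1-encodable prefix; also return the text that
-- was given up on (the caller is expected to discard it).
encodeUntilFailure :: String -> ([Word8], String)
encodeUntilFailure s = (map (fromIntegral . ord) encodable, rest)
  where
    (encodable, rest) = span (\c -> ord c < 256) s

main :: IO ()
main = do
    let (bytes, dropped) = encodeUntilFailure "na\239ve \955 tail"
    print bytes    -- prints [110,97,239,118,101,32]
    print dropped  -- prints "\955 tail"
```

Note that everything after the first unencodable character is lost, including characters that could have been encoded, so this is only reasonable when such input is not expected.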

Anyway I'll have to go with your second option. Instead of breaking the parser into multiple code blocks (that have to be runStateTed individually) in order to get at the bytestring producer, is it reasonable to use get and put from Control.Monad.State? That way I can keep everything a single Parser, view the bytestring producer from "get" through the PB.span lens composed with the transformations, and "put" back the producer returned by span.

Bonus question: If the rotated lens was simply Bits a => Int -> Lens' a a, could it be mapped/zoomed/something over a ByteString producer instead of including PB.map in the lens? That way rotated would be more reusable.


On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez wrote:

    This works much better if you can make two small changes.

    First, I'm guessing that your `rotateR` function has some sort of
    inverse named `rotateL`.  If it does, then you can make a rotation
    lens:

        rotated :: Int -> Lens' (Producer ByteString m x) (Producer ByteString m x)
        rotated n = iso (PB.map (`rotateR` n)) (PB.map (`rotateL` n))

    Second, if you can use utf8 instead of latin1, then you can just
    write:

        decodeFileName :: Parser ByteString String
        decodeFileName = zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars) PP.drawAll

    The reason this works is that `rotated` and `utf8` contain extra
    information for how to propagate unused bytes back to the original
    input source.  In the case of `rotated` it reverses the original
    rotation and in the case of `utf8` it re-encodes them.
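
As a quick sanity check of the assumption that `rotateL` undoes `rotateR` on bytes, here is a sketch using only `Data.Bits` and `Data.Word` from `base` (`roundTrips` is a name made up for this check):

```haskell
import Data.Bits (rotateL, rotateR)
import Data.Word (Word8)

-- Rotating right by n bits and then left by n bits should give back the byte.
roundTrips :: Int -> Word8 -> Bool
roundTrips n w = (w `rotateR` n) `rotateL` n == w

main :: IO ()
main = print (all (roundTrips 3) [minBound .. maxBound])
-- prints True
```

That round-trip property is what lets the rotation be packaged as an isomorphism.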

    If you don't have information for how to re-encode unused values,
    then you must apply the rotation and encoding to the producer
    before feeding it to the parser:

        yourProducer :: Producer ByteString IO ()

        runStateT PP.drawAll
            (yourProducer ^. span (/= 0) ^. to (PB.map (`rotateR` n)) ^. PT.utf8 ^. from PT.packChars)
            :: IO (String, Producer String IO (... {- more nested producers -}))

    `pipes-parse` doesn't let you merge logic into the parser unless
    you also include logic for how to propagate unused bytes to the
    input source.  Without that guarantee you get bugs related to
    silently dropping input values.

    On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:

    While working with a binary file format, I started out with this
    naive code:

    import qualified Pipes.Parse as P
    import qualified Pipes.Binary as P
    import qualified Pipes.ByteString as PB
    import qualified Data.Text as T
    import qualified Data.ByteString as BS

    entryParser tableStart = P.decodeGet $ (,,,) <$> decodeFilename
        <*> fmap (tableStart +) getWord32le <*> getWord32le <*> getWord32le

    decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go where
        go = do
            c <- (`rotateR` 3) <$> getWord8
            if c /= 0 then (c :) <$> go else pure [] -- terminate on (and consume the) 0

    While it does work, I'm unhappy with decodeFilename as it
    basically implements a combination of map and span/fold with
    explicit recursion. But the underlying ByteString isn't available
    inside the Get monad without consuming it, so using e.g. BS.span
    seems out of the question. Let's see if lenses can come to the
    rescue:

    entryParser tableStart = do
        nameChunks <- zoom (PB.span (/= 0)) P.drawAll
        PB.drawByte -- draw the terminating 0
        let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3) . BS.concat $ nameChunks
        P.decodeGet $ (,,,) fileName <$> fmap (tableStart +) getWord32le <*> getWord32le <*> getWord32le

    I like this better - map and span aren't implemented manually
    anymore - but at the same time I was hoping for more. It doesn't
    seem right to work directly on ByteStrings (i.e. BS.map instead
    of PB.map, and text instead of pipes-text), and the combination
    of drawAll and concat is a bit awkward, especially since drawAll
    is only for testing (even though all the tutorials use it :) ).
    The latter point might be addressed by giving pipes-bytestring a
    folding function similar to P.foldAll, but even so I wonder if
    there's a more idiomatic way to do this?

    --
    You received this message because you are subscribed to the
    Google Groups "Haskell Pipes" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To post to this group, send email to [email protected].
