zoom undraws the remaining data from both the failing utf8 lens and the
span, so isEndOfBytes returns False for both valid and invalid UTF-8. I
guess the transparency of zoom makes it difficult to detect errors that
way. :) I'll stick to viewing the StateT's Producer for now.
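
To make the difference concrete, here is a minimal sketch of the "inspect the leftovers yourself" approach. It uses plain lists instead of Producers, and decodeAsciiList and parseName are hypothetical names made up for this illustration, not functions from pipes-text or pipes-parse. The only point is that the decoder's leftovers stay visible to the caller, so a failed decode cannot be mistaken for the rest of the input:

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a decoder that returns its unused input:
-- decode bytes as 7-bit ASCII up to the first invalid byte and hand back
-- everything from that byte onwards as leftovers.
decodeAsciiList :: [Word8] -> (String, [Word8])
decodeAsciiList bytes = (map (chr . fromIntegral) good, rest)
  where
    (good, rest) = span (< 128) bytes

-- Parse a 0-terminated name. The name bytes are split off first, so any
-- leftovers from the decoder can only be bytes it failed to decode.
-- Left carries the undecodable bytes; Right carries the name and the
-- input that follows the terminating 0.
parseName :: [Word8] -> Either [Word8] (String, [Word8])
parseName input =
  case decodeAsciiList nameBytes of
    (name, []) -> Right (name, drop 1 afterName)
    (_, bad)   -> Left bad
  where
    (nameBytes, afterName) = span (/= 0) input

main :: IO ()
main = do
  print (parseName [65, 66, 67, 0, 1, 2])  -- Right ("ABC",[1,2])
  print (parseName [65, 255, 66, 67, 0])   -- Left [255,66,67]
```

As far as I can tell, with zoom the equivalent of `bad` is pushed back in front of `afterName`, which is why the valid and invalid cases look the same afterwards.
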
On Saturday, May 24, 2014 1:44:23 AM UTC+2, Gabriel Gonzalez wrote:
>
> The simplest solution is to use `Pipes.ByteString.isEndOfBytes` after the
> `zoom` to check if it failed or not. If there are residual bytes then the
> parse failed.
>
> Another solution is to apply the lens on the `Producer` end, using
> `view`. This ensures that no information is lost.
>
> On 5/22/14, 11:03 AM, Torgeir Strand Henriksen wrote:
>
> Let me explain what I mean by the parser keeping on after the error:
>
> parser :: Monad m => Parser ByteString m (String, Maybe Word8)
> parser = do
>     str <- zoom (PB.span (/= 0) . PT.utf8 . from PT.packChars) drawAll
>     -- for simplicity; it would be a more complicated parser in actual code
>     a <- PB.drawByte
>     return (str, a)
>
> test :: Monad m
>      => [Word8] -> m ((String, Maybe Word8), Producer P.ByteString m ())
> test = runStateT parser . yield . BS.pack
>
> \> fst <$> test [65,66,67,0]
> ("ABC",Just 0)
>
> \> fst <$> test [65,255,66,67,0] -- invalid utf8
> ("A",Just 255)
>
> As you can see, the parser function keeps going with PB.drawByte after
> PT.utf8 fails. Unless I misunderstand, zoom even undraws the leftovers
> returned by PT.utf8, so I don't see a way to detect the error and report it
> to the user. Hopefully I'm missing something. :)
>
> On Wednesday, May 21, 2014 4:48:26 AM UTC+2, Gabriel Gonzalez wrote:
>>
>> Returning the unused input on error is the idiomatic way for a lens to
>> handle errors. The parser won't keep going on after the error because the
>> `Producer` containing any unused input is stashed inside the return value
>> of the outer `Producer`, so the unused input is totally inaccessible to the
>> `Parser`. The `Parser` type enforces this behavior:
>>
>> type Parser a m r = forall x . StateT (Producer a m x) m r
>>
>> The `forall x` enforces in the types that the `Parser` cannot use
>> whatever is stored in the `x` in any meaningful way. Since the unused
>> input is stored in that `x`, the `Parser` can't access it.
>>
>> On 05/16/2014 02:31 AM, Torgeir Strand Henriksen wrote:
>>
>> I can see that it would be more elegant to zoom rather than use StateT,
>> but what options are there for error handling inside an encode/decode lens?
>> Wrapping the Text and ByteString chunks in Either sounds like a mess, and
>> returning the unused bytes on error like decodeIso8859_1 means the zoom has
>> to be runStated in isolation to prevent the parser from keeping on after
>> the error. Throwing an exception is possible of course, but would be nice
>> to avoid.
>>
>> On Tuesday, May 13, 2014 7:18:51 PM UTC+2, Gabriel Gonzalez wrote:
>>>
>>> It is perfectly acceptable to poke around in the underlying `StateT`.
>>> Generally, it is more idiomatic to encode your error-handling logic into
>>> the lens itself, but manual state passing is definitely an approved thing
>>> to do if you are more comfortable with it. It really comes down to
>>> whatever is more readable for you.
>>>
>>> One of the reasons that I chose `StateT` as the substrate for
>>> `pipes-parse` rather than an opaque `Parser` type is that I wanted people
>>> to reuse their existing knowledge for how `StateT` works so that they could
>>> do things like what you are doing.
>>>
>>> On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:
>>>
>>> Great! I'm starting to get a firmer understanding of parsers. I ended up
>>> with this:
>>>
>>> decodeFilename = StateT $ \p -> do
>>>     (fileName, p') <- runStateT drawAll
>>>         . view (PB.span (/= 0)
>>>             . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
>>>             . from PT.packChars)
>>>         $ p
>>>     Left p'' <- next p'
>>>     return (fileName, PB.drop 1 <-< join p'')
>>>
>>> entryParser tableStart = do
>>>     fileName <- decodeFilename
>>>     P.decodeGet $ (,,,) fileName
>>>         <$> fmap (tableStart +) getInt32
>>>         <*> getInt32
>>>         <*> getInt32
>>>
>>> Using next instead of drain, decode errors can be handled (pattern match
>>> failure for now). Because of drawAll, p'' (result of span) is empty when
>>> decode succeeds, so it can simply be joined, and then the terminating 0
>>> dropped. Ignoring that the composition chains are a bit on the lengthy
>>> side, do you consider it "good style" to poke around in Parser's underlying
>>> StateT like that, or is it going against how the libraries are meant to be
>>> used?
>>>
>>> On Tuesday, May 13, 2014 3:14:37 AM UTC+2, Gabriel Gonzalez wrote:
>>>>
>>>>
>>>> On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:
>>>>
>>>> Thanks for the reply! The rotated lens is no problem (rotateR is from
>>>> Data.Bits), but I'm afraid the data won't decode as UTF-8. Just to make
>>>> sure I understand correctly: when you talk about re-encoding unused
>>>> values, do you mean the values that would be left if the parser zoomed
>>>> into was a different one than drawAll and didn't consume all the data
>>>> provided by the span lens?
>>>>
>>>>
>>>> Yes, that's correct. If you write:
>>>>
>>>> example = do
>>>>     a <- zoom someLens parser1
>>>>     parser2
>>>>
>>>> ... then `someLens` needs to know how to re-encode leftovers from
>>>> `parser1` in the format that `parser2` understands.
>>>>
>>>> I understand why it would be a problem if those leftovers weren't
>>>> propagated back, but I'm not sure I understand why that decision can't be
>>>> made before the data is rotated and decoded as text. Does it have to do
>>>> with the data being bytestrings that get transformed in blocks rather than
>>>> per byte?
>>>>
>>>>
>>>> Remember that the parser is totally oblivious about where the `Text`
>>>> came from. It doesn't know that the text originated from bytes or rotated
>>>> data. All it understands is "I am undrawing some text" and if you want it
>>>> to undraw bytes then you need to translate the "undraw text" command to an
>>>> "undraw bytes" command. That's what the lens is doing.
>>>>
>>>> Note that you can still get a lens if you specify a way to handle
>>>> errors. Right now the `pipes-text` package provides a one-way decoding
>>>> function for latin1 of type:
>>>>
>>>> decodeIso8859_1 :: Monad m
>>>>     => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
>>>>
>>>> If you supplement that with a reverse function of type:
>>>>
>>>> encoder :: Monad m
>>>>     => Producer Text m (Producer ByteString m r) -> Producer ByteString m r
>>>>
>>>> ... then you can create a latin1 lens that you can pass to `zoom`:
>>>>
>>>> latin1 :: Monad m
>>>>     => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
>>>> -- I might have these arguments backwards; I didn't type-check this
>>>> latin1 = iso decodeIso8859_1 encoder
>>>>
>>>> The reason that `pipes-text` doesn't already do this for you is because
>>>> Latin1 does not specify how to encode multibyte characters. In other
>>>> words, you need to figure out how to convert these exotic characters to
>>>> bytes, even if that means just discarding them (i.e. not undrawing the
>>>> character at all).
>>>>
>>>> So if you really want to use latin1 as a lens, you definitely can! It
>>>> just requires that you decide how you want to encode multibyte
>>>> characters, since there's no obvious right way to do that. If you don't
>>>> expect your input to have multibyte characters then you can just
>>>> slightly modify `encodeIso8859_1` to do what you want:
>>>>
>>>> encoder pText = do
>>>>     pBytes <- encodeIso8859_1 pText
>>>>     runEffect (runEffect (pBytes >-> drain) >-> drain)
>>>>
>>>> That basically keeps encoding until it hits a character that
>>>> `encodeIso8859_1` does not know how to encode, then gives up and
>>>> drains the rest of the stream.
>>>>
>>>>
>>>>
>>>> Anyway I'll have to go with your second option. Instead of breaking the
>>>> parser into multiple code blocks (that have to be runStateTed
>>>> individually) in order to get at the bytestring producer, is it
>>>> reasonable to use get and put from Control.Monad.State? That way I can
>>>> keep everything a single Parser, view the bytestring producer from "get"
>>>> through the PB.span lens composed with the transformations, and "put"
>>>> back the producer returned by span.
>>>>
>>>> Bonus question: If the rotated lens was simply Bits a => Int -> Lens' a
>>>> a, could it be mapped/zoomed/something over a ByteString producer instead
>>>> of including PB.map in the lens? That way rotated would be more reusable.
>>>>
>>>> On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez wrote:
>>>>>
>>>>> This works much better if you can make two small changes.
>>>>>
>>>>> First, I'm guessing that your `rotateR` function has some sort of
>>>>> inverse named `rotateL`. If it does, then you can make a rotation lens:
>>>>>
>>>>> rotated :: Monad m
>>>>>     => Int -> Lens' (Producer ByteString m x) (Producer ByteString m x)
>>>>> rotated n = iso (PB.map (`rotateR` n)) (PB.map (`rotateL` n))
>>>>>
>>>>> Second, if you can use utf8 instead of latin1, then you can just write:
>>>>>
>>>>> decodeFileName :: Monad m => Parser ByteString m String
>>>>> decodeFileName =
>>>>>     zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars) PP.drawAll
>>>>>
>>>>> The reason this works is that `rotated` and `utf8` contain extra
>>>>> information for how to propagate unused bytes back to the original input
>>>>> source. In the case of `rotated` it reverses the original rotation and
>>>>> in the case of `utf8` it re-encodes them.
>>>>>
>>>>> If you don't have information for how to re-encode unused values, then
>>>>> you must apply the rotation and encoding to the producer before feeding
>>>>> it to the parser:
>>>>>
>>>>> yourProducer :: Producer ByteString IO ()
>>>>>
>>>>> runStateT PP.drawAll
>>>>>     (yourProducer ^. PB.span (/= 0) ^. to (PB.map (`rotateR` n))
>>>>>                   ^. PT.utf8 ^. from PT.packChars)
>>>>>     :: IO (String, Producer String IO (... {- more nested producers -}))
>>>>>
>>>>> `pipes-parse` doesn't let you merge logic into the parser unless you
>>>>> also include logic for how to propagate unused bytes to the input source.
>>>>>
>>>>> Without that guarantee you get bugs related to silently dropping input
>>>>> values.
>>>>>
>>>>> On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:
>>>>>
>>>>> While working with a binary file format, I started out with this naive
>>>>> code:
>>>>>
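>>>>> import qualified Pipes.Parse as P
>>>>> import qualified Pipes.Binary as P
>>>>> import qualified Pipes.ByteString as PB
>>>>> import qualified Data.Text as T
>>>>> import qualified Data.ByteString as BS
>>>>> import Data.Binary.Get (getWord8, getWord32le)
>>>>> import Data.Bits (rotateR)
>>>>> import Data.Text.Encoding (decodeLatin1)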
>>>>>
>>>>> entryParser tableStart = P.decodeGet $ (,,,) <$> decodeFilename
>>>>>     <*> fmap (tableStart +) getWord32le <*> getWord32le <*> getWord32le
>>>>>
>>>>> decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go where
>>>>>     go = do
>>>>>         c <- (`rotateR` 3) <$> getWord8
>>>>>         -- terminate on (and consume) the 0
>>>>>         if c /= 0 then (c :) <$> go else pure []
>>>>>
>>>>> While it does work, I'm unhappy with decodeFilename as it basically
>>>>> implements a combination of map and span/fold with explicit recursion.
>>>>> But the underlying ByteString isn't available inside the Get monad
>>>>> without consuming it, so using e.g. BS.span seems out of the question.
>>>>> Let's see if lenses can come to the rescue:
>>>>>
>>>>> entryParser tableStart = do
>>>>>     nameChunks <- zoom (PB.span (/= 0)) P.drawAll
>>>>>     PB.drawByte -- draw the terminating 0
>>>>>     let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3)
>>>>>                  . BS.concat $ nameChunks
>>>>>     P.decodeGet $ (,,,) fileName <$> fmap (tableStart +) getWord32le
>>>>>         <*> getWord32le <*> getWord32le
>>>>>
>>>>> I like this better - map and span aren't implemented manually anymore
>>>>> - but at the same time I was hoping for more. It doesn't seem right to
>>>>> work directly on ByteStrings (i.e. BS.map instead of PB.map, and text
>>>>> instead of pipes-text), and the combination of drawAll and concat is a
>>>>> bit awkward, especially since drawAll is only for testing (even though
>>>>> all the tutorials use it :) ). The latter point might be addressed by
>>>>> giving pipes-bytestring a folding function similar to P.foldAll, but
>>>>> even so I wonder if there's a more idiomatic way to do this?
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Haskell Pipes" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].