Let me explain what I mean by the parser keeping on after the error:
    parser :: Monad m => Parser ByteString m (String, Maybe Word8)
    parser = do
        str <- zoom (PB.span (/= 0) . PT.utf8 . from PT.packChars) drawAll
        a <- PB.drawByte -- for simplicity; it would be a more complicated parser in actual code
        return (str, a)

    test :: Monad m => [Word8] -> m ((String, Maybe Word8), Producer P.ByteString m ())
    test = runStateT parser . yield . BS.pack

    > fst <$> test [65,66,67,0]
    ("ABC",Just 0)
    > fst <$> test [65,255,66,67,0] -- invalid utf8
    ("A",Just 255)
As you can see, the parser function keeps going with PB.drawByte after
PT.utf8 fails. Unless I misunderstand, zoom even undraws the leftovers
returned by PT.utf8, so I don't see a way to detect the error and report it
to the user. Hopefully I'm missing something. :)
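
To pin down the behaviour being asked for, here is a minimal pure-list sketch with no pipes involved. The names `decodeAsciiPrefix` and `parseName` are hypothetical, and an ASCII-only decoder stands in for PT.utf8 to keep it short. The point is only that the bytes the decoder rejects are inspected before the next byte is drawn, so the failure surfaces as a `Left` instead of being silently parsed.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a decoder: decodes the longest ASCII prefix
-- and returns the bytes it could not decode.
decodeAsciiPrefix :: [Word8] -> (String, [Word8])
decodeAsciiPrefix bytes = (map (chr . fromIntegral) ok, bad)
  where
    (ok, bad) = span (< 128) bytes

-- Parse a zero-terminated name followed by one more byte. If the name
-- contains bytes the decoder rejects, report them instead of carrying on.
parseName :: [Word8] -> Either String ((String, Maybe Word8), [Word8])
parseName input =
  case decodeAsciiPrefix nameBytes of
    (name, [])  -> Right ((name, nextByte), rest')
    (name, bad) -> Left ("undecodable bytes after " ++ show name ++ ": " ++ show bad)
  where
    (nameBytes, rest) = span (/= 0) input
    (nextByte, rest') = case rest of
      []       -> (Nothing, [])
      (b : bs) -> (Just b, bs)
```

With this, `parseName [65,66,67,0]` gives `Right (("ABC",Just 0),[])`, while `parseName [65,255,66,67,0]` gives a `Left` naming the rejected bytes.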
On Wednesday, 21 May 2014 at 04:48:26 UTC+2, Gabriel Gonzalez wrote:
>
> Returning the unused input on error is the idiomatic way for a lens to
> handle errors. The parser won't keep going on after the error because the
> `Producer` containing any unused input is stashed inside the return value
> of the outer `Producer`, so the unused input is totally inaccessible to the
> `Parser`. The `Parser` type enforces this behavior:
>
> type Parser a m r = forall x . StateT (Producer a m x) m r
>
> The `forall x` enforces in the types that the `Parser` cannot use whatever
> is stored in the `x` in any meaningful way. Since the unused input is
> stored in that `x`, the `Parser` can't access it.
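
The type-level argument can be illustrated with a toy model that uses only base. `Stream`, `ToyParser`, `drawToy` and `drawTwo` are hypothetical names, not part of pipes-parse; `Stream` stands in for a `Producer` whose return value has type `x`. This sketch only shows what the `forall x` rules out. It says nothing about what `zoom` does with the leftovers of a particular lens.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Toy stand-in for a Producer: some elements plus a final return value.
data Stream a x = Stream [a] x

-- Toy analogue of `Parser a m r`: it has to work for every choice of `x`,
-- so it can only pass the `x` along, never look inside it.
type ToyParser a r = forall x. Stream a x -> (r, Stream a x)

drawToy :: ToyParser a (Maybe a)
drawToy (Stream [] x)         = (Nothing, Stream [] x)
drawToy (Stream (a : rest) x) = (Just a, Stream rest x)

-- Drawing twice from a one-element stream hits end of input on the second
-- draw, even when the return value happens to hold more elements.
drawTwo :: ToyParser a (Maybe a, Maybe a)
drawTwo s0 = ((a, b), s2)
  where
    (a, s1) = drawToy s0
    (b, s2) = drawToy s1
```

Running `drawTwo` on `Stream [65] [255,66,67,0]` yields `(Just 65, Nothing)` and leaves the list stored in the return position untouched.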
>
> On 05/16/2014 02:31 AM, Torgeir Strand Henriksen wrote:
>
> I can see that it would be more elegant to zoom rather than use StateT,
> but what options are there for error handling inside an encode/decode lens?
> Wrapping the Text and ByteString chunks in Either sounds like a mess, and
> returning the unused bytes on error like decodeIso8859_1 means the zoom has
> to be run with runStateT in isolation to prevent the parser from carrying on
> after the error. Throwing an exception is possible, of course, but it would
> be nice to avoid that.
>
> On Tuesday, 13 May 2014 at 19:18:51 UTC+2, Gabriel Gonzalez wrote:
>>
>> It is perfectly acceptable to poke around in the underlying `StateT`.
>> Generally, it is more idiomatic to encode your error-handling logic into
>> the lens itself, but manual state passing is definitely an approved thing
>> to do if you are more comfortable with it. It really comes down to
>> whatever is more readable for you.
>>
>> One of the reasons that I chose `StateT` as the substrate for
>> `pipes-parse` rather than an opaque `Parser` type is that I wanted people
>> to reuse their existing knowledge for how `StateT` works so that they could
>> do things like what you are doing.
>>
>> On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:
>>
>> Great! I'm starting to get a firmer understanding of parsers. I ended up
>> with this:
>>
>> decodeFilename = StateT $ \p -> do
>>     (fileName, p') <- runStateT drawAll . view (PB.span (/= 0) . to
>>         (PT.decodeAscii . (PB.map (`rotateR` 3) <-<)) . from PT.packChars) $ p
>>     Left p'' <- next p'
>>     return (fileName, PB.drop 1 <-< join p'')
>>
>> entryParser tableStart = do
>>     fileName <- decodeFilename
>>     P.decodeGet $ (,,,) fileName <$> fmap (tableStart +) getInt32 <*>
>>         getInt32 <*> getInt32
>>
>> Using next instead of drain, decode errors can be handled (pattern match
>> failure for now). Because of drawAll, p'' (result of span) is empty when
>> decode succeeds, so it can simply be joined, and then the terminating 0
>> dropped. Ignoring that the composition chains are a bit on the lengthy
>> side, do you consider it "good style" to poke around in Parser's underlying
>> StateT like that, or is it going against how the libraries are meant to be
>> used?
>>
>> On Tuesday, 13 May 2014 at 03:14:37 UTC+2, Gabriel Gonzalez wrote:
>>>
>>>
>>> On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:
>>>
>>> Thanks for the reply! The rotated lens is no problem (rotateR is from
>>> Data.Bits), but I'm afraid the data won't decode as UTF-8. Just to make
>>> sure I understand correctly: when you talk about re-encoding unused values,
>>> do you mean the values that would be left over if the parser being zoomed
>>> into were something other than drawAll and didn't consume all the data
>>> provided by the span lens?
>>>
>>>
>>> Yes, that's correct. If you write:
>>>
>>>     example = do
>>>         a <- zoom someLens parser1
>>>         parser2
>>>
>>> ... then `someLens` needs to know how to re-encode leftovers from
>>> `parser1` in the format that `parser2` understands.
>>>
>>> I understand why it would be a problem if those leftovers weren't
>>> propagated back, but I'm not sure I understand why that decision can't be
>>> made before the data is rotated and decoded as text. Does it have to do
>>> with the data being bytestrings that get transformed in blocks rather than
>>> per byte?
>>>
>>>
>>> Remember that the parser is totally oblivious about where the `Text`
>>> came from. It doesn't know that the text originated from bytes or rotated
>>> data. All it understands is "I am undrawing some text" and if you want it
>>> to undraw bytes then you need to translate the "undraw text" command to an
>>> "undraw bytes" command. That's what the lens is doing.
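
The translation from "undraw text" to "undraw bytes" can be modelled with plain lists, using only base. Everything here (`ListParser`, `Codec`, `zoomToy`, `chars`, `example`) is a hypothetical toy and not the pipes-parse API: a `Codec` plays the role of the lens, and `zoomToy` plays the role of `zoom` by re-encoding whatever the inner parser leaves behind.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy parser over a list: returns a result and the unconsumed input.
type ListParser a r = [a] -> (r, [a])

-- Toy stand-in for a lens: how to view the input, and how to turn
-- leftovers back into the original format.
data Codec a b = Codec
  { decode :: [a] -> [b]
  , encode :: [b] -> [a]
  }

-- Toy analogue of `zoom`: run the inner parser on the decoded input, then
-- re-encode its leftovers so the next parser sees them in its own format.
zoomToy :: Codec a b -> ListParser b r -> ListParser a r
zoomToy codec inner input = (result, encode codec leftovers)
  where
    (result, leftovers) = inner (decode codec input)

-- Bytes <-> characters; only meaningful for code points below 256.
chars :: Codec Word8 Char
chars = Codec (map (chr . fromIntegral)) (map (fromIntegral . ord))

-- The first parser works on characters, the second on bytes.
example :: ListParser Word8 (String, Maybe Word8)
example input = ((a, b), rest)
  where
    (a, afterA) = zoomToy chars (splitAt 2) input
    (b, rest) = case afterA of
      []       -> (Nothing, [])
      (x : xs) -> (Just x, xs)
```

Here `example [72,105,33,10]` evaluates to `(("Hi",Just 33),[10])`: the two characters the first parser did not consume come back as bytes for the second one. Without the `encode` half, the byte parser would have nothing well-typed to continue from.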
>>>
>>> Note that you can still get a lens if you specify a way to handle
>>> errors. Right now the `pipes-text` package provides a one-way decoding
>>> function for latin1 of type:
>>>
>>>     decodeIso8859_1
>>>         :: Monad m
>>>         => Producer ByteString m r
>>>         -> Producer Text m (Producer ByteString m r)
>>>
>>> If you supplement that with a reverse function of type:
>>>
>>>     encoder
>>>         :: Monad m
>>>         => Producer Text m (Producer ByteString m r)
>>>         -> Producer ByteString m r
>>>
>>> ... then you can create a latin1 lens that you can pass to `zoom`:
>>>
>>>     latin1
>>>         :: Monad m
>>>         => Lens' (Producer ByteString m r)
>>>                  (Producer Text m (Producer ByteString m r))
>>>     -- I might have these arguments backwards; I didn't type-check this
>>>     latin1 = iso decodeIso8859_1 encoder
>>>
>>> The reason that `pipes-text` doesn't already do this for you is because
>>> Latin1 does not specify how to encode multibyte characters. In other
>>> words, you need to figure out how to convert these exotic characters to
>>> bytes, even if that means just discarding them (i.e. not undrawing the
>>> character at all).
>>>
>>> So if you really want to use latin1 as a lens, you definitely can! It
>>> just requires that you decide how you want to encode multibyte characters,
>>> since there's no obvious right way to do that. If you don't expect your
>>> input to have multibyte characters then you can just slightly modify
>>> `encodeIso8859_1` to do what you want:
>>>
>>>     encoder pText = do
>>>         pBytes <- encodeIso8859_1 pText
>>>         runEffect (runEffect (pBytes >-> drain) >-> drain)
>>>
>>> That basically keeps encoding until it hits a character that
>>> `encodeIso8859_1` does not know how to encode, then gives up and drains
>>> the rest of the stream.
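
The "encode until the first unencodable character, then give up" policy can be sketched on plain strings with base only. `encodeLatin1Prefix` and `encodeLatin1Lossy` are hypothetical names, not pipes-text functions, and the sketch assumes that "unencodable" means a code point above 255.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Encode the longest prefix that fits in Latin-1 (code points 0..255) and
-- return the remaining characters unencoded.
encodeLatin1Prefix :: String -> ([Word8], String)
encodeLatin1Prefix s = (map (fromIntegral . ord) ok, rest)
  where
    (ok, rest) = span (\c -> ord c < 256) s

-- The "give up" policy: keep the encodable prefix, discard everything
-- from the first unencodable character onwards.
encodeLatin1Lossy :: String -> [Word8]
encodeLatin1Lossy = fst . encodeLatin1Prefix
```

For instance, `encodeLatin1Prefix "ab\955c"` evaluates to `([97,98],"\955c")`, so the lossy variant keeps only `[97,98]`. Other policies, such as substituting a placeholder byte, are equally possible, which is why the choice is left to the caller.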
>>>
>>>
>>>
>>> Anyway I'll have to go with your second option. Instead of breaking the
>>> parser into multiple code blocks (that have to be runStateTed individually)
>>> in order to get at the bytestring producer, is it reasonable to use get and
>>> put from Control.Monad.State? That way I can keep everything a single
>>> Parser, view the bytestring producer from "get" through the PB.span lens
>>> composed with the transformations, and "put" back the producer returned by
>>> span.
>>>
>>> Bonus question: If the rotated lens was simply Bits a => Int -> Lens' a
>>> a, could it be mapped/zoomed/something over a ByteString producer instead
>>> of including PB.map in the lens? That way rotated would be more reusable.
>>>
>>> On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez wrote:
>>>>
>>>> This works much better if you can make two small changes.
>>>>
>>>> First, I'm guessing that your `rotateR` function has some sort of
>>>> inverse named `rotateL`. If it does, then you can make a rotation lens:
>>>>
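>>>>     rotated
>>>>         :: Monad m
>>>>         => Int
>>>>         -> Lens' (Producer ByteString m x) (Producer ByteString m x)
>>>>     -- PB.map is a Pipe, so it is attached to the producer with (>->)
>>>>     rotated n = iso (>-> PB.map (`rotateR` n)) (>-> PB.map (`rotateL` n))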
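
The rotation lens relies on `rotateL` undoing `rotateR`. A quick base-only check of that assumption on `Word8`, using hypothetical helper names and plain lists in place of producers:

```haskell
import Data.Bits (rotateL, rotateR)
import Data.Word (Word8)

-- Rotate every byte right by n bits, and the inverse operation.
rotateBytes, unrotateBytes :: Int -> [Word8] -> [Word8]
rotateBytes n = map (`rotateR` n)
unrotateBytes n = map (`rotateL` n)

-- True when rotating and then un-rotating restores every possible byte.
roundTrips :: Int -> Bool
roundTrips n = unrotateBytes n (rotateBytes n allBytes) == allBytes
  where
    allBytes = [minBound .. maxBound]
```

`roundTrips 3` is `True`, which is the property that lets leftovers be pushed back through the lens unchanged.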
>>>>
>>>> Second, if you can use utf8 instead of latin1, then you can just write:
>>>>
>>>>     decodeFileName :: Monad m => Parser ByteString m String
>>>>     decodeFileName = zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from
>>>>         PT.packChars) PP.drawAll
>>>>
>>>> The reason this works is that `rotated` and `utf8` contain extra
>>>> information for how to propagate unused bytes back to the original input
>>>> source. In the case of `rotated` it reverses the original rotation and in
>>>> the case of `utf8` it re-encodes them.
>>>>
>>>> If you don't have information for how to re-encode unused values, then
>>>> you must apply the rotation and encoding to the producer before feeding it
>>>> to the parser:
>>>>
>>>>     yourProducer :: Producer ByteString IO ()
>>>>
>>>>     runStateT PP.drawAll (yourProducer ^. PB.span (/= 0) ^. to (>-> PB.map
>>>>         (`rotateR` n)) ^. PT.utf8 ^. from PT.packChars)
>>>>         :: IO (String, Producer String IO (... {- more nested producers -}))
>>>>
>>>> `pipes-parse` doesn't let you merge logic into the parser unless you
>>>> also include logic for how to propagate unused bytes to the input source.
>>>> Without that guarantee you get bugs related to silently dropping input
>>>> values.
>>>>
>>>> On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:
>>>>
>>>> While working with a binary file format, I started out with this naive
>>>> code:
>>>>
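>>>> import qualified Pipes.Parse as P
>>>> import qualified Pipes.Binary as P
>>>> import qualified Pipes.ByteString as PB
>>>> import qualified Data.Text as T
>>>> import qualified Data.ByteString as BS
>>>> import Data.Binary.Get (getWord8, getWord32le)
>>>> import Data.Bits (rotateR)
>>>> import Data.Text.Encoding (decodeLatin1)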
>>>>
>>>> entryParser tableStart = P.decodeGet $ (,,,) <$> decodeFilename <*>
>>>>     fmap (tableStart +) getWord32le <*> getWord32le <*> getWord32le
>>>>
>>>> decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go where
>>>>     go = do
>>>>         c <- (`rotateR` 3) <$> getWord8
>>>>         if c /= 0 then (c :) <$> go else pure [] -- terminate on (and consume) the 0
>>>>
>>>> While it does work, I'm unhappy with decodeFilename as it basically
>>>> implements a combination of map and span/fold with explicit recursion. But
>>>> the underlying ByteString isn't available inside the Get monad without
>>>> consuming it, so using e.g. BS.span seems out of the question. Let's see
>>>> if lenses can come to the rescue:
>>>>
>>>> entryParser tableStart = do
>>>>     nameChunks <- zoom (PB.span (/= 0)) P.drawAll
>>>>     PB.drawByte -- draw the terminating 0
>>>>     let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3) .
>>>>             BS.concat $ nameChunks
>>>>     P.decodeGet $ (,,,) fileName <$> fmap (tableStart +) getWord32le
>>>>         <*> getWord32le <*> getWord32le
>>>>
>>>> I like this better - map and span aren't implemented manually anymore -
>>>> but at the same time I was hoping for more. It doesn't seem right to work
>>>> directly on ByteStrings (i.e. BS.map instead of PB.map, and text instead
>>>> of pipes-text), and the combination of drawAll and concat is a bit awkward,
>>>> especially since drawAll is only for testing (even though all the
>>>> tutorials use it :) ). The latter point might be addressed by giving
>>>> pipes-bytestring a folding function similar to P.foldAll, but even so I
>>>> wonder if there's a more idiomatic way to do this?
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Haskell Pipes" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>>