It is perfectly acceptable to poke around in the underlying `StateT`.
Generally, it is more idiomatic to encode your error-handling logic into
the lens itself, but manual state passing is definitely an approved
thing to do if you are more comfortable with it. It really comes down
to whatever is more readable for you.
One of the reasons that I chose `StateT` as the substrate for
`pipes-parse` rather than an opaque `Parser` type is that I wanted
people to reuse their existing knowledge for how `StateT` works so that
they could do things like what you are doing.
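
As a rough illustration of what reusing `StateT` knowledge looks like, here is a minimal sketch on a toy parser. It assumes only the `transformers` package and uses a plain list in place of a `Producer`; the names `Parser`, `drawManual`, and `drawGetPut` are hypothetical and are not part of `pipes-parse`.

```haskell
import Control.Monad.Trans.State.Strict (StateT (..), get, put)

-- Toy parser: StateT over the remaining input list. This is a hypothetical
-- stand-in for pipes-parse, where the state is a Producer instead.
type Parser x m a = StateT [x] m a

-- Written with the StateT constructor, passing the state by hand.
drawManual :: Monad m => Parser x m (Maybe x)
drawManual = StateT $ \s -> case s of
    []     -> return (Nothing, [])
    x : xs -> return (Just x, xs)

-- The same parser written with get and put.
drawGetPut :: Monad m => Parser x m (Maybe x)
drawGetPut = do
    s <- get
    case s of
        []     -> return Nothing
        x : xs -> put xs >> return (Just x)
```

Both definitions should behave identically under `runStateT`, which is why either style is a matter of readability.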
On 5/13/14, 10:02 AM, Torgeir Strand Henriksen wrote:
Great! I'm starting to get a firmer understanding of parsers. I ended
up with this:

    decodeFilename = StateT $ \p -> do
        (fileName, p') <-
            runStateT drawAll
                . view ( PB.span (/= 0)
                       . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
                       . from PT.packChars )
                $ p
        Left p'' <- next p'
        return (fileName, PB.drop 1 <-< join p'')

    entryParser tableStart = do
        fileName <- decodeFilename
        P.decodeGet $ (,,,) fileName
            <$> fmap (tableStart +) getInt32
            <*> getInt32
            <*> getInt32

Using next instead of drain, decode errors can be handled (pattern
match failure for now). Because of drawAll, p'' (result of span) is
empty when decode succeeds, so it can simply be joined, and then the
terminating 0 dropped. Ignoring that the composition chains are a bit
on the lengthy side, do you consider it "good style" to poke around in
Parser's underlying StateT like that, or is it going against how the
libraries are meant to be used?
On Tuesday, May 13, 2014 at 03:14:37 UTC+2, Gabriel Gonzalez wrote:
On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:
Thanks for the reply! The rotated lens is no problem (rotateR is
from Data.Bits), but I'm afraid the data won't decode as UTF-8.
Just to make sure I understand correctly: When you talk about
re-encoding unused values, do you mean the values that would be
left if the parser zoomed into was a different one than drawAll
and didn't consume all the data provided by the span lens?
Yes, that's correct. If you write:

    example = do
        a <- zoom someLens parser1
        parser2

... then `someLens` needs to know how to re-encode leftovers from
`parser1` in the format that `parser2` understands.
I understand why it would be a problem if those leftovers weren't
propagated back, but I'm not sure I understand why that decision
can't be made before the data is rotated and decoded as text.
Does it have to do with the data being bytestrings that get
transformed in blocks rather than per byte?
Remember that the parser is totally oblivious about where the
`Text` came from. It doesn't know that the text originated from
bytes or rotated data. All it understands is "I am undrawing some
text" and if you want it to undraw bytes then you need to
translate the "undraw text" command to an "undraw bytes" command.
That's what the lens is doing.
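
To make the "undraw text" versus "undraw bytes" translation concrete, here is a minimal sketch that uses only `base` and none of the real pipes APIs. A plain list stands in for a `Producer`, and `Parser`, `zoomIso`, `drawOne`, `rot3`, `unrot3`, and `example` are hypothetical names chosen for illustration.

```haskell
import Data.Bits (rotateL, rotateR)
import Data.Word (Word8)

-- Toy stand-in for a parser: a function from the remaining input to a
-- result plus the new remaining input. (Hypothetical; the real Parser in
-- pipes-parse is a StateT over a Producer.)
type Parser s a = s -> (a, s)

-- Run a parser on a transformed view of the input, then translate whatever
-- the inner parser left behind back into the outer format.
zoomIso :: (s -> t) -> (t -> s) -> Parser t a -> Parser s a
zoomIso decode encode inner s =
    let (a, leftover) = inner (decode s)
    in  (a, encode leftover)

-- Draw a single element, if there is one.
drawOne :: Parser [x] (Maybe x)
drawOne []       = (Nothing, [])
drawOne (x : xs) = (Just x, xs)

-- Decode by rotating every byte right by 3; re-encode by rotating left by 3.
rot3, unrot3 :: [Word8] -> [Word8]
rot3   = map (`rotateR` 3)
unrot3 = map (`rotateL` 3)

-- Draw one byte through the rotated view, then one raw byte from the
-- leftovers that the view handed back.
example :: Parser [Word8] (Maybe Word8, Maybe Word8)
example s0 =
    let (a, s1) = zoomIso rot3 unrot3 drawOne s0
        (b, s2) = drawOne s1
    in  ((a, b), s2)
```

The second `drawOne` only sees sensible raw bytes because `zoomIso` applies `unrot3` to the leftovers; dropping that step would hand rotated bytes to a parser that expects unrotated ones.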
Note that you can still get a lens if you specify a way to handle
errors. Right now the `pipes-text` package provides a one-way
decoding function for latin1 of type:

    decodeIso8859_1
        :: Monad m
        => Producer ByteString m r
        -> Producer Text m (Producer ByteString m r)

If you supplement that with a reverse function of type:

    encoder
        :: Monad m
        => Producer Text m (Producer ByteString m r)
        -> Producer ByteString m r

... then you can create a latin1 lens that you can pass to `zoom`:

    latin1
        :: Monad m
        => Lens' (Producer ByteString m r)
                 (Producer Text m (Producer ByteString m r))
    -- I might have these arguments backwards; I didn't type-check this
    latin1 = iso decodeIso8859_1 encoder

The reason that `pipes-text` doesn't already do this for you is
because Latin1 does not specify how to encode multibyte
characters. In other words, you need to figure out how to convert
these exotic characters to bytes, even if that means just
discarding them (i.e. not undrawing the character at all).
So if you really want to use latin1 as a lens, you definitely
can! It just requires that you decide how you want to encode
multibyte characters, since there's no obvious right way to do
that. If you don't expect your input to have multibyte characters
then you can just slightly modify `encodeIso8859_1` to do what you
want:

    encoder pText = do
        pBytes <- encodeIso8859_1 pText
        runEffect (runEffect (pBytes >-> drain) >-> drain)

That basically keeps encoding until it hits a character that
`encodeIso8859_1` does not know how to encode, then gives up and
drains the rest of the stream.
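
The choice of policy can be sketched on plain strings, using only `base` and no pipes machinery. This is an illustration under assumptions, not what `pipes-text` does internally; `encodeChar`, `encodeDropping`, and `encodeUntilFailure` are hypothetical names.

```haskell
import Data.Char (ord)
import Data.Maybe (mapMaybe)
import Data.Word (Word8)

-- Encode one character as a Latin-1 byte, or Nothing if it does not fit.
encodeChar :: Char -> Maybe Word8
encodeChar c
    | ord c <= 0xFF = Just (fromIntegral (ord c))
    | otherwise     = Nothing

-- Policy 1: silently skip characters that do not fit and keep going.
encodeDropping :: String -> [Word8]
encodeDropping = mapMaybe encodeChar

-- Policy 2: stop at the first character that does not fit and discard
-- everything after it (similar in spirit to "give up and drain").
encodeUntilFailure :: String -> [Word8]
encodeUntilFailure []       = []
encodeUntilFailure (c : cs) = case encodeChar c of
    Just w  -> w : encodeUntilFailure cs
    Nothing -> []
```

Neither policy is obviously right, which is the reason the decision is left to whoever builds the lens.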
Anyway I'll have to go with your second option. Instead of
breaking the parser into multiple code blocks (that have to be
runStateTed individually) in order to get at the bytestring
producer, is it reasonable to use get and put from
Control.Monad.State? That way I can keep everything a single
Parser, view the bytestring producer from "get" through the
PB.span lens composed with the transformations, and "put" back
the producer returned by span.
Bonus question: If the rotated lens was simply Bits a => Int ->
Lens' a a, could it be mapped/zoomed/something over a ByteString
producer instead of including PB.map in the lens? That way
rotated would be more reusable.
On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel Gonzalez wrote:
This works much better if you can make two small changes.
First, I'm guessing that your `rotateR` function has some
sort of inverse named `rotateL`. If it does, then you can
make a rotation lens:

    rotated
        :: Monad m
        => Int
        -> Lens' (Producer ByteString m x) (Producer ByteString m x)
    -- PB.map is a Pipe, so it is attached to the producer with (>->)
    rotated n = iso (>-> PB.map (`rotateR` n)) (>-> PB.map (`rotateL` n))

Second, if you can use utf8 instead of latin1, then you can
just write:

    decodeFileName :: Parser ByteString String
    decodeFileName =
        zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars)
             PP.drawAll

The reason this works is that `rotated` and `utf8` contain
extra information for how to propagate unused bytes back to
the original input source. In the case of `rotated` it
reverses the original rotation and in the case of `utf8` it
re-encodes them.
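
The claim that the rotation can be undone is easy to check on single bytes. A minimal sketch using only `Data.Bits` and `Data.Word` from `base`; `roundTrips` and `allRoundTrip` are hypothetical helper names.

```haskell
import Data.Bits (rotateL, rotateR)
import Data.Word (Word8)

-- Rotating right by n and then left by n should give back the original byte.
roundTrips :: Int -> Word8 -> Bool
roundTrips n w = rotateL (rotateR w n) n == w

-- Check the round trip for every possible byte value.
allRoundTrip :: Int -> Bool
allRoundTrip n = all (roundTrips n) [minBound .. maxBound]
```

Evaluating `allRoundTrip 3` in GHCi should give `True`, which is the property that lets a rotation be packaged as an `iso`.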
If you don't have information for how to re-encode unused
values, then you must apply the rotation and encoding to the
producer before feeding it to the parser:

    yourProducer :: Producer ByteString IO ()

    runStateT PP.drawAll
        (yourProducer ^. PB.span (/= 0)
                      ^. to (>-> PB.map (`rotateR` n))
                      ^. PT.utf8
                      ^. from PT.packChars)
        :: IO (String, Producer String IO (... {- more nested producers -}))

`pipes-parse` doesn't let you merge logic into the parser
unless you also include logic for how to propagate unused
bytes to the input source. Without that guarantee you get
bugs related to silently dropping input values.
On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:
While working with a binary file format, I started out with
this naive code:
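
    import Data.Binary.Get (getWord8, getWord32le)
    import Data.Bits (rotateR)
    import Data.Text.Encoding (decodeLatin1)
    import qualified Pipes.Parse as P
    import qualified Pipes.Binary as P
    import qualified Pipes.ByteString as PB
    import qualified Data.Text as T
    import qualified Data.ByteString as BS
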
    entryParser tableStart = P.decodeGet $ (,,,)
        <$> decodeFilename
        <*> fmap (tableStart +) getWord32le
        <*> getWord32le
        <*> getWord32le

    decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go
      where
        go = do
            c <- (`rotateR` 3) <$> getWord8
            -- terminate on (and consume the) 0
            if c /= 0 then (c :) <$> go else pure []

While it does work, I'm unhappy with decodeFilename as it
basically implements a combination of map and span/fold with
explicit recursion. But the underlying ByteString isn't
available inside the Get monad without consuming it, so
using e.g. BS.span seems out of the question. Let's see if
lenses can come to the rescue:

    entryParser tableStart = do
        nameChunks <- zoom (PB.span (/= 0)) P.drawAll
        PB.drawByte -- draw the terminating 0
        let fileName = T.unpack . decodeLatin1 . BS.map (flip rotateR 3)
                     . BS.concat $ nameChunks
        P.decodeGet $ (,,,) fileName
            <$> fmap (tableStart +) getWord32le
            <*> getWord32le
            <*> getWord32le

I like this better - map and span aren't implemented
manually anymore - but at the same time I was hoping for
more. It doesn't seem right to work directly on ByteStrings
(i.e. BS.map instead of PB.map, and text instead of
pipes-text), and the combination of drawAll and concat is a
bit awkward, especially since drawAll is only for testing
(even though all the tutorials use it :) ). The latter point
might be addressed by giving pipes-bytestring a folding
function similar to P.foldAll, but even so I wonder if
there's a more idiomatic way to do this?
--
You received this message because you are subscribed to the
Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to [email protected].
To post to this group, send email to [email protected].