Great! I'm starting to get a firmer understanding of parsers.
I ended up with this:
    decodeFilename = StateT $ \p -> do
        (fileName, p') <- runStateT drawAll
            . view ( PB.span (/= 0)
                   . to (PT.decodeAscii . (PB.map (`rotateR` 3) <-<))
                   . from PT.packChars )
            $ p
        Left p'' <- next p'
        return (fileName, PB.drop 1 <-< join p'')

    entryParser tableStart = do
        fileName <- decodeFilename
        P.decodeGet $ (,,,) fileName <$> fmap (tableStart +) getInt32
                                     <*> getInt32 <*> getInt32
Using `next` instead of `drain` means decode errors can be
handled (as a pattern-match failure, for now). Because
`drawAll` consumes every character, p'' (the part of the
span's output that was not decoded) is empty when decoding
succeeds, so it can simply be joined with the rest of the
input, and then the terminating 0 dropped.
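For comparison, here is a minimal sketch of the same idea with the pipes machinery stripped away, assuming a plain `[Word8]` as the state instead of a `Producer ByteString`: read the whole state with `get`, split at the first 0, rotate and decode each byte as ASCII, then `put` back whatever follows the terminator. Decode errors surface as `Left` rather than as a pattern-match failure. `DecodeError`, `decodeFilenameS` and `encodeFilenameS` are hypothetical names for illustration, not part of pipes-parse or pipes-text; the code only needs the transformers package that ships with GHC.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT (..), get, put)
import Data.Bits (rotateL, rotateR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical error type standing in for the "decode errors" mentioned above.
data DecodeError
  = NonAscii Word8     -- a rotated byte outside the 7-bit ASCII range
  | MissingTerminator  -- the input ended before the terminating 0
  deriving (Eq, Show)

-- List-based model of decodeFilename: inspect the state directly, then put
-- back only the input that follows the terminating 0.
decodeFilenameS :: StateT [Word8] (Either DecodeError) String
decodeFilenameS = do
  input <- get
  case span (/= 0) input of
    (_, []) -> lift (Left MissingTerminator)
    (name, _terminator : rest) -> do
      chars <- lift (mapM decodeByte name)
      put rest
      pure chars
  where
    decodeByte b =
      let c = b `rotateR` 3
       in if c < 128 then Right (chr (fromIntegral c)) else Left (NonAscii c)

-- Inverse used to build sample input: rotate left and append the terminator.
encodeFilenameS :: String -> [Word8]
encodeFilenameS s = map (\c -> fromIntegral (ord c) `rotateL` 3) s ++ [0]
```

Running `runStateT decodeFilenameS` on `encodeFilenameS "a.txt" ++ rest` returns the name together with `rest`, which is the list analogue of handing the remaining producer back to the next parser.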
Ignoring that the composition chains are a bit on the
lengthy side, do you consider it "good style" to poke around
in Parser's underlying StateT like that, or is it going
against how the libraries are meant to be used?
On Tuesday, 13 May 2014 03:14:37 UTC+2, Gabriel Gonzalez wrote:
On 5/10/14, 7:59 AM, Torgeir Strand Henriksen wrote:
Thanks for the reply! The rotated lens is no problem
(`rotateR` is from Data.Bits), but I'm afraid the data
won't decode as UTF-8. Just to make sure I understand
correctly: when you talk about re-encoding unused
values, do you mean the values that would be left over if
the parser we zoomed into were something other than `drawAll`
and didn't consume all the data provided by the `span` lens?
Yes, that's correct. If you write:
    example = do
        a <- zoom someLens parser1
        parser2
... then `someLens` needs to know how to re-encode
leftovers from `parser1` in the format that `parser2`
understands.
I understand why it would be a problem if those
leftovers weren't propagated back, but I'm not sure I
understand why that decision can't be made before the
data is rotated and decoded as text. Does it have to do
with the data being bytestrings that get transformed in
blocks rather than per byte?
Remember that the parser is totally oblivious to
where the `Text` came from. It doesn't know that the
text originated from bytes or rotated data. All it
understands is "I am undrawing some text" and if you
want it to undraw bytes then you need to translate the
"undraw text" command to an "undraw bytes" command.
That's what the lens is doing.
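A toy model may help show what the lens is doing here. The sketch below assumes a parser is just a function from a list of inputs to a result plus leftovers (not the real pipes-parse `Parser`) and zooms through a hypothetical `Codec` that pairs a decoder with the encoder that reverses it. The zoomed parser only ever sees decoded `Char`s; it is the codec's encoder that turns the unconsumed `Char`s back into bytes. The one-way variant has no encoder, so it has nothing to put back and its leftovers are silently lost. `ListParser`, `Codec`, `zoomCodec`, `zoomOneWay`, `latin1Codec` and `drawN` are all illustrative names.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy parser: consume a prefix of the input, return a result and the leftovers.
type ListParser a r = [a] -> (r, [a])

-- Hypothetical codec: a decoder plus the encoder that reverses it.
data Codec s a = Codec
  { decodeWith :: [s] -> [a]
  , encodeWith :: [a] -> [s]
  }

-- Run a parser on the decoded view, then re-encode its leftovers so that the
-- next parser sees them in the original representation.
zoomCodec :: Codec s a -> ListParser a r -> ListParser s r
zoomCodec codec p input =
  let (r, leftovers) = p (decodeWith codec input)
   in (r, encodeWith codec leftovers)

-- Without an encoder there is no way to hand leftovers back, so they are lost.
zoomOneWay :: ([s] -> [a]) -> ListParser a r -> ListParser s r
zoomOneWay decode p input = (fst (p (decode input)), [])

-- Latin-1 on lists: every byte decodes; only code points 0..255 encode exactly.
latin1Codec :: Codec Word8 Char
latin1Codec = Codec (map (chr . fromIntegral)) (map (fromIntegral . ord))

-- Toy parser that draws exactly n elements.
drawN :: Int -> ListParser a [a]
drawN = splitAt
```

With input bytes `[72, 105, 33]`, `zoomCodec latin1Codec (drawN 2)` yields `"Hi"` and leaves the byte `33` for a following byte-level parser, while `zoomOneWay` yields `"Hi"` and leaves nothing.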
Note that you can still get a lens if you specify a way
to handle errors. Right now the `pipes-text` package
provides a one-way decoding function for latin1 of type:
    decodeIso8859_1 :: Monad m
                    => Producer ByteString m r -> Producer Text m (Producer ByteString m r)
If you supplement that with a reverse function of type:
    encoder :: Monad m
            => Producer Text m (Producer ByteString m r) -> Producer ByteString m r
... then you can create a latin1 lens that you can pass
to `zoom`:
    latin1 :: Monad m
           => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
    latin1 = iso decodeIso8859_1 encoder
    -- I might have these arguments backwards; I didn't type-check this
The reason that `pipes-text` doesn't already do this for
you is that Latin1 does not specify how to encode
multibyte characters. In other words, you need to figure
out how to convert these exotic characters to bytes,
even if that means just discarding them (i.e. not
undrawing the character at all).
So if you really want to use latin1 as a lens, you
definitely can! It just requires that you decide how you
want to encode multibyte characters, since there's no
obvious right way to do that. If you don't expect your
input to have multibyte characters then you can just
slightly modify `encodeIso8859_1` to do what you want:
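    encoder pText = do
        rest <- encodeIso8859_1 pText            -- yields bytes for the encodable prefix
        lift $ do
            pBytes <- runEffect (rest >-> drain)  -- discard the text that could not be encoded
            runEffect (pBytes >-> drain)          -- then drain the remaining bytes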
That basically keeps encoding until it hits a character
that `encodeIso8859_1` does not know how to encode, then
gives up and drains the rest of the stream.
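The same "encode until you can't, then give up" policy can be sketched on plain lists, assuming a `[Word8]`/`String` model instead of pipes-text's `Producer`s: decoding maps each byte to the `Char` with the same code point, and encoding keeps the longest prefix that Latin-1 can represent, stopping at the first character above `'\255'`. `decodeLatin1Bytes` and `encodeLatin1Prefix` are hypothetical helpers, not pipes-text functions.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Latin-1 decoding never fails: byte b becomes the Char with code point b.
decodeLatin1Bytes :: [Word8] -> String
decodeLatin1Bytes = map (chr . fromIntegral)

-- Encode characters until the first one outside Latin-1 (above '\255'),
-- then give up and discard that character and everything after it.
encodeLatin1Prefix :: String -> [Word8]
encodeLatin1Prefix = map (fromIntegral . ord) . takeWhile (<= '\255')
```

Because every decoded string stays within `'\0'..'\255'`, `encodeLatin1Prefix . decodeLatin1Bytes` is the identity on bytes, which is the round trip a lens needs; the discarding only kicks in for text that did not originate from Latin-1 bytes.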
Anyway, I'll have to go with your second option. Instead
of breaking the parser into multiple code blocks (each of
which has to be run individually with `runStateT`) in order
to get at the bytestring producer, is it reasonable to use
`get` and `put` from Control.Monad.State? That way I can
keep everything in a single Parser, view the bytestring
producer returned by `get` through the `PB.span` lens
composed with the transformations, and `put` back the
producer returned by `span`.
Bonus question: If the rotated lens were simply
`Bits a => Int -> Lens' a a`, could it be
mapped/zoomed/something over a ByteString producer
instead of including `PB.map` in the lens? That way
`rotated` would be more reusable.
On Saturday, May 10, 2014 1:45:32 AM UTC+2, Gabriel
Gonzalez wrote:
This works much better if you can make two small
changes.
First, I'm guessing that your `rotateR` function
has some sort of inverse named `rotateL`. If it
does, then you can make a rotation lens:
    rotated :: Monad m
            => Int -> Lens' (Producer ByteString m x) (Producer ByteString m x)
    rotated n = iso (>-> PB.map (`rotateR` n)) (>-> PB.map (`rotateL` n))
Second, if you can use utf8 instead of latin1, then
you can just write:
    decodeFileName :: Monad m => Parser ByteString m String
    decodeFileName =
        zoom (PB.span (/= 0) . rotated 3 . PT.utf8 . from PT.packChars) PP.drawAll
The reason this works is that `rotated` and `utf8`
contain extra information for how to propagate
unused bytes back to the original input source. In
the case of `rotated` it reverses the original
rotation, and in the case of `utf8` it re-encodes them.
If you don't have information for how to re-encode
unused values, then you must apply the rotation and
encoding to the producer before feeding it to the
parser:
    yourProducer :: Producer ByteString IO ()

    runStateT PP.drawAll
        ( yourProducer
            ^. PB.span (/= 0)
            ^. to (>-> PB.map (`rotateR` 3))
            ^. PT.utf8
            ^. from PT.packChars )
        :: IO (String, Producer Char IO (... {- more nested producers -}))
`pipes-parse` doesn't let you merge logic into the
parser unless you also include logic for how to
propagate unused bytes to the input source.
Without that guarantee you get bugs related to
silently dropping input values.
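The reason `rotated` can be an iso at all is that `rotateL n` undoes `rotateR n` for every byte. A small sketch with plain functions and lists (no lens library) checks that property, and also shows the shape asked about in the bonus question: an element-level pair of inverse functions for any `Bits` type, lifted over a container with `fmap`, which is roughly what wrapping it in `PB.map` does for a byte stream. `rotatedElem`, `liftPair` and `rotatedBytes` are hypothetical names.

```haskell
import Data.Bits (Bits, rotateL, rotateR)
import Data.Word (Word8)

-- Element-level rotation as a (forward, backward) pair of inverse functions.
rotatedElem :: Bits a => Int -> (a -> a, a -> a)
rotatedElem n = (\x -> rotateR x n, \x -> rotateL x n)

-- Lift both directions over any Functor (lists here, chunks or streams elsewhere).
liftPair :: Functor f => (a -> b, b -> a) -> (f a -> f b, f b -> f a)
liftPair (fwd, bwd) = (fmap fwd, fmap bwd)

-- Byte-level instance: models mapping the rotation over every byte of the input.
rotatedBytes :: Int -> ([Word8] -> [Word8], [Word8] -> [Word8])
rotatedBytes = liftPair . rotatedElem
```

Keeping the element-level pair separate from the lifting is what makes it reusable: the same `rotatedElem 3` works on `Word8`, `Word32` or any other `Bits` instance, and only `liftPair` knows about the container.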
On 5/9/14, 11:06 AM, Torgeir Strand Henriksen wrote:
While working with a binary file format, I started
out with this naive code:
    import qualified Pipes.Parse as P
    import qualified Pipes.Binary as P
    import qualified Pipes.ByteString as PB
    import qualified Data.Text as T
    import qualified Data.ByteString as BS
    import Data.Binary.Get (getWord8, getWord32le)
    import Data.Bits (rotateR)
    import Data.Text.Encoding (decodeLatin1)

    entryParser tableStart = P.decodeGet $ (,,,)
        <$> decodeFilename
        <*> fmap (tableStart +) getWord32le
        <*> getWord32le
        <*> getWord32le

    decodeFilename = T.unpack . decodeLatin1 . BS.pack <$> go where
        go = do
            c <- (`rotateR` 3) <$> getWord8
            if c /= 0 then (c :) <$> go else pure []
            -- terminates on (and consumes) the 0
While it does work, I'm unhappy with
decodeFilename as it basically implements a
combination of map and span/fold with explicit
recursion. But the underlying ByteString isn't
available inside the Get monad without consuming
it, so using e.g. BS.span seems out of the
question. Let's see if lenses can come to the rescue:
    entryParser tableStart = do
        nameChunks <- zoom (PB.span (/= 0)) P.drawAll
        PB.drawByte  -- draw the terminating 0
        let fileName = T.unpack . decodeLatin1
                     . BS.map (flip rotateR 3) . BS.concat $ nameChunks
        P.decodeGet $ (,,,) fileName
            <$> fmap (tableStart +) getWord32le
            <*> getWord32le
            <*> getWord32le
I like this better - map and span aren't
implemented manually anymore - but at the same
time I was hoping for more. It doesn't seem right
to work directly on ByteStrings (i.e. BS.map
instead of PB.map, and text instead of
pipes-text), and the combination of drawAll and
concat is a bit awkward, especially since drawAll
is only for testing (even though all the tutorials
use it :) ). The latter point might be addressed
by giving pipes-bytestring a folding function
similar to P.foldAll, but even so I wonder if
there's a more idiomatic way to do this?
--
You received this message because you are
subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving
emails from it, send an email to
[email protected].
To post to this group, send email to
[email protected].