Re: [haskell-pipes] Re: Lens-based parsing

Gabriel Gonzalez Tue, 21 Jan 2014 22:22:54 -0800

My comments are inline below:

On 01/22/2014 12:35 PM, Michael Thompson wrote:

I was attempting to figure out how to handle other
encodings besides utf8 in connection with pipes-text,
now that it's clearer what one can hope for from the
newer version of `text`. -- It is easy enough to replicate
the `Codec` types used in enumerator and conduit, and
then define `decode` and `encode` functions. (See e.g.


http://hackage.haskell.org/package/conduit-1.0.8/docs/src/Data-Conduit-Text.html#Codec

http://hackage.haskell.org/package/enumerator-0.4.20/docs/src/Data-Enumerator-Text.html#Codec)


I tried this, and it doesn't seem to raise any problem
except that it involves a dependency on conduit or
enumerator or else replicating their code (which is
what I did in my experiment).

But, especially given the types we wanted for e.g.

    decodeUtf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r)


the type of the different `Codec`s ends up being very
close to a `Lens'` between `Producer ByteString m r`
and `Producer Text m (Producer ByteString m r)` (or
rather a defective `Iso`). That is, it is similar to
`span isValidUtf8Char`, which is a `Lens'` in the new
pipes-parse, and akin to all the lenses exported by
pipes-bytestring.
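To make the "defective `Iso`" remark concrete, here is a minimal sketch in plain Haskell with no pipes dependency. It assumes a pure stand-in for `Producer` (the hypothetical `Stream` type, with the base monad dropped) and defines its own van Laarhoven `Lens'`, `view` and `set`; `spanL` is a hypothetical analogue of the `span` lens, not the pipes-parse code.

```haskell
{-# LANGUAGE RankNTypes #-}
-- Pure model of a Producer: yields values of type a, then returns r.
-- The base monad m is dropped in this sketch.
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data Stream a r = Yield a (Stream a r) | Done r
  deriving (Eq, Show)

-- van Laarhoven lens, the same shape the lens packages use
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))

-- Split off the longest prefix satisfying p; the rest of the stream
-- becomes the return value of the prefix stream.
splitSpan :: (a -> Bool) -> Stream a r -> Stream a (Stream a r)
splitSpan _ (Done r) = Done (Done r)
splitSpan p s@(Yield a rest)
  | p a       = Yield a (splitSpan p rest)
  | otherwise = Done s

-- Glue a prefix back onto the remainder it returns.
unsplit :: Stream a (Stream a r) -> Stream a r
unsplit (Done rest)    = rest
unsplit (Yield a more) = Yield a (unsplit more)

-- Hypothetical analogue of pipes-parse's span lens.
spanL :: (a -> Bool) -> Lens' (Stream a r) (Stream a (Stream a r))
spanL p k s = fmap unsplit (k (splitSpan p s))

fromList :: [a] -> r -> Stream a r
fromList xs r = foldr Yield (Done r) xs

toList :: Stream a r -> ([a], r)
toList (Done r)    = ([], r)
toList (Yield a s) = let (as, r) = toList s in (a : as, r)
```

Setting a prefix that starts with odd numbers and then viewing through `spanL even` again yields an empty prefix, so `view l (set l x s) == x` fails. That is the sense in which such a lens is only a defective `Iso`.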

Yeah. That's the type I had in mind for `decodeUtf8` after the lens-based parsing changes:


    decodedUtf8
        :: Monad m
        => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))

You could make it an `Iso'`, but my experience is that it's not worth it because if you want to go in the reverse direction you have to add the `Producer ByteString m r` return value. It's easier to just provide a function of type:


    encodeUtf8 :: Monad m => Text -> Producer ByteString m ()

... and if they want to encode a stream they can just do:

    for stream encodeUtf8

... or use `encodeUtf8` directly to encode a single `Text` value. For an example of these idioms, see my `lenses` branch for `pipes-binary`:


https://github.com/Gabriel439/pipes-binary/blob/lenses/src/Pipes/Binary.hs
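The encode-as-a-function idiom can be sketched without pipes. This assumes the same kind of pure `Stream` stand-in for `Producer` (base monad dropped), a `forS` modelled on pipes' `for`, and a hypothetical `encodeUtf8` that takes a `String` chunk (standing in for `Text`) and yields bytes one at a time rather than as `ByteString` chunks.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Pure model of a Producer (base monad dropped): yields a's, returns r.
data Stream a r = Yield a (Stream a r) | Done r
  deriving (Eq, Show)

fromList :: [a] -> r -> Stream a r
fromList xs r = foldr Yield (Done r) xs

toList :: Stream a r -> ([a], r)
toList (Done r)    = ([], r)
toList (Yield a s) = let (as, r) = toList s in (a : as, r)

-- Modelled on pipes' `for`: replace each yielded value with a sub-stream.
forS :: Stream a r -> (a -> Stream b ()) -> Stream b r
forS (Done r) _       = Done r
forS (Yield a rest) f = andThen (f a) (forS rest f)
  where
    andThen (Done ()) next   = next
    andThen (Yield b s) next = Yield b (andThen s next)

-- UTF-8 bytes of one Char (surrogate code points are not rejected here).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    top k  = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Hypothetical stand-in for encodeUtf8 :: Text -> Producer ByteString m ():
-- encodes a single chunk and finishes with ().
encodeUtf8 :: String -> Stream Word8 ()
encodeUtf8 chunk = fromList (concatMap utf8Bytes chunk) ()
```

`forS stream encodeUtf8` plays the part of `for stream encodeUtf8`. Because encoding is an ordinary function rather than the reverse direction of a lens, nothing has to thread the `Producer ByteString m r` residue back through it.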

Also, note how the `decoded` lens (analogous to `decodedUtf8`) also includes more descriptive error information in the return value:


    decoded
        :: (Monad m, Binary a)
        => Lens' (Producer ByteString m e)
                 (Producer a m (DecodingError, Producer ByteString m e))

    -- the `forall r` needs RankNTypes
    type Codec m = forall r . Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))

    utf8   :: Monad m => Codec m
    latin1 :: Monad m => Codec m
    decodeUtf8 p = p ^. utf8

(There are other possibilities.) One difference from the
conduit/enumerator `Codec` type is that they envisage
possible failure going from `Text` to `ByteString`, which
could certainly happen with `latin1`. But I wonder if
this is necessary given the purposes Gabriel is
thinking of putting this style of `Lens'` to. The
`Producer Text ...` I am dealing with in many contexts
is a `Producer Text m (Producer ByteString m r)` where
the text is validated as utf8.

So let's assume for a second that we had a lens for `latin1` that might look something like this:


    latin1
        :: Monad m
        => Lens' (Producer ByteString m ???) (Producer Text m ???)

If the reverse direction can fail (i.e. encoding), that would primarily impact `zoom`, specifically `zoom latin1 (undraw txt)`. The question we have to ask is: What should that do if `txt` cannot be encoded? Because that's essentially what `zoom` has to do: it must translate that `txt` to a `ByteString` before it can push it back onto the original byte stream.


There are a couple of possible solutions:

1) Don't provide a `latin1` lens at all and instead provide two separate functions between `Producer`s, one for decoding and one for encoding, both of which can fail with a leftovers residue. This would then forbid it from being used with `zoom`.

2) Do something analogous to what `text` does: require the user to specify a fallback when encoding fails and make this fallback a mandatory argument to the `latin1` lens. This preserves the ability to `zoom`.
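Both options can be sketched with a pure `Stream` stand-in for `Producer` (base monad dropped) and hand-rolled `Lens'`, `view` and `over`. All names here (`decodeLatin1`, `encodeLatin1`, `latin1`) are hypothetical. `over l (Yield c)` plays the role of `zoom latin1 (undraw txt)`: it pushes a `Char` onto the decoded side, so the setter has to encode it back to a byte.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Char (chr, ord)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Word (Word8)

-- Pure model of a Producer (base monad dropped): yields a's, returns r.
data Stream a r = Yield a (Stream a r) | Done r
  deriving (Eq, Show)

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

over :: Lens' s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

fromList :: [a] -> r -> Stream a r
fromList xs r = foldr Yield (Done r) xs

toList :: Stream a r -> ([a], r)
toList (Done r)    = ([], r)
toList (Yield a s) = let (as, r) = toList s in (a : as, r)

-- Option 1: two plain functions, no lens. Each stops at the first element
-- it cannot convert and returns the unconverted remainder.
-- (Latin-1 decoding never fails, so only encoding stops early here.)
decodeLatin1 :: Stream Word8 r -> Stream Char (Stream Word8 r)
decodeLatin1 (Done r)       = Done (Done r)
decodeLatin1 (Yield w rest) = Yield (chr (fromIntegral w)) (decodeLatin1 rest)

encodeLatin1 :: Stream Char r -> Stream Word8 (Stream Char r)
encodeLatin1 (Done r) = Done (Done r)
encodeLatin1 s@(Yield c rest)
  | ord c < 256 = Yield (fromIntegral (ord c)) (encodeLatin1 rest)
  | otherwise   = Done s

-- Option 2: a lens with a mandatory fallback, so the setter (the
-- direction `zoom` needs) is total.
latin1 :: (Char -> Word8) -> Lens' (Stream Word8 r) (Stream Char (Stream Word8 r))
latin1 fallback k s = fmap encodeBack (k (decodeLatin1 s))
  where
    encodeBack (Done rest)    = rest
    encodeBack (Yield c more) = Yield (toByte c) (encodeBack more)
    toByte c
      | ord c < 256 = fromIntegral (ord c)
      | otherwise   = fallback c
```

With `latin1 (const 63)`, pushing back `'é'` stores byte 233 while `'λ'` falls back to 63 (`'?'`). `encodeLatin1`, by contrast, stops at `'λ'` and returns it with the rest as residue, which is why it cannot serve as the setter of a lens.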

As far as I can tell, there is no lens-based abstraction for something that can fail in both directions, even for things that are not `Producer`s. For example, I've wanted something similar within a project where I would encode 2 `Char`s representing an element into a compact `Word8`, and converting in either direction can potentially fail. Even if I were to define an intermediate `Element` type to bridge the two imprecise types, I'd be left with two `Prism`s that I wouldn't be able to compose together (because I can't reverse either `Prism` using `from`).
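One way to see what is missing: a pair of partial conversions, one per direction, does compose and reverse, which two `Prism`s cannot. The sketch below is not a lens type; `PartialIso`, `Element`, its three constructors and the byte numbering are all hypothetical, chosen only to mirror the two-`Char`s-to-`Word8` example.

```haskell
import Control.Monad ((>=>))
import Data.Word (Word8)

-- Hypothetical: a pair of conversions that may each fail. Not a type
-- from the lens hierarchy; it only illustrates the missing abstraction.
data PartialIso a b = PartialIso
  { forward  :: a -> Maybe b
  , backward :: b -> Maybe a
  }

-- Composition chains the failing steps in each direction.
compose :: PartialIso a b -> PartialIso b c -> PartialIso a c
compose (PartialIso f g) (PartialIso f' g') = PartialIso (f >=> f') (g' >=> g)

-- Reversal is free, which is what a Prism cannot offer.
inverse :: PartialIso a b -> PartialIso b a
inverse (PartialIso f g) = PartialIso g f

-- Hypothetical intermediate type bridging the two representations.
data Element = He | Li | Be
  deriving (Eq, Show, Enum, Bounded)

symbols :: [(Element, (Char, Char))]
symbols = [(He, ('H', 'e')), (Li, ('L', 'i')), (Be, ('B', 'e'))]

-- Prism-shaped: parsing two Chars can fail, printing an Element cannot.
charsElement :: PartialIso (Char, Char) Element
charsElement = PartialIso
  (\cs -> lookup cs [(pair, e) | (e, pair) <- symbols])
  (\e -> lookup e symbols)

-- Prism-shaped: decoding a Word8 can fail, encoding an Element cannot.
wordElement :: PartialIso Word8 Element
wordElement = PartialIso
  (\w -> if fromIntegral w <= fromEnum (maxBound :: Element)
           then Just (toEnum (fromIntegral w))
           else Nothing)
  (Just . fromIntegral . fromEnum)

-- Two Chars <-> Word8: now both directions can fail.
charsWord :: PartialIso (Char, Char) Word8
charsWord = charsElement `compose` inverse wordElement
```

Each half is `Prism`-shaped on its own (one direction total), but the composite `charsWord` can fail both ways, and `inverse` yields the `Word8`-to-`Char`s direction without extra code.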


Another possibility is to introduce a special exception
type the way `conduit` and `enumerator` do, and then use
`MonadCatch` or something. So we'd end up with e.g.

    utf8 :: MonadCatch m => Lens' (Producer ByteString m r) (Producer Text m r)

The issue with this approach is that you can't retrieve the remaining input if you recover from an error. An error in the base monad swallows all future values.
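A small contrast, using lists instead of `Producer`s and `Either` as a stand-in for an exception in the base monad. The ASCII decoders and the `DecodeError` type are hypothetical. In this non-streaming model the exception version even loses the decoded prefix; in a real `Producer` the prefix would already have been yielded downstream, but the bytes after the error are still out of reach.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical error type standing in for a codec exception.
newtype DecodeError = NonAscii Word8
  deriving (Eq, Show)

-- Exception style: Either plays the base monad, and the first Left
-- replaces the whole result, bytes after the bad one included.
decodeAsciiE :: [Word8] -> Either DecodeError String
decodeAsciiE = mapM step
  where
    step w
      | w < 128   = Right (chr (fromIntegral w))
      | otherwise = Left (NonAscii w)

-- Residue style: decode as far as possible, then return the error
-- together with the undecoded bytes so the caller can recover.
decodeAsciiR :: [Word8] -> (String, Maybe (DecodeError, [Word8]))
decodeAsciiR [] = ([], Nothing)
decodeAsciiR ws@(w : rest)
  | w < 128   = let (cs, err) = decodeAsciiR rest
                in (chr (fromIntegral w) : cs, err)
  | otherwise = ([], Just (NonAscii w, ws))
```

A caller of `decodeAsciiR` can inspect the offending byte, skip it, and resume on what follows, which is the recovery the exception version rules out.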

Really that could be Iso or Kinda_Iso, I guess.

It is a problem how deeply entrenched the text
library's use of exceptions is. For example, on the
exception-avoiding principles we have adopted,

     pack :: Producer String m a -> Producer Text m a  -- i.e. map T.pack

ought really to be

     pack :: Producer String m a -> Producer Text m (Producer String m a)

since surrogate code points (U+D800 to U+DFFF), though
valid Haskell `Char`s, cannot be represented in `Text`
(`T.pack` silently replaces them with U+FFFD), no doubt for
reasonable reasons. So we ought to stop when we hit one and return the rest.
It is really no different from `encodeLatin1`. It makes one
want to use `U.Vector Char` ...
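The residue-returning `pack` can be sketched in the same pure style. Assumptions: `Stream` stands in for `Producer` with the base monad dropped, `Text` is a newtype over `String`, and the `Char`s that `Text` cannot hold are taken to be the surrogate code points U+D800 to U+DFFF, which `Data.Text.pack` replaces with U+FFFD instead of rejecting.

```haskell
import Data.Char (ord)

-- Pure model of a Producer (base monad dropped): yields a's, returns r.
data Stream a r = Yield a (Stream a r) | Done r
  deriving (Eq, Show)

-- Stand-in for Data.Text.Text in this model.
newtype Text = Text String
  deriving (Eq, Show)

-- Assumption: the Chars Text cannot hold are the surrogates U+D800..U+DFFF.
representable :: Char -> Bool
representable c = ord c < 0xD800 || ord c > 0xDFFF

-- Hypothetical residue-returning pack: converts String chunks until it
-- meets an unrepresentable Char, then returns everything from that Char on.
pack :: Stream String r -> Stream Text (Stream String r)
pack (Done r) = Done (Done r)
pack (Yield chunk rest) =
  case break (not . representable) chunk of
    (good, [])  -> Yield (Text good) (pack rest)
    (good, bad) -> prefix good (Done (Yield bad rest))
  where
    prefix [] s  = s
    prefix str s = Yield (Text str) s

fromList :: [a] -> r -> Stream a r
fromList xs r = foldr Yield (Done r) xs

toList :: Stream a r -> ([a], r)
toList (Done r)    = ([], r)
toList (Yield a s) = let (as, r) = toList s in (a : as, r)
```

The offending chunk is split: its representable prefix is still emitted as `Text`, and the returned `Stream String r` residue starts at the first unrepresentable `Char`.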

Yeah, that's the right thing to do for `pack`. Just be sure to document why it can fail so people aren't left confused about why it returns a `Producer String m a` residue.

Sorry, these are somewhat half-baked and perhaps
confused thoughts that came to me after looking at
the swank new `pipes-parse` and `pipes-bytestring`,
having first thought about this `Codec` concept a little.
I was wondering whether some obvious excellent idea
might occur to someone.

Michael
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].


