My comments are inline below:
On 01/22/2014 12:35 PM, Michael Thompson wrote:
I was attempting to figure out how to handle other
encodings besides utf8 in connection with pipes-text,
now that it's clearer what one can hope for from the
newer version of `text`. -- It is easy enough to replicate
the `Codec` types used in enumerator and conduit, and
then define `decode` and `encode` functions. (See e.g.
http://hackage.haskell.org/package/conduit-1.0.8/docs/src/Data-Conduit-Text.html#Codec
http://hackage.haskell.org/package/enumerator-0.4.20/docs/src/Data-Enumerator-Text.html#Codec
)
I tried this, and it doesn't seem to raise any problem
except that it involves a dependency on conduit or
enumerator or else replicating their code (which is
what I did in my experiment).
But, especially given the types we wanted for e.g.
decodeUtf8 :: Producer ByteString m r
           -> Producer Text m (Producer ByteString m r)
the type of the different `Codec`s ends up being very
close to a `Lens'` between `Producer ByteString m r`
and `Producer Text m (Producer ByteString m r)` (or
rather a defective `Iso`). That is, it is similar to
`span isValidUtf8Char`, which is a `Lens'` in the new
pipes-parse, and akin to all the lenses exported by
pipes-bytestring.
Yeah. That's the type I had in mind for `decodeUtf8` after the
lens-based parsing changes
decodedUtf8
:: Monad m
=> Lens' (Producer ByteString m r)
         (Producer Text m (Producer ByteString m r))
You could make it an `Iso'`, but my experience is that it's not worth it
because if you want to go in the reverse direction you have to add the
`Producer ByteString m r` return value. It's easier to just provide a
function of type:
encodeUtf8 :: Text -> Producer ByteString m ()
... and if they want to encode a stream they can just do:
for stream encodeUtf8
... or use `encodeUtf8` directly to encode a single `Text` value. For an
example of these idioms, see my `lenses` branch for `pipes-binary`:
https://github.com/Gabriel439/pipes-binary/blob/lenses/src/Pipes/Binary.hs
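A minimal sketch of that idiom, assuming the `pipes`, `text`, and `bytestring` packages. The names `encodeStream` and `example` are made up for illustration; `encodeUtf8` here simply wraps the strict encoder from `Data.Text.Encoding` and is not necessarily how `pipes-text` defines it.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Pipes
import qualified Pipes.Prelude as P

-- Encode a single Text value as a one-chunk ByteString producer.
encodeUtf8 :: Monad m => Text -> Producer ByteString m ()
encodeUtf8 = yield . TE.encodeUtf8

-- Encode a whole stream by reusing the single-value encoder with `for`.
-- (`encodeStream` is a hypothetical name for `for stream encodeUtf8`.)
encodeStream :: Monad m => Producer Text m r -> Producer ByteString m r
encodeStream stream = for stream encodeUtf8

-- Pure usage: collect the encoded chunks of a two-chunk stream.
example :: [ByteString]
example = P.toList (encodeStream (each [T.pack "a", T.pack "\955"]))
```

Because encoding to UTF-8 cannot fail, the return value `r` passes through unchanged and no residue is needed in this direction.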
Also, note how the `decoded` lens (analogous to `decodedUtf8`) also
includes more descriptive error information in the return value:
decoded
:: (Monad m, Binary a)
=> Lens' (Producer ByteString m e)
(Producer a m (DecodingError, Producer ByteString m e))
type Codec m = forall r . Lens' (Producer ByteString m r)
                                (Producer Text m (Producer ByteString m r))
utf8 :: Monad m => Codec m
latin1 :: Monad m => Codec m
decodeUtf8 p = p ^. utf8
(There are other possibilities.) One difference from the
conduit/enumerator `Codec` type is that they envisage
possible failure going from `Text` to `ByteString`, which
could certainly happen with `latin1`. But I wonder if
this is necessary given the purposes Gabriel is
thinking of putting this style of `Lens'` to. The
`Producer Text ...` I am dealing with in many contexts
is a `Producer Text m (Producer ByteString m r)` where
the text is validated as utf8.
So let's assume for a second that we had a lens for `latin1` that might
look something like this:
latin1
:: Monad m
=> Lens' (Producer ByteString m ???) (Producer Text m ???)
If the reverse direction can fail (i.e. encoding), that would primarily
impact `zoom`, specifically `zoom latin1 (undraw txt)`. The question we
have to ask is: What should that do if `txt` cannot be encoded? Because
that's essentially what `zoom` has to do: it must translate that `txt`
to a `ByteString` before it can push it back onto the original byte stream.
There are a couple of possible solutions:
1) Don't provide a `latin1` lens at all and instead provide two separate
functions between `Producer`s, one for decoding and one for encoding,
both of which can fail with a leftovers residue. This would then forbid
it from being used with `zoom`.
2) Do something analogous to what `text` does: require the user to
specify a fallback when encoding fails and make this fallback a
mandatory argument to the `latin1` lens. This preserves the ability to
`zoom`.
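The two options can be sketched roughly as follows, assuming `pipes`, `text`, and `bytestring`. Everything named here (`encodeLatin1Prefix`, `encodeLatin1With`, the fallback-taking `latin1`, and `decodeLatin1P`) is hypothetical, and `Lens'` is defined locally in van Laarhoven style only to keep the example self-contained.

```haskell
{-# LANGUAGE RankNTypes #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import           Data.Char (ord)
import           Data.Functor.Const (Const (..))
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Data.Word (Word8)
import           Pipes
import qualified Pipes.Prelude as P

-- Option 1 (pure core): encode the longest Latin-1 prefix and hand back
-- the unencodable remainder as a residue instead of failing.
encodeLatin1Prefix :: Text -> (ByteString, Text)
encodeLatin1Prefix t = (B.pack (map (fromIntegral . ord) (T.unpack ok)), rest)
  where
    (ok, rest) = T.span (\c -> ord c < 256) t

-- Option 2 (pure core): a total encoder; the caller decides which byte
-- stands in for a character outside Latin-1.
encodeLatin1With :: (Char -> Word8) -> Text -> ByteString
encodeLatin1With fallback = B.pack . map enc . T.unpack
  where
    enc c
      | ord c < 256 = fromIntegral (ord c)
      | otherwise   = fallback c

-- Local stand-in for the usual lens type.
type Lens' a b = forall f . Functor f => (b -> f b) -> a -> f a

-- Option 2 as a lens: with a mandatory fallback both directions are
-- total, so the return value can stay a plain `r`.
latin1
  :: Monad m
  => (Char -> Word8)
  -> Lens' (Producer ByteString m r) (Producer Text m r)
latin1 fallback k p = fmap encode (k (decode p))
  where
    decode q = for q (yield . TE.decodeLatin1)
    encode q = for q (yield . encodeLatin1With fallback)

-- Viewing through the lens is just decoding; the fallback is unused here.
decodeLatin1P :: Monad m => Producer ByteString m r -> Producer Text m r
decodeLatin1P p = getConst (latin1 (const 63) Const p)
```

Note that once the fallback fires, encoding and then decoding no longer returns the original text, so this is only lens-like in the same loose sense discussed above.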
As far as I can tell, there is no lens-based abstraction for something
that can fail in both directions, even for things that are not
`Producer`s. For example, I've wanted something similar within a
project where I would encode 2 `Char`s representing an element into a
compact `Word8`, and converting in either direction can potentially
fail. Even if I were to define an intermediate `Element` type to bridge
the two imprecise types, I'd be left with two `Prism`s that I wouldn't
be able to compose together (because I can't reverse either `Prism`
using `from`).
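To make the shape of the missing abstraction concrete, here is a small sketch using only `base`. The `Partial` type, its combinators, and the three-element `Element` type are all invented for illustration; this is not an existing lens-library API.

```haskell
import Control.Monad ((>=>))
import Data.Word (Word8)

-- Hypothetical: a conversion that may fail in either direction.
data Partial a b = Partial
  { forward  :: a -> Maybe b
  , backward :: b -> Maybe a
  }

-- Unlike a Prism, a Partial can always be reversed ...
flipP :: Partial a b -> Partial b a
flipP (Partial f g) = Partial g f

-- ... and two of them compose, failing if either step fails.
compose :: Partial a b -> Partial b c -> Partial a c
compose (Partial f g) (Partial f' g') = Partial (f >=> f') (g' >=> g)

-- Toy intermediate type bridging the two imprecise representations.
data Element = H | He | Li
  deriving (Eq, Show, Enum, Bounded)

render :: Element -> (Char, Char)
render H  = ('H', ' ')
render He = ('H', 'e')
render Li = ('L', 'i')

-- Prism-like: parsing two Chars can fail, rendering cannot.
symbol :: Partial (Char, Char) Element
symbol = Partial
  (\cs -> lookup cs [(render e, e) | e <- [minBound .. maxBound]])
  (Just . render)

-- Prism-like the other way round: decoding a Word8 can fail.
code :: Partial Element Word8
code = Partial
  (Just . fromIntegral . fromEnum)
  (\w -> lookup w [(fromIntegral (fromEnum e), e) | e <- [minBound .. maxBound]])

-- The composite fails in both directions, which neither Prism can express.
packed :: Partial (Char, Char) Word8
packed = compose symbol code
```

The price of this generality is that `Partial` carries no laws relating `forward` and `backward`, which is presumably part of why the lens hierarchy has no direct counterpart.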
Another possibility is to introduce a special exception
type the way `conduit` and `enumerator` do, and then use
`MonadCatch` or something. So we'd end up with e.g.
utf8 :: MonadCatch m => Lens' (Producer ByteString m r)
(Producer Text m r)
The issue with this approach is that you can't retrieve the remaining
input if you recover from an error. An error in the base monad swallows
all future values.
Really that could be Iso or Kinda_Iso, I guess.
One problem with the text library is how deeply
entrenched its use of exceptions is. For example, on the
exception-avoiding principles we have adopted,
pack :: Producer String m a -> Producer Text m a -- i.e. map T.pack
ought really to be
pack :: Producer String m a -> Producer Text m (Producer String m a)
since surrogate code points, which are valid Haskell `Char`s,
cannot be represented in `Text`, no doubt for reasonable reasons.
So we ought to stop when we hit one and return the rest.
It is really no different from `encodeLatin1`. It makes one
want to use `U.Vector Char` ...
Yeah, that's the right thing to do for `pack`. Just be sure to document
why it can fail so people aren't left confused why it returns a
`Producer String m a` residue.
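A rough sketch of such a residue-returning `pack`, assuming `pipes` and `text`. The name `packWhile` and the choice to pass the "representable" predicate as an argument are my own; the demo uses `(/= '!')` purely as a stand-in predicate so the behaviour is easy to see.

```haskell
import           Control.Monad (unless)
import           Data.Functor.Identity (runIdentity)
import           Data.Text (Text)
import qualified Data.Text as T
import           Pipes
import qualified Pipes.Prelude as P

-- Pack chunks for as long as every Char satisfies `ok`; at the first
-- offending Char, stop and return the rest of the input untouched.
packWhile
  :: Monad m
  => (Char -> Bool)
  -> Producer String m a
  -> Producer Text m (Producer String m a)
packWhile ok = go
  where
    go p = do
      step <- lift (next p)
      case step of
        -- Input exhausted: the residue is an empty producer.
        Left r -> return (return r)
        Right (s, p') -> do
          let (good, bad) = span ok s
          unless (null good) (yield (T.pack good))
          if null bad
            then go p'
            -- Push the unpacked tail back in front of the remaining input.
            else return (yield bad >> p')

-- Pure demo: packed chunks, then whatever is left in the residue.
demo :: ([Text], [String])
demo = (chunks, P.toList rest)
  where
    (chunks, rest) =
      runIdentity (P.toListM' (packWhile (/= '!') (each ["ab", "c!d", "e"])))
```

The residue starts at the offending character, so a caller can inspect it, drop it, and resume packing from there.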
Sorry, these are somewhat half-baked and perhaps
confused thoughts that came to me after looking at
the swank new `pipes-parse` and `pipes-bytestring`,
having first thought about this `Codec` concept a little.
I was wondering whether some obvious excellent idea
might occur to someone.
Michael
--
You received this message because you are subscribed to the Google
Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To post to this group, send email to [email protected].