My comments are inline below:

On 01/22/2014 12:35 PM, Michael Thompson wrote:
I was attempting to figure out how to handle other
encodings besides utf8 in connection with pipes-text,
now that it's clearer what one can hope for from the
newer version of `text`. -- It is easy enough to replicate
the `Codec` types used in enumerator and conduit, and
then define `decode` and `encode` functions. (See e.g.

http://hackage.haskell.org/package/conduit-1.0.8/docs/src/Data-Conduit-Text.html#Codec
http://hackage.haskell.org/package/enumerator-0.4.20/docs/src/Data-Enumerator-Text.html#Codec )

I tried this, and it doesn't seem to raise any problem
except that it involves a dependency on conduit or
enumerator or else replicating their code (which is
what I did in my experiment).

But, especially given the types we wanted for e.g.

decodeUtf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r)

the type of the different `Codec`s ends up being very
close to a `Lens'` between `Producer ByteString m r`
and `Producer Text m (Producer ByteString m r)` (or
rather a defective `Iso`). That is, it is similar to
`span isValidUtf8Char`, which is a `Lens'` in the new
pipes-parse, and akin to all the lenses exported by
pipes-bytestring.

Yeah. That's the type I had in mind for `decodeUtf8` after the lens-based parsing changes:

    decodedUtf8
        :: Monad m
        => Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))

You could make it an `Iso'`, but my experience is that it's not worth it because if you want to go in the reverse direction you have to add the `Producer ByteString m r` return value. It's easier to just provide a function of type:

    encodeUtf8 :: Text -> Producer ByteString m ()

... and if they want to encode a stream they can just do:

    for stream encodeUtf8

... or use `encodeUtf8` directly to encode a single `Text` value. For an example of these idioms, see my `lenses` branch for `pipes-binary`:

https://github.com/Gabriel439/pipes-binary/blob/lenses/src/Pipes/Binary.hs
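The two idioms above can be sketched roughly as follows, assuming the `pipes`, `text`, and `bytestring` packages are available. The `encodeUtf8` here is a stand-in written for illustration that wraps `Data.Text.Encoding.encodeUtf8` in a single `yield`; it is not necessarily how `pipes-text` defines it, and `encodeStream` and `encodedChunks` are hypothetical names.

```haskell
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Stand-in for illustration: encode one Text chunk and yield the bytes.
encodeUtf8 :: Monad m => Text -> Producer B.ByteString m ()
encodeUtf8 = yield . TE.encodeUtf8

-- Encoding a whole stream is just `for` applied to the single-chunk encoder.
-- The return value `r` of the original stream is preserved.
encodeStream :: Monad m => Producer Text m r -> Producer B.ByteString m r
encodeStream stream = for stream encodeUtf8

-- A pure run over two chunks, collected into a list.
encodedChunks :: [B.ByteString]
encodedChunks = P.toList (encodeStream (each [T.pack "caf\233", T.pack "!"]))
```

Because encoding to UTF-8 cannot fail, no leftovers need to be returned in this direction, which is why a plain function suffices.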

Also, note how the `decoded` lens (analogous to `decodedUtf8`) also includes more descriptive error information in the return value:

    decoded
        :: (Monad m, Binary a)
        => Lens' (Producer ByteString m e)
                 (Producer a m (DecodingError, Producer ByteString m e))


     type Codec m = Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
     utf8 :: Monad m => Codec m
     latin1 :: Monad m => Codec m
     decodeUtf8 p = p ^. utf8

(There are other possibilities.) One difference from the
conduit/enumerator `Codec` type is that they envisage
possible failure going from `Text` to `ByteString`, which
could certainly happen with `latin1`. But I wonder if
this is necessary given the purposes Gabriel is
thinking of putting this style of `Lens'` to. The
`Producer Text ...` I am dealing with in many contexts
is a `Producer Text m (Producer ByteString m r)` where
the text is validated as utf8.

So let's assume for a second that we had a lens for `latin1` that might look something like this:

    latin1
        :: Monad m
        => Lens' (Producer ByteString m ???) (Producer Text m ???)

If the reverse direction can fail (i.e. encoding), that would primarily impact `zoom`, specifically `zoom latin1 (undraw txt)`. The question we have to ask is: What should that do if `txt` cannot be encoded? Because that's essentially what `zoom` has to do: it must translate that `txt` to a `ByteString` before it can push it back onto the original byte stream.

There are a couple of possible solutions:

1) Don't provide a `latin1` lens at all and instead provide two separate functions between `Producer`s, one for decoding and one for encoding, both of which can fail with a leftovers residue. This would then forbid it from being used with `zoom`.

2) Do something analogous to what `text` does: require the user to specify a fallback when encoding fails and make this fallback a mandatory argument to the `latin1` lens. This preserves the ability to `zoom`.
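To make the two options concrete, here is a minimal sketch on strict `Text` values rather than `Producer`s, using only `text` and `bytestring`. Both function names are hypothetical and belong to neither `text` nor `pipes-text`. The first mirrors option 1 by stopping at the first unencodable character and returning the rest; the second mirrors option 2 by taking a mandatory fallback, in the spirit of `decodeUtf8With` from `Data.Text.Encoding`.

```haskell
import qualified Data.ByteString as B
import Data.Char (ord)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Word (Word8)

-- A character fits in Latin-1 exactly when its code point is at most 0xFF.
isLatin1Char :: Char -> Bool
isLatin1Char c = ord c <= 0xFF

-- Option 1 (hypothetical name): encode the longest encodable prefix and
-- return the unencoded remainder as leftovers.
encodeLatin1Prefix :: Text -> (B.ByteString, Text)
encodeLatin1Prefix txt = (B.pack (map (fromIntegral . ord) (T.unpack ok)), rest)
  where
    (ok, rest) = T.span isLatin1Char txt

-- Option 2 (hypothetical name): never fail; the caller supplies the byte
-- to use for each character that has no Latin-1 representation.
encodeLatin1With :: (Char -> Word8) -> Text -> B.ByteString
encodeLatin1With fallback = B.pack . map enc . T.unpack
  where
    enc c
      | isLatin1Char c = fromIntegral (ord c)
      | otherwise      = fallback c
```

Only the second shape is total in the encoding direction, which is what a `zoom`-compatible lens would need when it pushes `Text` back onto the byte stream.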

As far as I can tell, there is no lens-based abstraction for something that can fail in both directions, even for things that are not `Producer`s. For example, I've wanted something similar within a project where I would encode 2 `Char`s representing an element into a compact `Word8`, and converting in either direction can potentially fail. Even if I were to define an intermediate `Element` type to bridge the two imprecise types, I'd be left with two `Prism`s that I wouldn't be able to compose together (because I can't reverse either `Prism` using `from`).
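Outside the lens vocabulary, one way to model "can fail in both directions" is a plain pair of `Maybe`-returning functions, which composes by Kleisli composition. The following is only a sketch under that assumption; `PartialIso`, `compose`, and the three-constructor `Element` type with its numbering are invented for illustration and are not taken from any library or from the project mentioned above.

```haskell
import Control.Monad ((>=>))
import Data.Word (Word8)

-- A conversion that may fail in either direction.
data PartialIso a b = PartialIso
  { forward  :: a -> Maybe b
  , backward :: b -> Maybe a
  }

-- Composition threads failure through both directions.
compose :: PartialIso a b -> PartialIso b c -> PartialIso a c
compose f g = PartialIso (forward f >=> forward g) (backward g >=> backward f)

-- Hypothetical intermediate type bridging the two imprecise representations.
data Element = H | He | Li deriving (Eq, Show, Enum, Bounded)

-- Text symbol <-> Element: parsing can fail, printing cannot.
symbol :: PartialIso String Element
symbol = PartialIso parse (Just . show)
  where
    parse s = lookup s [(show e, e) | e <- [minBound .. maxBound]]

-- Element <-> compact byte: only the bytes 1 to 3 map back to an Element.
atomicNumber :: PartialIso Element Word8
atomicNumber = PartialIso (Just . fromIntegral . (+ 1) . fromEnum) toElement
  where
    toElement n
      | n >= 1 && n <= 3 = Just (toEnum (fromIntegral n - 1))
      | otherwise        = Nothing

symbolToByte :: PartialIso String Word8
symbolToByte = compose symbol atomicNumber
```

This composes where the two `Prism`s do not, but it gives up the lens laws and combinators, so it cannot be used with `zoom` as is.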


Another possibility is to introduce a special exception
type the way `conduit` and `enumerator` do, and then use
`MonadCatch` or something. So we'd end up with e.g.

utf8 :: MonadCatch m => Lens' (Producer ByteString m r) (Producer Text m r)


The issue with this approach is that you can't retrieve the remaining input if you recover from an error. An error in the base monad swallows all future values.

Really that could be Iso or Kinda_Iso, I guess.

It is a problem about the text library how deeply
entrenched its use of exceptions is. For example on the
exception-avoiding principles we have adopted,

     pack :: Producer String m a -> Producer Text m a  -- i.e. map T.pack

ought really to be

     pack :: Producer String m a -> Producer Text m (Producer String m a)

since surrogate code points (Haskell `Char`s in the range
U+D800 to U+DFFF) cannot be represented in `Text`, no doubt
for reasonable reasons.
So we ought to stop when we hit one and return the rest.
It is really no different from `encodeLatin1`. It makes one
want to use `U.Vector Char` ...

Yeah, that's the right thing to do for `pack`. Just be sure to document why it can fail so people aren't left confused about why it returns a `Producer String m a` residue.
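The stop-and-return-the-rest behaviour can be sketched on plain `String`s, as a pure stand-in for the `Producer` version. `packWhile` and `isScalar` are hypothetical names. Which `Char`s count as unrepresentable is left as a parameter; the example predicate assumes they are the surrogate code points, which `Data.Text.pack` replaces with U+FFFD rather than storing.

```haskell
import Data.Char (GeneralCategory (Surrogate), generalCategory)
import Data.Text (Text)
import qualified Data.Text as T

-- Pack the longest prefix whose characters satisfy the predicate and
-- return the remaining input untouched, so nothing is silently altered.
packWhile :: (Char -> Bool) -> String -> (Text, String)
packWhile representable str = (T.pack good, rest)
  where
    (good, rest) = span representable str

-- Assumption: a Char is representable unless it is a surrogate code point.
isScalar :: Char -> Bool
isScalar c = generalCategory c /= Surrogate
```

For instance, `packWhile isScalar "ab\55296cd"` packs `"ab"` and hands back the residue starting at the surrogate, which is the information the `Producer String m a` return value would carry in the streaming version.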


Sorry, these are somewhat half-baked and perhaps
confused thoughts that came to me after looking at
the swank new `pipes-parse` and `pipes-bytestring`,
having first thought about this `Codec` concept a little.
I was wondering whether some obvious excellent idea
might occur to someone.

Michael
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].