Re: [haskell-pipes] Re: Lens-based parsing

Michael Thompson Tue, 21 Jan 2014 21:36:54 -0800

I was trying to figure out how to handle encodings
other than utf8 in connection with pipes-text, now
that it's clearer what one can hope for from the
newer version of `text`. It is easy enough to replicate
the `Codec` types used in enumerator and conduit, and
then define `decode` and `encode` functions. (See e.g.

http://hackage.haskell.org/package/conduit-1.0.8/docs/src/Data-Conduit-Text.html#Codec

http://hackage.haskell.org/package/enumerator-0.4.20/docs/src/Data-Enumerator-Text.html#Codec

)

I tried this, and it doesn't seem to raise any problem
except that it involves a dependency on conduit or
enumerator or else replicating their code (which is
what I did in my experiment).

But, especially given the types we wanted for e.g.

decodeUtf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r)

the type of the different `Codec`s ends up being very
close to a `Lens'` between `Producer ByteString m r`
and `Producer Text m (Producer ByteString m r)` (or
rather a defective `Iso`). That is, it is similar to
`span isValidUtf8Char`, which is a `Lens'` in the new
pipes-parse, and akin to all the lenses exported by
pipes-bytestring.

type Codec m = forall r . Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
utf8 :: Monad m => Codec m
latin1 :: Monad m => Codec m
decodeUtf8 p = p ^. utf8

(There are other possibilities.) One difference from the
conduit/enumerator `Codec` type is that they envisage
possible failure going from `Text` to `ByteString`, which
could certainly happen with `latin1`. But I wonder if
this is necessary given the purposes Gabriel is
thinking of putting this style of `Lens'` to. The
`Producer Text ...` I am dealing with in many contexts
is a `Producer Text m (Producer ByteString m r)` where
the text is validated as utf8.

Another possibility is to introduce a special exception
type the way `conduit` and `enumerator` do, and then use
`MonadCatch` or something. So we'd end up with e.g.

utf8 :: MonadCatch m => Lens' (Producer ByteString m r) (Producer Text m r)

Really that could be Iso or Kinda_Iso, I guess.

One problem with the text library is how deeply
entrenched its use of exceptions is. For example, on the
exception-avoiding principles we have adopted,

pack :: Producer String m a -> Producer Text m a -- i.e. map T.pack

ought really to be

pack :: Producer String m a -> Producer Text m (Producer String m a)

since some Haskell `Char`s (the surrogate code points
U+D800 to U+DFFF) cannot be represented in `Text`, no
doubt for reasonable reasons. So we ought to stop when
we hit one and return the rest. It is really no different
from `encodeLatin1`. It makes one want to use
`U.Vector Char` ...
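
The "stop and return the rest" behaviour can be sketched on plain
`String`s, again as a pure model rather than pipes code. It assumes the
unrepresentable characters are the surrogate code points U+D800 to
U+DFFF, which, as I understand the `Data.Text` documentation, `T.pack`
replaces with U+FFFD. The names `isSurrogate`, `packPrefix` and
`encodeLatin1Prefix` are made up for this example.

```haskell
module Main where

import Data.Char (ord)
import Data.Word (Word8)

-- Surrogate code points are valid Haskell Chars but not Unicode
-- scalar values (assumption: these are what Text cannot hold).
isSurrogate :: Char -> Bool
isSurrogate c = c >= '\xD800' && c <= '\xDFFF'

-- Pure model of
--   pack :: Producer String m a -> Producer Text m (Producer String m a)
-- The first component is the part that is safe to pack, the second is
-- everything from the first offending character onwards.
packPrefix :: String -> (String, String)
packPrefix = break isSurrogate

-- The same idea for a latin1 encoder: stop at the first character
-- that does not fit in a single byte and return the remainder.
encodeLatin1Prefix :: String -> ([Word8], String)
encodeLatin1Prefix s = (map (fromIntegral . ord) ok, rest)
  where
    (ok, rest) = span (<= '\xFF') s

main :: IO ()
main = do
  print (packPrefix "ab\xD800zz")
  print (encodeLatin1Prefix "caf\xE9 \x3BB!")
```

Both functions have the same "converted prefix plus leftover input"
shape, which is the sense in which `pack` looks like `encodeLatin1`.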

Sorry, these are somewhat half-baked and perhaps
confused thoughts that came to me after looking at
the swank new `pipes-parse` and `pipes-bytestring`,
having first thought about this `Codec` concept a little.
I was wondering whether some obvious, excellent idea
might occur to someone.

Michael
