I was attempting to figure out how to handle other
encodings besides utf8 in connection with pipes-text,
now that it's clearer what one can hope for from the
newer version of `text`. It is easy enough to replicate
the `Codec` types used in enumerator and conduit, and
then define `decode` and `encode` functions. (See e.g.
http://hackage.haskell.org/package/conduit-1.0.8/docs/src/Data-Conduit-Text.html#Codec
http://hackage.haskell.org/package/enumerator-0.4.20/docs/src/Data-Enumerator-Text.html#Codec
)
I tried this, and it doesn't seem to raise any problem
except that it involves a dependency on conduit or
enumerator or else replicating their code (which is
what I did in my experiment).
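For concreteness, here is a minimal sketch of what such a replicated `Codec` might look like. It is simplified from the conduit/enumerator records linked above (those also carry exception values), and the field types and the `latin1` definition here are illustrative assumptions, not the code from either library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T

-- A simplified codec record, loosely modeled on the conduit/enumerator
-- `Codec`. The field types are assumptions made for this sketch.
data Codec = Codec
  { codecName   :: T.Text
    -- Encode as much as possible; return the unencodable remainder, if any.
  , codecEncode :: T.Text -> (B.ByteString, Maybe T.Text)
    -- Decode as much as possible; return the undecoded leftover bytes.
  , codecDecode :: B.ByteString -> (T.Text, B.ByteString)
  }

-- Latin-1: decoding never fails, encoding stops at the first Char > U+00FF.
latin1 :: Codec
latin1 = Codec
  { codecName   = T.pack "ISO-8859-1"
  , codecEncode = \t ->
      let (ok, rest) = T.span (<= '\xFF') t
      in  (B8.pack (T.unpack ok), if T.null rest then Nothing else Just rest)
  , codecDecode = \bs -> (T.pack (B8.unpack bs), B.empty)
  }
```

Note that encoding, not decoding, is the direction that can fail for `latin1`.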
But, especially given the types we wanted for e.g.
decodeUtf8 :: Producer ByteString m r -> Producer Text m (Producer ByteString m r)
the type of the different `Codec`s ends up being very
close to a `Lens'` between `Producer ByteString m r`
and `Producer Text m (Producer ByteString m r)` (or
rather a defective `Iso`). That is, it is similar to
`span isValidUtf8Char`, which is a `Lens'` in the new
pipes-parse, and akin to all the lenses exported by
pipes-bytestring.
type Codec m = forall r . Lens' (Producer ByteString m r) (Producer Text m (Producer ByteString m r))
utf8 :: Monad m => Codec m
latin1 :: Monad m => Codec m
decodeUtf8 p = p ^. utf8
(There are other possibilities.) One difference from the
conduit/enumerator `Codec` type is that they envisage
possible failure going from `Text` to `ByteString`, which
could certainly happen with `latin1`. But I wonder if
this is necessary given the purposes Gabriel is
thinking of putting this style of `Lens'` to. The
`Producer Text ...` I am dealing with in many contexts
is a `Producer Text m (Producer ByteString m r)` where
the text is validated as utf8.
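The `Codec`-as-`Lens'` idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `decodeUtf8Chunks`, `view` and `collect` are names made up here, `Lens'` is defined locally in van Laarhoven style rather than imported, and the decoder is deliberately naive (it stops at any chunk that is not valid UTF-8 on its own, so a character split across two chunks ends decoding early). The setter direction uses `encodeUtf8`, which cannot fail, which is the sense in which failure from `Text` to `ByteString` may not need to be modeled for utf8.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Local van Laarhoven lens type, to avoid depending on a lens library.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

type Codec m =
  forall r. Lens' (Producer B.ByteString m r)
                  (Producer T.Text m (Producer B.ByteString m r))

-- Naive decoder (hypothetical helper): yield each chunk that is valid
-- UTF-8 by itself; at the first chunk that is not, stop and return the
-- undecoded remainder, starting with that chunk.
decodeUtf8Chunks
  :: Monad m
  => Producer B.ByteString m r
  -> Producer T.Text m (Producer B.ByteString m r)
decodeUtf8Chunks p = do
  step <- lift (next p)
  case step of
    Left r -> return (return r)
    Right (chunk, rest) ->
      case TE.decodeUtf8' chunk of
        Right txt -> yield txt >> decodeUtf8Chunks rest
        Left _    -> return (yield chunk >> rest)

-- The codec as a lens: viewing decodes, setting re-encodes the text and
-- then continues with the leftover bytes.
utf8 :: Monad m => Codec m
utf8 k p = fmap reencode (k (decodeUtf8Chunks p))
  where
    reencode txts = do
      leftover <- for txts (yield . TE.encodeUtf8)
      leftover

-- Minimal getter for a van Laarhoven lens.
view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

decodeUtf8
  :: Monad m
  => Producer B.ByteString m r
  -> Producer T.Text m (Producer B.ByteString m r)
decodeUtf8 p = view utf8 p

-- Usage: collect the decoded text together with the leftover producer.
collect
  :: Monad m
  => Producer B.ByteString m r
  -> m ([T.Text], Producer B.ByteString m r)
collect = P.toListM' . decodeUtf8
```

As the "defective `Iso`" remark suggests, this does not satisfy the lens laws in general: the re-encoded chunks need not have the same boundaries as the original ones.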
Another possibility is to introduce a special exception
type the way `conduit` and `enumerator` do, and then use
`MonadCatch` or something. So we'd end up with e.g.
utf8 :: MonadCatch m => Lens' (Producer ByteString m r) (Producer Text m r)
Really that could be Iso or Kinda_Iso, I guess.
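To make the exception-based alternative a bit more concrete, here is a small sketch of a dedicated exception type for the encoding direction. It is an assumption-laden illustration: it uses plain `IO` with `Control.Exception` from base rather than `MonadCatch`, and `CodecException`, `encodeLatin1IO` and `encodeLatin1Either` are hypothetical names, not the types from `conduit` or `enumerator`.

```haskell
import Control.Exception (Exception, throwIO, try)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T

-- Hypothetical exception type: codec name plus the offending character.
data CodecException = EncodeException String Char
  deriving (Eq, Show)

instance Exception CodecException

-- Encode to Latin-1, throwing on the first character above U+00FF.
encodeLatin1IO :: T.Text -> IO B.ByteString
encodeLatin1IO t = case T.find (> '\xFF') t of
  Just c  -> throwIO (EncodeException "ISO-8859-1" c)
  Nothing -> return (B8.pack (T.unpack t))

-- Recover the failure as a value at the call site.
encodeLatin1Either :: T.Text -> IO (Either CodecException B.ByteString)
encodeLatin1Either = try . encodeLatin1IO
```

The cost of this style is that the failure no longer shows up in the type of the encoder itself, only at the point where it is caught.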
One problem with the text library is how deeply
entrenched its use of exceptions is. For example, on the
exception-avoiding principles we have adopted,
pack :: Producer String m a -> Producer Text m a -- i.e. map T.pack
ought really to be
pack :: Producer String m a -> Producer Text m (Producer String m a)
since surrogate code points (U+D800 to U+DFFF) are valid
Haskell `Char`s but cannot be represented in `Text`,
no doubt for reasonable reasons.
So we ought to stop when we hit one and return the rest.
It is really no different from `encodeLatin1`. It makes one
want to use `U.Vector Char` ...
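A pure sketch of the "stop and return the rest" version of `pack`, for a single `String` rather than a `Producer`. It takes the unrepresentable `Char`s to be the surrogate code points U+D800 to U+DFFF, which `T.pack` replaces with U+FFFD; `isSurrogateChar` and `packPrefix` are hypothetical names introduced for illustration.

```haskell
import qualified Data.Text as T

-- Surrogate code points are valid Haskell Chars but not valid Text content.
isSurrogateChar :: Char -> Bool
isSurrogateChar c = c >= '\xD800' && c <= '\xDFFF'

-- Pack the longest representable prefix and hand back the remainder,
-- beginning with the first offending Char (empty if there was none).
packPrefix :: String -> (T.Text, String)
packPrefix s = (T.pack ok, rest)
  where
    (ok, rest) = break isSurrogateChar s
```

The `Producer` version would apply the same split to each `String` it receives and return the remaining producer once `rest` is non-empty, just as with `encodeLatin1`.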
Sorry, these are somewhat half-baked and perhaps
confused thoughts that came to me after looking at
the swank new `pipes-parse` and `pipes-bytestring`,
having first thought about this `Codec` concept a little.
I was wondering whether some obvious excellent idea
might occur to someone.
Michael