-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> I think only text types and text-like types (Greg, how does DBD::Pg
> determine this, currently? I'd want CITEXT data to be converted to
> UTF-8, too; is there some way to tell it what types should be utf8?)

As far as stuff coming out of the database, it's only the four text-like
types I mentioned earlier. See line 3329 of dbdimp.c. We might want to
make that an exclusion check, and/or go global as mentioned below.

Now that I've had some time to recall things, I think the primary reason
for not so much automagicness is simply a question of efficiency. Parsing
every string coming out of the database for "utf-8ness" is expensive. Also
expensive is checking client_encoding, although libpq at least tracks
that for us, so it's not as bad as it first looks.
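
To make the cost concrete, here is a rough sketch of the kind of per-string
scan I mean. It is not the code in dbdimp.c; both helper names are made up
for illustration. The validity check is structural only (it does not reject
overlong forms or surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if any byte is non-ASCII (high bit set).
 * Worst case touches every byte of the string. */
static bool has_high_bit(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return true;
    return false;
}

/* Hypothetical helper: structural UTF-8 check (lead byte followed by the
 * right number of 10xxxxxx continuation bytes). Also a full pass. */
static bool looks_like_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)                extra = 0;  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
        else return false;                       /* stray continuation byte */
        if (extra > len - i - 1)
            return false;                        /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}
```

Either way it is one pass over every byte of every text value returned,
which is where the expense comes from on large result sets.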

So the next question is, why don't we just flip the utf8 flag on for
all strings coming back from the database? What are the drawbacks?
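
One drawback I can think of: if client_encoding is not UTF8, the bytes we
get back (LATIN1, say) are not valid UTF-8, and flagging them anyway hands
Perl a malformed string. A minimal sketch of a gated version, assuming the
encoding name is already available as a string; should_flag_utf8 is a
made-up name, not something in DBD::Pg today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: flag a value as utf8 only when the client encoding
 * is UTF8 and the value has at least one non-ASCII byte. Pure ASCII reads
 * the same with or without the flag, so it is left alone. */
static bool should_flag_utf8(const char *client_encoding,
                             const unsigned char *s, size_t len)
{
    assert(client_encoding != NULL);
    if (strcmp(client_encoding, "UTF8") != 0)
        return false;               /* bytes are in some other encoding */
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return true;            /* non-ASCII data in a UTF8 session */
    return false;
}
```

This trusts the server to send well-formed UTF-8 when client_encoding says
so, which avoids the full validation pass but still costs a high-bit scan.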

I need to brush up on my Unicode-fu, but let's keep the discussion going.
I'd love to see this solved in a way that limits or removes the need
for things like setting specific utf8 flags via the database handle.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200809091043
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkjGi6YACgkQvJuQZxSWSsgrWwCdHt8l1pIyRTEqGv/vkvlKFodV
qC4An0to3nstwKZYAC3aYVr2MdniWHxo
=5AsA
-----END PGP SIGNATURE-----

