-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

...
> And maybe that's the default. But I should be able to tell it to be pedantic
> when the data is known to be bad (see, for example, data from an
> SQL_ASCII-encoded PostgreSQL database).
...
> DBD::Pg's approach is currently broken. Greg is working on fixing it, but for
> compatibility reasons the fix is non-trivial (and the API might be, too). In a
> perfect world DBD::Pg would just always do the right thing, as the database
> tells it what encodings to use when you connect (and *all* data is encoded as
> such, not just certain data types). But the world is not perfect, there's a
> lot of legacy stuff.
>
> Greg, care to add any other details?

My thinking on this has changed a bit. See the DBD::Pg in git head for a
sample, but basically, DBD::Pg is going to:

* Flip the utf8 flag on if the client_encoding is UTF-8 (and server_encoding
  is not SQL_ASCII)
* Flip it off if not

The single switch will be pg_unicode_flag, which will basically override the
automatic choice above, just in case you really want your SQL_ASCII byte soup
marked as utf8 for some reason, or (more likely) you want your data unmarked
as utf8 despite being so.
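
To make the rule above concrete, here is a rough sketch of the decision in
plain Perl. It is not the code in git head: want_utf8_flag() is a made-up
helper name, and treating an undefined override as "choose automatically" is
an assumption for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT DBD::Pg's actual code: decide whether strings
# coming back from the database should be marked as utf8.
#   $override        - stand-in for pg_unicode_flag; undef is assumed to mean
#                      "choose automatically", any defined value wins outright
#   $client_encoding - as reported by the server, e.g. 'UTF8'
#   $server_encoding - as reported by the server, e.g. 'SQL_ASCII'
sub want_utf8_flag {
    my ($override, $client_encoding, $server_encoding) = @_;

    # An explicit setting overrides the automatic choice.
    return $override ? 1 : 0 if defined $override;

    # PostgreSQL reports 'UTF8'; accept the 'UTF-8' spelling as well.
    my $client_is_utf8 = $client_encoding =~ /\AUTF-?8\z/i;
    my $server_is_soup = uc($server_encoding) eq 'SQL_ASCII';

    return ($client_is_utf8 && !$server_is_soup) ? 1 : 0;
}
```

So want_utf8_flag(undef, 'UTF8', 'SQL_ASCII') comes back false, while passing
a true override forces the flag on even for the byte soup case.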

This does rely on PostgreSQL doing the right thing when it comes to
encoding/decoding/storing all the encodings, but I'm pretty sure it's doing
well in that regard.

...

Since nobody has actually defined a specific interface yet, let me throw out a
straw man. It may look familiar :)

===
* $h->{unicode_flag}

If this is set on, data returned from the database is assumed to be UTF-8, and 
the utf8 flag will be set. DBDs will decode the data as needed.

If this is set off, the utf8 flag will never be set, and no decoding will be
done on data coming back from the database.

If this is not set (undefined), the underlying DBD is responsible for doing the 
correct thing. In other words, the behaviour is undefined.
===
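
A minimal sketch of how a driver might apply the three states of the straw
man to a fetched value. Nothing here is an existing DBI API: process_fetched()
and its $driver_default argument are hypothetical, and only the core Encode
module is used for the decoding.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical per-value hook, for illustration only.
#   $bytes          - raw octets as received from the database
#   $flag           - the handle's unicode_flag: true, false, or undef
#   $driver_default - what the DBD would pick on its own when $flag is undef
sub process_fetched {
    my ($bytes, $flag, $driver_default) = @_;

    # undef means "the DBD decides"; a defined value is taken as is.
    my $on = defined $flag ? $flag : $driver_default;

    # Off: hand back the octets untouched, utf8 flag never set.
    return $bytes unless $on;

    # On: treat the octets as UTF-8 and return a character string.
    return Encode::decode('UTF-8', $bytes);
}

my $raw = "caf\xC3\xA9";    # "cafe" with an acute accent, as UTF-8 octets
```

With the flag on, process_fetched($raw, 1, 0) is a four-character string with
the utf8 flag set; with it off, the caller gets the same five octets back.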

I don't think this will fit into DBD::Pg's current implementation perfectly, as
we wouldn't want people to simply leave $h->{unicode_flag} on, as that would
force SQL_ASCII text to have utf8 flipped on. Perhaps we simply never, ever
allow that.
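
To illustrate the worry, a small sketch of why blindly marking SQL_ASCII data
is risky: the octets may not be well-formed UTF-8 at all. is_valid_utf8() is
a hypothetical helper written for this example, built on the core Encode
module's strict decoding.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical check: true if $bytes is well-formed UTF-8, false otherwise.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it from
# modifying the caller's string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $latin1_bytes = "caf\xE9";        # Latin-1 octets, as SQL_ASCII may hold
my $utf8_bytes   = "caf\xC3\xA9";    # the same word as UTF-8 octets
```

Here is_valid_utf8($latin1_bytes) is false and is_valid_utf8($utf8_bytes) is
true, and a SQL_ASCII database can hold both in the same column.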

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201109211651
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk56TngACgkQvJuQZxSWSsiIfwCeKMfsg2RYsCzDuwb8FnmZhhbu
8LgAn2TNLuKirq5IDAhlCNmQ3gxbnuq7
=k+Fi
-----END PGP SIGNATURE-----

