On Tue, Sep 09, 2008 at 02:44:07PM -0000, Greg Sabino Mullane wrote:
> 
> Now that I've had some time to recall things, I think the primary reason
> for not so much automagicness is simply a question of efficiency. Parsing
> every string coming out of the database for "utf-8ness" is expensive. Also
> expensive is checking client_encoding, although libpq at least tracks
> that for us, so it's not as bad as it first looks.
> 
> So the next question is, why don't we just flip the utf8 flag on for
> all strings coming back from the database? What are the drawbacks?

By "all strings" do you mean the current list:

    PG_CHAR
    PG_TEXT
    PG_BPCHAR
    PG_VARCHAR

And by "flip the flag" you mean just setting the utf8 flag, without
checking that the data is valid UTF-8?

Seems reasonable.  If a user sets pg_enable_utf8, that implies
client_encoding is utf8, too.  Therefore, for the types above, we know
the data is already encoded as UTF-8 (well, assuming PostgreSQL
re-encodes that set of columns to UTF-8).
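
For what it's worth, that assumption is easy to sanity-check from DBI
land.  A minimal sketch (the connection parameters are made up;
pg_enable_utf8 is the real DBD::Pg attribute):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'pass',
                           { RaiseError => 1, pg_enable_utf8 => 1 });

    # If pg_enable_utf8 really implies client_encoding = UTF8, this holds:
    my ($enc) = $dbh->selectrow_array('SHOW client_encoding');
    die "client_encoding is $enc, not UTF8\n" unless uc($enc) eq 'UTF8';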

But, yes, blindly setting the utf8 flag on every column can cause
problems.  Besides Perl likely blowing up on invalid UTF-8 sequences,
things like length() would return the wrong answer.
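
To make the length() point concrete, here's a minimal sketch using
Encode's internal _utf8_on(), which is more or less the Perl-level
equivalent of flipping SvUTF8 from XS:

    use strict;
    use warnings;
    use Encode ();

    my $good = "caf\xC3\xA9";  # the UTF-8 bytes for "café"
    my $bad  = "caf\xE9";      # Latin-1 "café"; 0xE9 alone is invalid UTF-8

    print length($good), " ", length($bad), "\n";  # 5 4 -- byte semantics

    # Blindly flip the flag, with no validation -- the scenario above.
    Encode::_utf8_on($good);
    Encode::_utf8_on($bad);

    print length($good), "\n";  # 4 -- right, but only because the bytes
                                # happened to be valid UTF-8
    print length($bad), "\n";   # unreliable; Perl may warn about (or blow
                                # up on) the malformed UTF-8 downstream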


PostgreSQL's client_encoding allows a client to use a different
encoding than the database encoding.  I assume that also means
PostgreSQL must decide which columns need to be converted between
encodings.
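
You can see that conversion from the client side.  A hypothetical
demonstration, assuming a $dbh connected with pg_enable_utf8 off, a
UTF8 database, and a one-row table t(s text) containing "é":

    $dbh->do(q{SET client_encoding = 'LATIN1'});
    my ($s1) = $dbh->selectrow_array('SELECT s FROM t');
    printf "LATIN1: %vd\n", $s1;  # 233 -- one byte, 0xE9

    $dbh->do(q{SET client_encoding = 'UTF8'});
    my ($s2) = $dbh->selectrow_array('SELECT s FROM t');
    printf "UTF8:   %vd\n", $s2;  # 195.169 -- two bytes, 0xC3 0xA9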

Seems like if DBD::Pg knew that information (which columns are
candidates for re-encoding by PostgreSQL), then DBD::Pg could simply
set the utf8 flag on those columns and not bother with calling
is_utf8_string().  That's assuming the client encoding is utf8, of
course.  That would also help if there were other data types that are
indeed character data but not among the types listed above.

But I have no idea how PostgreSQL actually decides which columns to
re-encode.
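
Roughly what I'm imagining, sketched at the Perl level (the real work
would happen in XS with SvUTF8_on; $sth->{pg_type} is DBD::Pg's
arrayref of per-column type names, and the %textish list is just my
guess at which types carry character data):

    use Encode ();

    my %textish = map { $_ => 1 } qw(text varchar bpchar char name);

    my $sth = $dbh->prepare('SELECT id, note FROM items');  # hypothetical
    $sth->execute;
    my $types = $sth->{pg_type};  # e.g. ['int4', 'text']

    while (my @row = $sth->fetchrow_array) {
        for my $i (0 .. $#row) {
            next unless defined $row[$i] && $textish{ $types->[$i] };
            Encode::_utf8_on($row[$i]);  # flag only the character columns;
                                         # safe only when client_encoding
                                         # is UTF8
        }
        # ... use @row ...
    }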


I guess the only other client encodings supported would be ISO 8859-1
(and ASCII, of course) with pg_enable_utf8 off.

-- 
Bill Moseley
[EMAIL PROTECTED]
Sent from my iMutt
