-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
> Uh, say what? Just as I need to
>
> binmode STDOUT, ':utf8';
> before sending stuff to STDOUT (that is, turn off the flag), I would
> expect DBDs to do the same before sending data to the database.
> Unless, of course, it "just works".
I cannot imagine the flag really matters either way. We (Pg) simply dump a
bunch of chars to the database, and build it by slurping in the string
character by character until we hit a null. I suppose other databases
may do things differently, but I can't imagine how/why.
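
For what it's worth, here is a minimal sketch of what the flag does and does not tell you. It uses only core Perl (the utf8:: functions, the bytes module, and Encode); the variable names are made up for illustration, and nothing here is DBD::Pg code. The point is that the flag describes how Perl stores the string, not what the string contains, and that an explicit encode gives the same octets either way.

```perl
use strict;
use warnings;
use bytes ();                 # load bytes::length without enabling the pragma
use Encode qw(encode);

# "\xE9" is e-acute. As written it is stored as a single byte, flag off.
my $plain = "caf\xE9";

# Same characters, but force Perl to store them internally as UTF-8 (flag on).
my $upgraded = $plain;
utf8::upgrade($upgraded);

printf "equal as strings: %s\n", ($plain eq $upgraded ? "yes" : "no");
printf "flag: plain=%d upgraded=%d\n",
    (utf8::is_utf8($plain) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);

# The internal buffers differ in size even though the strings compare equal.
printf "internal bytes: plain=%d upgraded=%d\n",
    bytes::length($plain), bytes::length($upgraded);

# An explicit encode yields identical octets regardless of the flag.
my $octets_plain    = encode('UTF-8', $plain);
my $octets_upgraded = encode('UTF-8', $upgraded);
printf "encoded octets identical: %s\n",
    ($octets_plain eq $octets_upgraded ? "yes" : "no");
```

So whether the flag matters to a driver presumably comes down to whether it reads the internal buffer directly or goes through an explicit encode first.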
>> Yes, very bad example. Let's call it utf8. Forget 'unicode' entirely.
> Yeah, better, though it just perpetuates Perl's unfortunate use of
> the term "utf8" for "internal string representation." Though I suppose
> that ship has sunk already.
Yep. To paraphrase horribly, "Perl's Unicode support is the worst, except for
all the other languages".
>> Because it may still need to convert things. See the ODBC discussion.
>
> Oh, so you're saying it will decode and encode between Perl's internal
> form and UTF-8, rather than just flip the flag on and off?
Yes, that's a possibility.
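
To make the distinction concrete, a rough sketch of the two approaches using the core Encode module. The octet strings are invented sample data, and this is not what any particular DBD does today. Encode::_utf8_on is documented by Encode as an internal-use function; it is shown only to illustrate the "flip the flag" route.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Pretend these octets came back from the database: UTF-8 for "naive"
# with an i-diaeresis, six bytes in total.
my $from_db = "na\xC3\xAFve";

# Route 1: a real decode. Checks the octets and returns a character string.
my $decoded = decode('UTF-8', $from_db);

# Route 2: flip the flag. Marks the existing buffer as UTF-8, no checking.
my $flipped = $from_db;
Encode::_utf8_on($flipped);

# For well-formed input the two routes agree.
printf "same result: %s\n", ($decoded eq $flipped ? "yes" : "no");

# Going back out, encode() reproduces the original octets.
my $to_db = encode('UTF-8', $decoded);
printf "round trip ok: %s\n", ($to_db eq $from_db ? "yes" : "no");

# Where they differ: octets that are not valid UTF-8 (this is Latin-1).
# decode() substitutes U+FFFD for the bad byte; flipping the flag on this
# buffer would instead leave Perl holding a malformed string.
my $latin1 = "na\xEFve";
my $safe   = decode('UTF-8', $latin1);
printf "replacement char inserted: %s\n",
    ($safe =~ /\x{FFFD}/ ? "yes" : "no");
```

Flipping the flag is cheaper, but it trusts the database completely; decoding costs a pass over the data and tolerates surprises.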
> Yes, because you were only talking about utf8 and UTF-8, not any
> other encodings. Unless I missed something. If the data coming back
> from the DB is Big5, I may well want to have some way to decode it
> (and to encode it for write statements).
You mean at the DBD level - such that you can say to the database,
I don't care what encoding you stored it as, I want it encoded
as X when you give it back to me? (update: yes, see below)
>> Well, because utf-8 is pretty much a de facto encoding, or at least
>> way, way more popular than things like ucs2. Also, the Perl utf8
>> flag encourages us to put everything into UTF-8.
>
> Yeah, but again, that might be some reason to call it something else,
> like "perl_native" or something. The fact that it happens to be UTF-8
> should be irrelevant. Er, except, I guess, you still have to know the
> encoding of the database.
Well, I wouldn't call it irrelevant, but at the end of the day, we can
call it perl_native, but that's just going to cause people to look it up
in the docs and then say "aha! that means the utf8 flag is on" and then
they have "perl_native -> utf8" burned into their head. Or worse,
"perl_native -> unicode". :)
>> * 'A': the default, it means the DBD should do the best thing, which in most
>> cases means setting SvUTF8_on if the data coming back is UTF-8.
>> * 'B': (on). The DBD should make every effort to set SvUTF8_on for returned
>> data, even if it thinks it may not be UTF-8.
>> * 'C': (off). The DBD should not call SvUTF8_on, regardless of what it
>> thinks the data is.
> I still prefer an encoding attribute that you can set as follows:
> * undef: Default; same as your A.
> * ':utf8': Same as your B
> * ':raw': Same as your C
> * $encoding: Encode/decode to/from $encoding
I like that. Although the names are still odd. I guess it does map
though: raw means no utf8 flag. Still not sure about the encode
'to', but I'll start thinking about how we could implement the
'from' in DBD::Pg. How would one map things - just demand that
whatever is given must be a literal encoding the particular database
can understand?
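
Thinking out loud, the 'from' side could be sketched roughly as below. Everything here is hypothetical: apply_encoding_attr and its $db_is_utf8 argument are invented for illustration and are not part of DBI or DBD::Pg. It also assumes the conversion happens on the Perl side via Encode, so the name only has to be one Encode knows; making the database do the conversion is a separate option this does not cover.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: apply the proposed encoding attribute to octets
# fetched from the database. $db_is_utf8 stands in for whatever the
# driver knows about the connection's encoding.
sub apply_encoding_attr {
    my ($setting, $octets, $db_is_utf8) = @_;

    # undef: the driver does the best thing; here, decode only when the
    # database says the data is UTF-8.
    if (!defined $setting) {
        return $db_is_utf8 ? decode('UTF-8', $octets) : $octets;
    }

    # ':raw': hand the octets back untouched, never set the flag.
    return $octets if $setting eq ':raw';

    # ':utf8': set the flag no matter what the driver thinks.
    if ($setting eq ':utf8') {
        my $copy = $octets;
        Encode::_utf8_on($copy);
        return $copy;
    }

    # Anything else must be an encoding name that Encode recognizes.
    my $enc = find_encoding($setting)
        or die "Unknown encoding '$setting'\n";
    return $enc->decode($octets);
}

my $octets = "\xC3\xA9";    # UTF-8 octets for e-acute

printf "%-12s => %d char(s)\n", 'undef',
    length apply_encoding_attr(undef, $octets, 1);
printf "%-12s => %d char(s)\n", $_,
    length apply_encoding_attr($_, $octets, 0)
    for ':raw', ':utf8', 'iso-8859-1';
```

With this shape an unknown name fails loudly at fetch time; checking it once when the attribute is set would probably be kinder.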
> With an encoding attribute, you don't need the utf8_flag at all.
Right, +1
So the above means these two actually behave very differently:
$dbh->{encoding} = ':utf8';
$dbh->{encoding} = 'utf8';
Could be a little confusing, no? Methinks we need some long ugly name, maybe
even worse than "perl_native". Perhaps "perl_internal_utf8_flag"? 1/2 :)
Thanks for plugging away at this. My short term goal is to get this finalized
enough that I can release the next version of DBD::Pg without a 'pg_' prefix
to control the encoding items.
- --
Greg Sabino Mullane [email protected]
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201110061151
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAk6Nz28ACgkQvJuQZxSWSsiWJQCgt/F0r/sCPDa9GuYrGZpZHlQ2
WfYAn0asIYHmPKz1BDfcBo7wLADHmH7N
=eJmk
-----END PGP SIGNATURE-----