On Wed, Jul 03, 2013 at 01:26:19AM -0000, Greg Sabino Mullane wrote:
> > David E. Wheeler wrote:
> > > What happens if the client encoding is *not* UTF8?
>
> If not UTF8, we don't do anything. I think it is sufficient that we
> simply require people to use UTF8 as their client_encoding if they
> want DBD::Pg to do the right thing. It's very common, and more
> importantly, is the only encoding guaranteed to auto convert from any
> server encoding.
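For reference, a minimal sketch of pinning the client encoding to UTF8 from a DBD::Pg script (the dbname is hypothetical and a reachable server is assumed):

```perl
use strict;
use warnings;
use DBI;

# libpq reads PGCLIENTENCODING before connecting; 'auto' would instead
# derive the encoding from the client locale (LC_CTYPE on Unix).
$ENV{PGCLIENTENCODING} = 'UTF8';

my $dbh = DBI->connect('dbi:Pg:dbname=test', undef, undef,
                       { RaiseError => 1 });

# Alternatively, set it per session after connecting:
$dbh->do(q{SET client_encoding TO 'UTF8'});
```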
It would be worth mentioning the PGCLIENTENCODING env var in the docs,
and the fact that it can be set to "auto" to "determine the right
encoding from the current locale in the client (LC_CTYPE environment
variable on Unix systems)."

> > Will it turn on the flag for all data without regard to type?
>
> Yes.

The doc says "for all strings coming back" which is possibly a little
ambiguous. (After poking about in the code and libpq docs I'm wondering
if PQfformat() should be used to confirm that a field is "textual"
before applying SvUTF8_on.)

Ideally the docs would have a section on Unicode that discusses it in
relation to SQL statements, placeholders, attributes (like
$sth->{NAME}), array stringification and error messages. I.e. at least
all the places that have SvUTF8/_on/_off() calls, plus anywhere that
character data gets passed into libpq.

> > This looks like a good compromise to me: setting it to a boolean
> > retains the previous behavior (more or less, unless setting it to 1
> > still converts it for specific types), and the new default is much
> > saner (assuming that it applies to *all* types).
>
> ...
>
> A lot of this is not going to have any perfect answers, especially as
> far as backwards compatibility goes, and forward compatibility with
> DBI support. But we need to get moving, and I think this is a pretty
> good first effort.

I agree, and I'm delighted to see this.

I would urge you to implement good test coverage for unicode support.
We found all sorts of issues while implementing it for DBD::Oracle a
few years ago (including several bugs in Oracle).

There are some good unicode stress tests in DBD::Oracle. See
https://metacpan.org/source/PYTHIAN/DBD-Oracle-1.64/t/nchar_test_lib.pl
as used by
https://metacpan.org/source/PYTHIAN/DBD-Oracle-1.64/t/22nchar_utf8.t
and
https://metacpan.org/source/PYTHIAN/DBD-Oracle-1.64/t/23wide_db_al32utf8.t

A key part of that is the use of DUMP() to verify that the server
itself has the right representation.
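Postgres has no Oracle-style DUMP(), but a comparable server-side check can be sketched with length() and octet_length(). This is a hedged sketch, not DBD::Pg's test suite; the connection details and the unicode_test table are hypothetical:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection; assumes a reachable database and UTF8
# client_encoding.
my $dbh = DBI->connect('dbi:Pg:dbname=test', undef, undef,
                       { RaiseError => 1 });

$dbh->do(q{CREATE TEMP TABLE unicode_test (val text)});
$dbh->do(q{INSERT INTO unicode_test (val) VALUES (?)},
         undef, "caf\x{e9}");

# length() counts characters server-side and octet_length() counts
# bytes, so a mismatch with the expected values shows the server
# interpreted the stored value differently than the client intended,
# even if the round-tripped string compares equal on the client.
my ($chars, $bytes) = $dbh->selectrow_array(
    q{SELECT length(val), octet_length(val) FROM unicode_test});

# For 'café' in a UTF8 database we expect 4 characters and 5 bytes.
warn "server-side representation looks wrong\n"
    unless $chars == 4 && $bytes == 5;
```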
Otherwise it's possible to have cases where characters go in and come
back as UTF8, so all seems fine, but the server doesn't interpret the
stored value as the same characters. Something that returns _character_
length would probably suffice. That's possibly less of an issue for
postgres, but I'd recommend it.

The unicode docs in DBD::Oracle mainly talk about edge cases
https://metacpan.org/module/DBD::Oracle#UNICODE
but there might be some useful notes.

I'd also recommend using the data_string_desc, data_string_diff and
data_diff functions https://metacpan.org/module/DBI#data_string_desc
I wrote them for my own sanity while working on unicode support in
DBD::Oracle and they proved very useful. (It's easy to be fooled when
working with UTF8.)

Tim.

p.s. How can I subscribe to the commits mailing list?