Re: Add Unicode Support to the DBI

2011-11-09 Thread H.Merijn Brand
On Tue, 08 Nov 2011 21:12:13 +, Martin J. Evans martin.ev...@easysoft.com wrote: I've just checked in unicode_test.pl to DBI's subversion trunk in /ex dir. It won't run right now without changing the do_connect sub as you have to specify how to connect to the DB. Also, there is a DBD

Re: Add Unicode Support to the DBI

2011-11-09 Thread H.Merijn Brand
On Tue, 08 Nov 2011 21:12:13 +, Martin J. Evans martin.ev...@easysoft.com wrote: I've just checked in unicode_test.pl to DBI's subversion trunk in /ex dir. So now attached -- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using 5.00307 through 5.14 and porting

Re: Add Unicode Support to the DBI

2011-11-09 Thread Tim Bunce
On Wed, Nov 09, 2011 at 04:50:29PM +0100, H.Merijn Brand wrote: On Tue, 08 Nov 2011 21:12:13 +, Martin J. Evans martin.ev...@easysoft.com wrote: I've just checked in unicode_test.pl to DBI's subversion trunk in /ex dir. So now attached Any chance you could rework your changes into

Re: Add Unicode Support to the DBI

2011-11-09 Thread H.Merijn Brand
On Wed, 9 Nov 2011 16:23:53 +, Tim Bunce tim.bu...@pobox.com wrote: On Wed, Nov 09, 2011 at 04:50:29PM +0100, H.Merijn Brand wrote: On Tue, 08 Nov 2011 21:12:13 +, Martin J. Evans martin.ev...@easysoft.com wrote: I've just checked in unicode_test.pl to DBI's subversion trunk in

Re: Add Unicode Support to the DBI

2011-11-09 Thread Martin J. Evans
On 09/11/2011 15:49, H.Merijn Brand wrote: On Tue, 08 Nov 2011 21:12:13 +, Martin J. Evans martin.ev...@easysoft.com wrote: I've just checked in unicode_test.pl to DBI's subversion trunk in /ex dir. It won't run right now without changing the do_connect sub as you have to specify how to

Re: Add Unicode Support to the DBI

2011-11-09 Thread H.Merijn Brand
On Wed, 09 Nov 2011 19:41:33 +, Martin J. Evans martin.ev...@easysoft.com wrote: You're going to have a lot of problems with this test code and DBD::Unify as we previously discovered that DBD::Unify does not decode the data coming back from the database itself but it can be decoded by any

Re: Add Unicode Support to the DBI

2011-11-08 Thread Tim Bunce
On Mon, Nov 07, 2011 at 01:37:38PM +, Martin J. Evans wrote: I didn't think I was going to make LPW but it seems I will now - although it has cost me big time leaving it until the last minute. All your beers at LPW are on me! http://www.martin-evans.me.uk/node/121 Great work Martin.

Re: Add Unicode Support to the DBI

2011-11-08 Thread H.Merijn Brand
On Tue, 8 Nov 2011 13:16:17 +, Tim Bunce tim.bu...@pobox.com wrote: On Mon, Nov 07, 2011 at 01:37:38PM +, Martin J. Evans wrote: I didn't think I was going to make LPW but it seems I will now - although it has cost me big time leaving it until the last minute. All your

Re: Add Unicode Support to the DBI

2011-11-08 Thread Martin J. Evans
On 08/11/11 13:16, Tim Bunce wrote: On Mon, Nov 07, 2011 at 01:37:38PM +, Martin J. Evans wrote: I didn't think I was going to make LPW but it seems I will now - although it has cost me big time leaving it until the last minute. All your beers at LPW are on me!

Re: Add Unicode Support to the DBI

2011-11-08 Thread Tim Bunce
On Tue, Nov 08, 2011 at 02:45:39PM +, Martin J. Evans wrote: On 08/11/11 13:16, Tim Bunce wrote: On Mon, Nov 07, 2011 at 01:37:38PM +, Martin J. Evans wrote: 2. Try to make a data-driven common test script. There is already one attached to the bottom of the post and referred to in

Re: Add Unicode Support to the DBI

2011-11-08 Thread David E. Wheeler
On Nov 8, 2011, at 5:16 AM, Tim Bunce wrote: 1. Focus initially on categorising the capabilities of the databases. Specifically separating those that understand character encodings at one or more of column, table, schema, database level. Answer the questions: what Unicode

Re: Add Unicode Support to the DBI

2011-11-08 Thread Martin J. Evans
On 08/11/2011 13:16, Tim Bunce wrote: On Mon, Nov 07, 2011 at 01:37:38PM +, Martin J. Evans wrote: I didn't think I was going to make LPW but it seems I will now - although it has cost me big time leaving it until the last minute. All your beers at LPW are on me!

Re: Add Unicode Support to the DBI

2011-11-07 Thread Martin J. Evans
On 04/11/11 08:39, Martin J. Evans wrote: On 03/11/11 23:25, David E. Wheeler wrote: On Oct 7, 2011, at 5:06 PM, David E. Wheeler wrote: Perhaps we could carve out some time at LPW to sit together and try to progress this. That would be awesome you guys! So gents, do you plan to do this a

Re: Add Unicode Support to the DBI

2011-11-06 Thread Martin J. Evans
On 05/10/2011 00:06, Jonathan Leffler wrote: On Tue, Oct 4, 2011 at 15:24, Martin J. Evans martin.ev...@easysoft.com wrote: On 04/10/2011 22:38, Tim Bunce wrote: I've not had time to devote to this thread. Sorry. I'd be grateful if someone could post a summary of it if/when it approaches some

Re: Add Unicode Support to the DBI

2011-11-04 Thread Martin J. Evans
On 03/11/11 23:25, David E. Wheeler wrote: On Oct 7, 2011, at 5:06 PM, David E. Wheeler wrote: Perhaps we could carve out some time at LPW to sit together and try to progress this. That would be awesome you guys! So gents, do you plan to do this a bit? Martin, do you have the data you

Re: Add Unicode Support to the DBI

2011-11-04 Thread David E. Wheeler
On Nov 4, 2011, at 1:39 AM, Martin J. Evans wrote: Sorry David, I've been snowed under. I will try very hard to publish the research I found this weekend. Awesome, thanks. Did you ever get any data from DBD::SQLite folks? I didn't think I was going to make LPW but it seems I will now -

Re: Add Unicode Support to the DBI

2011-11-04 Thread Martin J. Evans
On 04/11/11 16:39, David E. Wheeler wrote: On Nov 4, 2011, at 1:39 AM, Martin J. Evans wrote: Sorry David, I've been snowed under. I will try very hard to publish the research I found this weekend. Awesome, thanks. Did you ever get any data from DBD::SQLite folks? Yes. I found a bug in

Re: Add Unicode Support to the DBI

2011-11-04 Thread David E. Wheeler
On Nov 4, 2011, at 10:33 AM, Martin J. Evans wrote: Did you ever get any data from DBD::SQLite folks? Yes. I found a bug in the process and it was fixed but I have a working SQLite example. Oh, great. I'm only really missing DB2 but I have contacts for that on #dbix-class who I've just

Re: Add Unicode Support to the DBI

2011-11-03 Thread David E. Wheeler
On Oct 7, 2011, at 5:06 PM, David E. Wheeler wrote: Perhaps we could carve out some time at LPW to sit together and try to progress this. That would be awesome you guys! So gents, do you plan to do this a bit? Martin, do you have the data you wanted to collect on this? Thanks, David

Re: Add Unicode Support to the DBI

2011-10-13 Thread Greg Sabino Mullane
David E. Wheeler wrote: I think what I haven't said is that we should just use the same names that Perl I/O uses. Er, well, for the :raw and :utf8 varieties I was, anyway. Perhaps we should adopt it wholesale, so you'd use

Re: Add Unicode Support to the DBI

2011-10-13 Thread David E. Wheeler
On Oct 13, 2011, at 6:03 AM, Greg Sabino Mullane wrote: I think what I haven't said is that we should just use the same names that Perl I/O uses. Er, well, for the :raw and :utf8 varieties I was, anyway. Perhaps we should adopt it wholesale, so you'd use :encoding(UTF-8) instead of UTF-8.
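
For readers unfamiliar with the Perl I/O layer names mentioned here, the following is a minimal sketch of how :raw, :utf8 and :encoding(UTF-8) behave on a plain filehandle. It is not taken from the thread and says nothing about how a DBI attribute would work; the helper read_with_layer and the sample data are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: the UTF-8 encoding of U+00E9 (e with acute)
# is the two bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# Hypothetical helper: read the same bytes back through an in-memory
# filehandle opened with the given PerlIO layer.
sub read_with_layer {
    my ($layer, $data) = @_;
    open my $fh, "<$layer", \$data or die "open failed: $!";
    my $got = <$fh>;
    close $fh;
    return $got;
}

my $raw     = read_with_layer(':raw',             $bytes);  # bytes, untouched
my $lax     = read_with_layer(':utf8',            $bytes);  # treated as UTF-8
my $decoded = read_with_layer(':encoding(UTF-8)', $bytes);  # decoded via Encode

printf "%-18s length=%d utf8-flag=%d\n", ':raw',
    length $raw, utf8::is_utf8($raw) ? 1 : 0;
printf "%-18s length=%d utf8-flag=%d\n", ':utf8',
    length $lax, utf8::is_utf8($lax) ? 1 : 0;
printf "%-18s length=%d utf8-flag=%d\n", ':encoding(UTF-8)',
    length $decoded, utf8::is_utf8($decoded) ? 1 : 0;
```

With :raw the string stays as two bytes; with the other two layers Perl hands back a single character. The practical difference between :utf8 and :encoding(UTF-8) is that the latter goes through Encode and validates the input, which is presumably why the longer spelling is being suggested.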

Re: Add Unicode Support to the DBI

2011-10-07 Thread Tim Bunce
On Tue, Oct 04, 2011 at 11:24:51PM +0100, Martin J. Evans wrote: On 04/10/2011 22:38, Tim Bunce wrote: I've not had time to devote to this thread. Sorry. I'd be grateful if someone could post a summary of it if/when it approaches some kind of consensus. I don't think there is a kind of

Re: Add Unicode Support to the DBI

2011-10-07 Thread David E. Wheeler
On Oct 7, 2011, at 1:47 AM, Tim Bunce wrote: Perhaps we could carve out some time at LPW to sit together and try to progress this. That would be awesome you guys! D

Re: Add Unicode Support to the DBI

2011-10-06 Thread Greg Sabino Mullane
Uh, say what? Just as I need to binmode STDOUT, ':utf8'; Before sending stuff to STDOUT (that is, turn off the flag), I would expect DBDs to do the same before sending data to the database. Unless, of course, it just works. I cannot

Re: Add Unicode Support to the DBI

2011-10-05 Thread H.Merijn Brand
On Tue, 04 Oct 2011 23:24:51 +0100, Martin J. Evans martin.ev...@easysoft.com wrote: Some might disagree but DB2 is a main one I no longer have access to (please contact me if you use DBD::DB2 and are prepared to spare half an hour or so to modify examples I have which verify unicode

Re: Add Unicode Support to the DBI

2011-10-04 Thread Tim Bunce
I've not had time to devote to this thread. Sorry. I'd be grateful if someone could post a summary of it if/when it approaches some kind of consensus. Thanks. Tim.

Re: Add Unicode Support to the DBI

2011-10-04 Thread Martin J. Evans
On 04/10/2011 22:38, Tim Bunce wrote: I've not had time to devote to this thread. Sorry. I'd be grateful if someone could post a summary of it if/when it approaches some kind of consensus. Thanks. Tim. I don't think there is a kind of consensus right now (although some useful discussion

Re: Add Unicode Support to the DBI

2011-10-04 Thread Jonathan Leffler
On Tue, Oct 4, 2011 at 15:24, Martin J. Evans martin.ev...@easysoft.com wrote: On 04/10/2011 22:38, Tim Bunce wrote: I've not had time to devote to this thread. Sorry. I'd be grateful if someone could post a summary of it if/when it approaches some kind of consensus. I don't think there

Re: Add Unicode Support to the DBI

2011-10-03 Thread David E. Wheeler
On Oct 2, 2011, at 8:49 PM, Greg Sabino Mullane wrote: DEW I assume you also mean to say that data sent *to* the database DEW has the flag turned off, yes? No: that is undefined. I don't see it as the DBD's job to massage data going into the database. Or at least, I cannot imagine a DBI

Re: Add Unicode Support to the DBI

2011-10-02 Thread Greg Sabino Mullane
From: David E. Wheeler da...@kineticode.com GSM * $h->{unicode_flag} GSM If this is set on, data returned from the database is assumed to be UTF-8, and GSM the utf8 flag will be set. DEW I assume you also mean to say that data sent *to* the

Re: Add Unicode Support to the DBI

2011-09-22 Thread Martin J. Evans
On 21/09/11 21:52, Greg Sabino Mullane wrote: ... And maybe that's the default. But I should be able to tell it to be pedantic when the data is known to be bad (see, for example data from an SQL_ASCII-encoded PostgreSQL database). ...

Re: Add Unicode Support to the DBI

2011-09-22 Thread Martin J. Evans
David, I forgot to answer your post first and ended up putting most of my comments in a reply to Greg's posting - sorry, it was a long night last night. Some further comments below: On 21/09/11 19:44, David E. Wheeler wrote: On Sep 10, 2011, at 3:08 AM, Martin J. Evans wrote: I'm not sure

Re: Add Unicode Support to the DBI

2011-09-22 Thread Martin J. Evans
On 22/09/2011 17:36, David E. Wheeler wrote: On Sep 22, 2011, at 2:26 AM, Martin J. Evans wrote: There is more than one way to encode unicode - not everyone uses UTF-8; although some encodings don't support all of unicode. Yeah, maybe should be utf8_flag instead. see below. unicode is not

Re: Add Unicode Support to the DBI

2011-09-22 Thread David E. Wheeler
On Sep 22, 2011, at 11:14 AM, Martin J. Evans wrote: Right. There needs to be a way to tell the DBI what encoding the server sends and expects to be sent. If it's not UTF-8, then the utf8_flag option is kind of useless. I think this was my point above, i.e., why utf8? databases accept and

Re: Add Unicode Support to the DBI

2011-09-22 Thread David E. Wheeler
On Sep 22, 2011, at 11:57 AM, Martin J. Evans wrote: ok except what the oracle client libraries accept does not match with Encode accepted strings so someone would have to come up with some sort of mapping between the two. Yes. That's one of the consequences of providing a single interface

Re: Add Unicode Support to the DBI

2011-09-21 Thread David E. Wheeler
DBI peeps, Sorry for the delayed response, I've been busy, looking to reply to this thread now. On Sep 9, 2011, at 8:06 PM, Greg Sabino Mullane wrote: One thing I see bandied about a lot is that Perl 5.14 is highly preferred. However, it's not clear exactly what the gains are and how bad

Re: Add Unicode Support to the DBI

2011-09-21 Thread David E. Wheeler
On Sep 10, 2011, at 7:44 AM, Lyle wrote: Right now 5.8 is the required minimum for DBI: should we consider bumping this? I know a lot of servers in the wild are still running RHEL5 and its variants, which are stuck on 5.8 in the standard package management. The new RHEL6 only has

Re: Add Unicode Support to the DBI

2011-09-21 Thread David E. Wheeler
On Sep 10, 2011, at 3:08 AM, Martin J. Evans wrote: I'm not sure any change is required to DBI to support unicode. As far as I'm aware unicode already works with DBI if the DBDs do the right thing. Right, but the problem is that, IME, none of them do the right thing. As I said, I've

Re: Add Unicode Support to the DBI

2011-09-21 Thread Greg Sabino Mullane
... And maybe that's the default. But I should be able to tell it to be pedantic when the data is known to be bad (see, for example data from an SQL_ASCII-encoded PostgreSQL database). ... DBD::Pg's approach is currently broken. Greg is

Re: Add Unicode Support to the DBI

2011-09-21 Thread David E. Wheeler
On Sep 21, 2011, at 1:52 PM, Greg Sabino Mullane wrote: Since nobody has actually defined a specific interface yet, let me throw out a straw man. It may look familiar :) === * $h->{unicode_flag} If this is set on, data returned from the database is assumed to be UTF-8, and the utf8

Re: Add Unicode Support to the DBI

2011-09-10 Thread H.Merijn Brand
On Sat, 10 Sep 2011 03:06:49 -, Greg Sabino Mullane g...@turnstep.com wrote: One thing I see bandied about a lot is that Perl 5.14 is highly preferred. However, it's not clear exactly what the gains are and how bad 5.12 is compared to 5.14, how bad 5.10 is, how bad 5.8 is, etc. Right now

Re: Add Unicode Support to the DBI

2011-09-10 Thread Martin J. Evans
On 10/09/2011 03:52, David E. Wheeler wrote: DBIers, tl;dr: I think it's time to add proper Unicode support to the DBI. What do you think it should look like? I'm not sure any change is required to DBI to support unicode. As far as I'm aware unicode already works with DBI if the DBDs do

Re: Add Unicode Support to the DBI

2011-09-10 Thread Lyle
On 10/09/2011 04:06, Greg Sabino Mullane wrote: Right now 5.8 is the required minimum for DBI: should we consider bumping this? I know a lot of servers in the wild are still running RHEL5 and its variants, which are stuck on 5.8 in the standard package management. The new RHEL6 only has

Add Unicode Support to the DBI

2011-09-09 Thread David E. Wheeler
DBIers, tl;dr: I think it's time to add proper Unicode support to the DBI. What do you think it should look like? Background I've brought this up a time or two in the past, but a number of things have happened lately to make me think that it was again time: First, on the DBD::Pg list, we've

Re: Add Unicode Support to the DBI

2011-09-09 Thread Greg Sabino Mullane
One thing I see bandied about a lot is that Perl 5.14 is highly preferred. However, it's not clear exactly what the gains are and how bad 5.12 is compared to 5.14, how bad 5.10 is, how bad 5.8 is, etc. Right now 5.8 is the required minimum

Re: Add Unicode Support to the DBI

2011-09-09 Thread Darren Duncan
Another wrinkle to this is the fact that identifiers in the database, such as column names and such, are also character data, and have an encoding. So for any DBMSs that support Unicode identifiers (as I believe a complete one should, even if they have to be quoted in SQL) or identifiers with