Re: DBI and character sets (yet again)

Tim Bunce Sun, 21 Mar 2004 15:19:38 -0800

On Sun, Mar 21, 2004 at 01:10:27PM -0800, Dean Arnold wrote:
> (Note: I'm sending this to both -users and -dev, I'm not
> certain which it belongs to at this point)


dbi-users I think, at this point, as wide user comment may be helpful.
Though I might regret that if this produces more heat than light.
[I've removed dbi-dev from the CC list.  Anyone else replying to
(replies to) the original please do the same. Thanks.]

> Is there a consistent charset encoding behavior defined for
> DBI at this time ?

No.

> If not, is a rule wrt charset encoding behavior needed ? 

Yes.

> If a list of charset behaviors for each DBD is needed,
> I'd be happy to put one together, assuming the DBD authors
> send me the details for each driver.

That would be great.


I'm not an expert on this, as I'm probably about to prove, but here's
my perspective, for today at least...

1. Most applications only work with one character set encoding
   (not counting UTF8). Obvious example: Latin-1.

2. Unicode is where we're going. Get used to it.

3. I don't really want the DBI to be involved in any recoding
   of character sets (from client charset to server charset)
   and I suggest that the drivers don't try to do that either.

4. DBI v2 will provide hooks to allow callbacks to be fired
   on fetching a field and/or row and that could be used by an
   application for recoding if it wants to 'hide' it under the DBI.

5. When selecting data from the database the driver should:
   - return strings which have a unicode character set as UTF8.
   - return strings with other character sets as-is (unchanged) on
     the presumption that the application knows what to do with it.

6. Drivers that want to can offer a mechanism to recode non-unicode
   character sets into unicode but I don't see a big need for the
   DBI to standardize an interface for that at the moment.

7. DBI v2 will probably provide a way for applications to force the
   UTF8 flag on particular columns as a workaround for drivers that
   don't know the string of bytes they're returning is actually UTF8.

8. When passing data to the database (including the SQL statement)
   the driver should (perhaps) warn if it's presented with UTF8
   strings but the driver or database can't handle unicode.
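
To make point 7 concrete, here's a rough sketch of the kind of
workaround an application can apply by hand after fetching a row.
It is not a DBI interface: force_utf8_columns is a hypothetical
helper, and the row is a hand-built stand-in for what
$sth->fetchrow_arrayref might return from a driver that hands back
UTF8 bytes without the flag. It uses the core utf8::decode function
rather than blindly switching the flag on, so invalid byte sequences
are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the DBI): treat the given columns of a
# fetched row as UTF8 when the driver returned them as plain bytes.
sub force_utf8_columns {
    my ($row, @idx) = @_;
    for my $i (@idx) {
        next unless defined $row->[$i];    # leave NULLs alone
        # utf8::decode converts the bytes to characters in place and
        # returns false, leaving the value unchanged, if the bytes are
        # not well-formed UTF8.
        utf8::decode($row->[$i])
            or warn "column $i is not valid UTF8, left unchanged\n";
    }
    return $row;
}

# Stand-in for a fetched row: an id, and a name held as UTF8 bytes.
my $row = [ 42, "caf\xC3\xA9" ];
force_utf8_columns($row, 1);

printf "%d characters, UTF8 flag %s\n",
    length $row->[1],
    utf8::is_utf8($row->[1]) ? "on" : "off";
```

Before the call the name is five bytes long with the flag off;
afterwards it is four characters with the flag on. The application
has to know which columns really hold UTF8, which is exactly the
knowledge the driver is missing in this case.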

Comments welcome, of course, but please stick to practical issues,
ideally with examples, rather than theoretical ones. Thanks.

Tim.
