Re: DBI and character sets (yet again)

Michael Peppler Mon, 22 Mar 2004 07:05:29 -0800

On Mon, 2004-03-22 at 03:04, Tim Bunce wrote:
> On Sun, Mar 21, 2004 at 04:50:34PM -0800, Dean Arnold wrote:
> > > 
> > > > If a list of charset behaviors for each DBD is needed,
> > > > I'd be happy to put one together, assuming the DBD authors
> > > > send me the details for each driver.
> > > 
> > > That would be great.
> > 
> > OK. Shall we start w/ DBD::Oracle ? ;^)
> 
> You could, but that's very much a moving target at the moment.
> 
> > And driver authors, feel free to forward to me (and/or this
> > list). I'll try to put together a little webpage with the info.
> 
> I think it would help if you formulated a set of questions for driver
> authors (or anyone else) to answer. Especially as finding the right
> questions can be harder than finding the answers.
> 
> Here are a few to get you started:


For Sybase ASE (and DBD::Sybase)

>  - Does the database:
>       - have any concept of national character sets?

ASE has a concept of locales, with a mapping from the locale to a
character set.

>       - at what levels: database, table, field?

Server.

>       - url for list of character set names?
>       - does it support unicode?

Yes.

>  - Does the database client API:
>       - provide access to character set information, and how?

Yes, in the connection properties.

>       - at what levels: database, table, field?

Server (i.e. connection).

>       - does it have a concept of a client character set?

Yes.

>       - how is the client charset determined (locale, env var etc)

locale/env var (LC_ALL/LANG), but can be overridden via connection
properties.

>       - does it perform charset recoding?

Yes, if possible.

>  - Does the DBD driver:
>       - (repeat last set of questions)

DBD::Sybase will honor the current locale as that is the default
behavior of Sybase OpenClient, and you can override the client charset
in the DBI DSN as needed.
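
As a rough illustration of that override, the sketch below builds a DSN with a "charset" attribute. The helper sybase_dsn and the server name SYBASE are made up for the example, and the connect call is left commented out because it needs a live server; treat it as a sketch rather than a recommended recipe.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a DBD::Sybase DSN from named options.
# Only the attributes that are actually supplied end up in the DSN, so
# leaving out "charset" keeps the OpenClient locale default.
sub sybase_dsn {
    my (%opt) = @_;
    my @parts;
    push @parts, "server=$opt{server}"   if defined $opt{server};
    push @parts, "charset=$opt{charset}" if defined $opt{charset};
    return 'dbi:Sybase:' . join(';', @parts);
}

# "SYBASE" is a placeholder server name; "utf8" is one of the ASE
# charset names.
my $dsn = sybase_dsn(server => 'SYBASE', charset => 'utf8');

# With a real server, user and password (not shown here):
# require DBI;
# my $dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });
```

Without the charset attribute the client charset is whatever the locale (LC_ALL/LANG) maps to.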

> > Presumably,
> > just another bit of $sth metadata, e.g., $sth->{CHAR_SET}, to provide
> > the info. If the driver doesn't know, then it fills in with undef, and
> > the app is on its own. Otherwise, the app has enough info to make
> > the necessary conversion:
> 
> You're presuming that all databases that support charsets will use
> the same set of names as Encode uses. I hope that is the case but
> it might not be. (Add that to your list of things to discover :)

ASE uses "iso_1", "cp850", "sjis", "eucjis", "eucgb", "euccns", "big5",
"utf8", "roman8", "roman9", "cp437", "gb18030", "eucksc" and a few
others that I've probably missed. The charset names depend on the
platform (i.e. Win32 has a different set of charset names than, say,
Linux or VMS).
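
Since these names don't all match what Encode uses, an application would need some kind of translation table before it could act on the proposed charset metadata. Here is a minimal sketch of one; the table, the function names and the particular Encode names chosen are my own assumptions, not anything that DBI or DBD::Sybase provides, and the table only covers a handful of the names above.

```perl
use strict;
use warnings;
use Encode ();

# Assumed mapping from a few ASE charset names to Encode names.
# Illustrative only; it is neither complete nor authoritative.
my %ASE_TO_ENCODE = (
    iso_1  => 'iso-8859-1',
    cp850  => 'cp850',
    cp437  => 'cp437',
    sjis   => 'shiftjis',
    eucjis => 'euc-jp',
    utf8   => 'UTF-8',
);

# Return the canonical Encode name for an ASE charset name, or undef
# if the name is unmapped or this Encode installation lacks it.
sub encode_name_for {
    my ($ase_name) = @_;
    return undef unless defined $ase_name;
    my $candidate = $ASE_TO_ENCODE{ lc $ase_name } or return undef;
    my $enc = Encode::find_encoding($candidate) or return undef;
    return $enc->name;
}

# Decode raw bytes fetched from the server into Perl characters.
# If the charset is unknown the bytes come back untouched, i.e. the
# app is on its own, as with an undef charset attribute.
sub decode_column {
    my ($ase_name, $bytes) = @_;
    my $name = encode_name_for($ase_name);
    return $bytes unless defined $name;
    return Encode::decode($name, $bytes);
}
```

Looking each name up with Encode::find_encoding at run time, rather than trusting the table blindly, means a name that Encode doesn't know degrades to "no conversion" instead of dying.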

FWIW... :-)

Michael
-- 
Michael Peppler                              Data Migrations, Inc.
[EMAIL PROTECTED]                       http://www.peppler.org/
Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for short or 
long term contract positions - http://www.peppler.org/resume.html
