On Sun, Dec 15, 2002 at 10:16:47PM -0800, Jonathan Leffler wrote: > Dear Tim (and others who've worked on the metadata methods), > > I'd like to check on my understanding of what's required from the > metadata methods such as $dbh->tables. Specifically, there are three > distinct possible ways in which a tablename (for sake of example) can > be used: > > 1. In an SQL statement - eg the FROM clause of a SELECT statement. > 2. In a catalog query, as a placeholder value. > 3. In a catalog query, as a string literal. > > Further, each of those representations can be different. To take an > extreme example: > > 1. FROM "I said, ""Don't!""" > 2. I said, "Don't!" > 3. 'I said, "Don''t!"' > > Although I haven't actually created this particular table name in > Informix, I've created equivalent ones -- they routinely break > software that is not very carefully designed. > > Which version of this name should methods such as $dbh->tables return?
Simple: Just return the tru name of the table (2, in your example). The code that _uses_ the name needs to look after any quoting etc it needs. > I assume that the correct option is 2, not least because that is also > the format that is stored in the system catalog. Yes. > The $dbh->quote method will convert from notation 2 to notation 3. Yes. > AFAIK, there is no DBI standard method to convert from notation 2 to > notation 1 -- is that correct? Should there be? $dbh->identifier? You must have missed $dbh->quote_identifier being added to DBI 1.21. > I need to know because Informix provides me with a problem; it only > recognizes delimited identifiers (notation 1) when a specific > environment variable (it happens to be $ENV{DELIMIDENT}) is set. For > 99.99% of people, this doesn't matter. They don't have table names > that require delimiting -- not least because Informix is remarkably > good at disentangling CREATE TABLE TABLE(NULL CHAR(2) NOT NULL, INT > DATE NOT NULL, DATE DECIMAL NOT NULL); SQL-92 would require CREATE > TABLE "TABLE"("NULL" CHAR(2) NOT NULL, "INT" DATE NOT NULL, "DATE" > DECIMAL NOT NULL). However, if you do not have $ENV{DELIMIDENT} set, > then it is not safe to use double quotes around identifiers; they will > be misinterpreted as strings (for reasons of pre-standard (pre-SQL-86) > ancient history). As long as the names are usable as C identifiers, > there is no need to treat the names specially. > > So, a simple-minded process that just encloses table names in double > quotes is no good; it will break most Informix code. The schema name > (referred to as owner in Informix parlance) can be enclosed in single > or double quotes without causing much trouble, as it happens. Column > names are subject to the same whimsical treatment as table names. > > In released versions of DBD::Informix, this is handled incompletely; > it sort of implements a very old version of the specification, and > attempts to return most values in notation 1 (but doesn't do a very > good job of that, either). I'm wondering though what the correct > implementation should do. Can the DBI standard implementation of > tables handle this? Newer versions are better. See the code. > What information does it need from $dbh->get_info to handle this? Can > it really handle all the vicissitudes of Informix's peculiar notions. Probably not. See the code: sub tables { my ($dbh, @args) = @_; my $sth = $dbh->table_info(@args) or return; my $tables = $sth->fetchall_arrayref or return; my @tables; if ($dbh->get_info(29)) { # SQL_IDENTIFIER_QUOTE_CHAR @tables = map { $dbh->quote_identifier( @{$_}[0,1,2] ) } @$tables; } else { # temporary old style hack (yeach) @tables = map { my $name = $_->[2]; if ($_->[1]) { my $schema = $_->[1]; # a sad hack (mostly for Informix I recall) my $quote = ($schema eq uc($schema)) ? '' : '"'; $name = "$quote$schema$quote.$name" } $name; } @$tables; } return @tables; } sub quote_identifier { my ($dbh, @id) = @_; my $attr = (@id > 3) ? pop @id : undef; my $info = $dbh->{dbi_quote_identifier_cache} ||= [ $dbh->get_info(29) || '"', # SQL_IDENTIFIER_QUOTE_CHAR $dbh->get_info(41) || '.', # SQL_CATALOG_NAME_SEPARATOR $dbh->get_info(114) || 1, # SQL_CATALOG_LOCATION ]; my $quote = $info->[0]; foreach (@id) { # quote the elements next unless defined; s/$quote/$quote$quote/g; # escape embedded quotes $_ = qq{$quote$_$quote}; } # strip out catalog if present for special handling my $catalog = (@id >= 3) ? shift @id : undef; # join the dots, ignoring any null/undef elements (ie schema) my $quoted_id = join '.', grep { defined } @id; if ($catalog) { # add catalog correctly $quoted_id = ($info->[2] == 2) # SQL_CL_END ? $quoted_id . $info->[1] . $catalog : $catalog . $info->[1] . $quoted_id; } return $quoted_id; } > I'm trying to work out exactly what in the Informix world correlates > with a catalog -- it could be database name without server, or it > could be database name plus server -- eg dbase vs dbase@server -- or > it could be irrelevant. Few databases support catalogs. Though I think some simple ones like ODBC CSV drivers use it for the directory the files are in. > That mainly has implications for how > extensive a set of queries are needed to resolve a really general > $dbh->tables query. It is far easier to regard it as irrelevant (as > the DBI specification suggests) than not. I think this is mainly an > internal issue (to DBD::Informix) which will be resolved in favour of > simplicity of implementation over thoroughness of answer -- partly on > grounds of performance, partly on grounds of usefulness, and mainly > because it is hard to do the job right (I'd have to filter the list of > databases generated by a non-SQL method with a simulation of the LIKE > procedure -- which granted isn't too hard: LIKE "foo_bar%x" > corresponds to m/^foo.bar.*x$/). In general you whould do whatever Informix's own ODBC driver does. Remember the principle that it ought to be possible to interchange the use of DBD::Foo with DBD::ODBC using Foo's ODBC driver, with no changes to the application. > An additional problem in this area is that the full notation for a > column in Informix is: > > dbase@server:owner.table.column > > The database name is constrained to be equivalent to a C identifier > (up to 128 characters, though), and the server name is likewise > constrained. For those components, quotes are neither necessary nor > acceptable. The server name is optional and when omitted it is the > '@' that vanishes, leaving dbase:owner.table.column. If the database > is the current database, then the 'dbase:' part can be omitted. The > owner is optional in most types of Informix database regardless of the > owner (schema) to which it belongs - but in a MODE ANSI database, it > is necessary if you do not own the table. Oh, and if the column is a > row type, it is possible to address elements of the row using one or > more extra levels of ".identifier". > > And there are still more gotcha's. In general, if the name is not > quoted, it is case-converted to lower case. Exception: unquoted owner > names in a MODE ANSI database are converted to upper case; exception > to the exception: the unquoted owner name informix is not converted to > upper case. If a table name is a delimited identifier, it is > case-sensitive; otherwise it is not. Ditto for column names. OTOH, > upper case in database or server names is not supported at all for all > practical purposes. > > I'm also wondering how this affects the $dbh->get_info() method. I'm > guessing that if DELIMIDENT is set, then it is OK to say 'the quote > character for delimited identifiers is double-quote'; I'm wondering > what I should say when DELIMIDENT is not set -- undef is very > tempting, but is it correct? (Actually, my life is really contorted; > if the Informix database server is started with DELIMIDENT set in its > environment, it won't matter whether the client ever sets the > environment or not -- that server is running with delimited > identifiers enabled. But there isn't an easy way to find out whether > this obscure circumstance applies other than by trying to use a > delimited identifier and seeing whether you get an error or not.) > > Another related question: how does $dbh->tables('%','','') compare > with DBI->data_sources()? Since this is explicitly called out as a > special case in $dbh->tables(), I presume it is different from what > DBI->data_sources() is expected to return - but how? Or is it a > feature that actually should be omitted from the non-experimental > specification of $dbh->tables()? See the DBI docs. data_sources is about DBI DSN's. > Now you know why I'm balding! Yeap. I don't envy you task. You're life will be easier if you get the latest Informix ODBC driver and the latest DBD::ODBC (development release I think) and see how they behave. The implementors of Informix's ODBC driver must have faced many of these issues and addressed them in an officially acepted way. Tim.