On Sun, Dec 15, 2002 at 10:16:47PM -0800, Jonathan Leffler wrote:
> Dear Tim (and others who've worked on the metadata methods),
> 
> I'd like to check on my understanding of what's required from the
> metadata methods such as $dbh->tables. Specifically, there are three
> distinct possible ways in which a tablename (for sake of example) can
> be used:
> 
> 1. In an SQL statement - eg the FROM clause of a SELECT statement.
> 2. In a catalog query, as a placeholder value.
> 3. In a catalog query, as a string literal.
> 
> Further, each of those representations can be different.  To take an 
> extreme example:
> 
> 1.    FROM "I said, ""Don't!"""
> 2.    I said, "Don't!"
> 3.    'I said, "Don''t!"'
> 
> Although I haven't actually created this particular table name in 
> Informix, I've created equivalent ones -- they routinely break 
> software that is not very carefully designed.
> 
> Which version of this name should methods such as $dbh->tables return?

Simple: Just return the tru name of the table (2, in your example).
The code that _uses_ the name needs to look after any quoting etc it needs.

> I assume that the correct option is 2, not least because that is also 
> the format that is stored in the system catalog.

Yes.

> The $dbh->quote method will convert from notation 2 to notation 3.

Yes.

> AFAIK, there is no DBI standard method to convert from notation 2 to 
> notation 1 -- is that correct?  Should there be?  $dbh->identifier?

You must have missed $dbh->quote_identifier being added to DBI 1.21.

> I need to know because Informix provides me with a problem; it only 
> recognizes delimited identifiers (notation 1) when a specific 
> environment variable (it happens to be $ENV{DELIMIDENT}) is set.  For 
> 99.99% of people, this doesn't matter.  They don't have table names 
> that require delimiting -- not least because Informix is remarkably 
> good at disentangling CREATE TABLE TABLE(NULL CHAR(2) NOT NULL, INT 
> DATE NOT NULL, DATE DECIMAL NOT NULL); SQL-92 would require CREATE 
> TABLE "TABLE"("NULL" CHAR(2) NOT NULL, "INT" DATE NOT NULL, "DATE" 
> DECIMAL NOT NULL).  However, if you do not have $ENV{DELIMIDENT} set, 
> then it is not safe to use double quotes around identifiers; they will 
> be misinterpreted as strings (for reasons of pre-standard (pre-SQL-86) 
> ancient history).  As long as the names are usable as C identifiers, 
> there is no need to treat the names specially.
> 
> So, a simple-minded process that just encloses table names in double 
> quotes is no good; it will break most Informix code.  The schema name 
> (referred to as owner in Informix parlance) can be enclosed in single 
> or double quotes without causing much trouble, as it happens.  Column 
> names are subject to the same whimsical treatment as table names.
> 
> In released versions of DBD::Informix, this is handled incompletely; 
> it sort of implements a very old version of the specification, and 
> attempts to return most values in notation 1 (but doesn't do a very 
> good job of that, either).  I'm wondering though what the correct 
> implementation should do.  Can the DBI standard implementation of 
> tables handle this?

Newer versions are better. See the code.

> What information does it need from $dbh->get_info to handle this?  Can 
> it really handle all the vicissitudes of Informix's peculiar notions.

Probably not. See the code:

    sub tables {
        my ($dbh, @args) = @_;
        my $sth    = $dbh->table_info(@args) or return;
        my $tables = $sth->fetchall_arrayref or return;
        my @tables;
        if ($dbh->get_info(29)) { # SQL_IDENTIFIER_QUOTE_CHAR
            @tables = map { $dbh->quote_identifier( @{$_}[0,1,2] ) } @$tables;
        }
        else {          # temporary old style hack (yeach)
            @tables = map {
                my $name = $_->[2];
                if ($_->[1]) {
                    my $schema = $_->[1];
                    # a sad hack (mostly for Informix I recall)
                    my $quote = ($schema eq uc($schema)) ? '' : '"';
                    $name = "$quote$schema$quote.$name"
                }
                $name;
            } @$tables;
        }
        return @tables;
    }

    sub quote_identifier {
        my ($dbh, @id) = @_;
        my $attr = (@id > 3) ? pop @id : undef;
 
        my $info = $dbh->{dbi_quote_identifier_cache} ||= [
            $dbh->get_info(29)  || '"', # SQL_IDENTIFIER_QUOTE_CHAR
            $dbh->get_info(41)  || '.', # SQL_CATALOG_NAME_SEPARATOR
            $dbh->get_info(114) ||   1, # SQL_CATALOG_LOCATION
        ];
 
        my $quote = $info->[0];
        foreach (@id) {                 # quote the elements
            next unless defined;
            s/$quote/$quote$quote/g;    # escape embedded quotes
            $_ = qq{$quote$_$quote};
        }
 
        # strip out catalog if present for special handling
        my $catalog = (@id >= 3) ? shift @id : undef;
 
        # join the dots, ignoring any null/undef elements (ie schema)
        my $quoted_id = join '.', grep { defined } @id;
 
        if ($catalog) {                 # add catalog correctly
            $quoted_id = ($info->[2] == 2)      # SQL_CL_END
                    ? $quoted_id . $info->[1] . $catalog
                    : $catalog   . $info->[1] . $quoted_id;
        }
        return $quoted_id;
    }

> I'm trying to work out exactly what in the Informix world correlates 
> with a catalog -- it could be database name without server, or it 
> could be database name plus server -- eg dbase vs dbase@server  -- or 
> it could be irrelevant.

Few databases support catalogs. Though I think some simple ones
like ODBC CSV drivers use it for the directory the files are in.

> That mainly has implications for how 
> extensive a set of queries are needed to resolve a really general 
> $dbh->tables query.  It is far easier to regard it as irrelevant (as 
> the DBI specification suggests) than not.  I think this is mainly an 
> internal issue (to DBD::Informix) which will be resolved in favour of 
> simplicity of implementation over thoroughness of answer -- partly on 
> grounds of performance, partly on grounds of usefulness, and mainly 
> because it is hard to do the job right (I'd have to filter the list of 
> databases generated by a non-SQL method with a simulation of the LIKE 
> procedure -- which granted isn't too hard: LIKE "foo_bar%x" 
> corresponds to m/^foo.bar.*x$/).

In general you whould do whatever Informix's own ODBC driver does.

Remember the principle that it ought to be possible to interchange
the use of DBD::Foo with DBD::ODBC using Foo's ODBC driver, with no
changes to the application.

> An additional problem in this area is that the full notation for a 
> column in Informix is:
> 
>       dbase@server:owner.table.column
> 
> The database name is constrained to be equivalent to a C identifier 
> (up to 128 characters, though), and the server name is likewise 
> constrained.  For those components, quotes are neither necessary nor 
> acceptable.  The server name is optional and when omitted it is the 
> '@' that vanishes, leaving dbase:owner.table.column.  If the database 
> is the current database, then the 'dbase:' part can be omitted.  The 
> owner is optional in most types of Informix database regardless of the 
> owner (schema) to which it belongs - but in a MODE ANSI database, it 
> is necessary if you do not own the table.  Oh, and if the column is a 
> row type, it is possible to address elements of the row using one or 
> more extra levels of ".identifier".
> 
> And there are still more gotcha's.  In general, if the name is not 
> quoted, it is case-converted to lower case.  Exception: unquoted owner 
> names in a MODE ANSI database are converted to upper case; exception 
> to the exception: the unquoted owner name informix is not converted to 
> upper case.  If a table name is a delimited identifier, it is 
> case-sensitive; otherwise it is not.  Ditto for column names.  OTOH, 
> upper case in database or server names is not supported at all for all 
> practical purposes.
> 
> I'm also wondering how this affects the $dbh->get_info() method.  I'm 
> guessing that if DELIMIDENT is set, then it is OK to say 'the quote 
> character for delimited identifiers is double-quote'; I'm wondering 
> what I should say when DELIMIDENT is not set -- undef is very 
> tempting, but is it correct?  (Actually, my life is really contorted; 
> if the Informix database server is started with DELIMIDENT set in its 
> environment, it won't matter whether the client ever sets the 
> environment or not -- that server is running with delimited 
> identifiers enabled.  But there isn't an easy way to find out whether 
> this obscure circumstance applies other than by trying to use a 
> delimited identifier and seeing whether you get an error or not.)
> 
> Another related question: how does $dbh->tables('%','','') compare 
> with DBI->data_sources()?  Since this is explicitly called out as a 
> special case in $dbh->tables(), I presume it is different from what 
> DBI->data_sources() is expected to return - but how?  Or is it a 
> feature that actually should be omitted from the non-experimental 
> specification of $dbh->tables()?

See the DBI docs. data_sources is about DBI DSN's.

> Now you know why I'm balding!

Yeap. I don't envy you task. You're life will be easier if you get
the latest Informix ODBC driver and the latest DBD::ODBC (development
release I think) and see how they behave. The implementors of
Informix's ODBC driver must have faced many of these issues and
addressed them in an officially acepted way.

Tim.

Reply via email to