Dear folks,
   I just spent the morning reviewing the dialog on this thread without
completely abandoning thoughts on my other thread, "Fetch_raw".
On that other thread, in spite of this fragment of the conversation
(Carsten Haese responding to me):

>> 1) any of you could argue that it is a corruption of the API that will
>> lead to bad results or bad practices to include a mechanism for accessing
>> the "raw" data;  OR
>
> I'm not going to argue that. Allowing access to "raw" data may be
> beneficial for performance and meaningful to the developer.

That supportive feedback made me realize that my approach was just
another special case of this more general issue and should be discarded.

I have not yet found a solution to the need expressed in this thread,
but I believe I have identified a dozen key issues that define the
problem (although when I enumerate them below, I may combine or
subdivide them slightly differently).  Further, remember as you read
this that this is my view, and I tend to overstate my case.

First, I believe that nothing returned from a call to cursor.fetchXXX()
should be of a type bound to the API/interface.  I recognize that this
is a difficult constraint, but the values need to be usable by some
application written in Python and may even need to be written to a
different DBMS.  I see the role of the interface as making the data
available in pure Python form. (So this argues against the proposal by
Carsten Haese, I think.)
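
To make concrete what I mean by "pure Python form", here is a tiny
illustration (my_driver.DBString is a made-up name, not anything from
an actual driver):

    import datetime
    import decimal

    # A fetched row should contain only types that any Python application
    # (or the driver for some other DBMS) already understands:
    row = ("Smith", datetime.date(1999, 12, 31), decimal.Decimal("1234.50"))

    # ...and never something like my_driver.DBString("Smith"), whose
    # type is bound to one particular interface.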

This view then says that the types used by SQL and the various DBMSs
are "foreign types" (foreign to Python).  (I think that this
generalizes to a claim that the view of the API is from Python looking
out - queries go out and results come back.)

There is an issue with representing the identities of foreign types in Python.
It may not prove to be difficult in practice but I think we need to be aware
of the issue to keep from making a big problem out of it.  I use strings to
represent (the names/identifiers of) both SQL and native database types.

There is a further issue with types: SQL data types map fully neither
to Python types nor to the native types of some DBMSs.  My driver+DBMS
allows SQL queries to reference the native types, and all the results
are expressed in native types.  Default conversions are based not on
the SQL types reported in cursor.description but on the native types
reported in the metadata of the result.
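
A minimal sketch of what I mean, assuming the native type names are
reported per column in the result metadata (the names "text" and
"julian_date" are invented for illustration):

    import datetime

    # Default converters keyed on the native type from the result's own
    # metadata, not on the SQL type in cursor.description.
    DEFAULT_CONVERTERS = {
        "text": lambda raw: raw.decode("utf-8"),
        # Julian day number -> proleptic Gregorian date
        "julian_date": lambda raw: datetime.date.fromordinal(int(raw) - 1721425),
    }

    def convert_row(raw_row, native_types):
        return tuple(DEFAULT_CONVERTERS[t](raw)
                     for t, raw in zip(native_types, raw_row))

    print(convert_row((b"widget", b"2451545"), ("text", "julian_date")))
    # ('widget', datetime.date(2000, 1, 1))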

There seems to be a lot of confusion (which I especially see in our
JDBC driver and its applications) about the difference between the
tables (and columns) in the database and the results of a query.  My
DBMS in effect creates an anonymous (and temporary) table with the
results of a query, and the types of the columns in the result are not
necessarily the exact same types as the columns in the database that
went into the result.  Yet I find that some applications using JDBC
take the type from the description of the original table and not the
type given with the result.  Some of the comments in this thread seem
to reflect this confusion, although I may be mistaken.
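
The distinction is easy to demonstrate; this runnable example uses
sqlite3 only because it is in the standard library, not because it
resembles my DBMS:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (qty INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

    # The table column is declared INTEGER, but the column of this
    # result is not an integer at all - the result's type, not the
    # table's, is what matters:
    avg = con.execute("SELECT AVG(qty) FROM t").fetchone()[0]
    print(type(avg))    # <class 'float'>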

While my interface uses strings to represent the types, those strings
are formatted to express structure, specifically a major type and a
minor or sub type.  I doubt that my DBMS is unique in having a
hierarchy of type information, although it may be that none yet go
beyond two levels.  It may even be that someone has purely orthogonal
scales that define their type space.  We need to consider how to
encode the taxonomy of types with enough flexibility to handle most if
not all players.  I suspect that all the various type taxonomies can
be accommodated with merely two levels by defining an abstract main
type, with a method to derive it from the full type, and the full
type.  A mapping scheme would operate on the abstracted main type, and
the conversion routines could branch out as needed.  The selection of
the abstracted main types would be an implementer's trade-off between
grouping common functionality and the bushiness of special handling.
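
A minimal sketch of that derivation, assuming full types are encoded
as strings like "date(julian)" or "char(20)" (my actual encoding
differs in detail):

    def main_type(full_type):
        # Derive the abstracted main type from the full type string.
        return full_type.split("(", 1)[0]

    assert main_type("date(julian)") == "date"
    assert main_type("char(20)") == "char"
    assert main_type("int") == "int"    # no sub type at all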

The role of my sub types in my collection of conversion routines
varies widely, from specifying a trivial parameter, such as the
maximum length of a string, to specifying formats and reference
points, as the sub types of date do.  My DBMS even has types that
possibly should be the same major type but aren't.  Thus, in general,
I find that, rather than code up thousands of similar routines, I have
a handful of conversion routines that need to know the full type
representation (i.e., both major and sub types) in order to do the
proper conversion.  I suspect that this should be a common feature:
the conversion routines put into a "mapping" structure always have a
"signature" (argument list) that accepts the relevant type information
as well as the data to be converted.
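
For example, a single routine can cover the whole "date" major type,
with the full type selecting the behavior (the sub types "julian" and
"iso" here are hypothetical):

    import datetime

    def import_date(raw, full_type):
        # One conversion routine for all dates; the sub type picks the
        # format of the raw value.
        sub = full_type[5:-1] if "(" in full_type else ""
        if sub == "julian":
            return datetime.date.fromordinal(int(raw) - 1721425)
        if sub == "iso":
            return datetime.date.fromisoformat(raw.decode("ascii"))
        raise ValueError("unknown date sub type: %r" % full_type)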

With a good enough scheme for type groups and full types (what I was
calling abstracted main types and full types), the mapping for
conversions to and from foreign types ought to become fairly simple.
A device not unlike a dictionary would associate a type group with two
routines: one to export a Python value to the DBMS (for query
parameters, inserts, or updates) and one to import a value from the
DBMS.  The import routine would take a raw value and (the
representation of) a full foreign type and convert the value into a
pure Python value of the appropriate type (it should "know" what type
that is).  The export routine would take a Python value and (the
representation of) a full foreign type and again do the right thing.
A routine can be written to ignore the foreign type, relying on the
method that "abstracted" the type into a type grouping to sufficiently
distinguish the types and fully select the conversion, or at the other
extreme it could handle every type, sub type, and sub-sub type in the
DBMS, so that there is only the one pair of routines in the mapping
structure and one type group.
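
A sketch of that mapping, reusing main_type and import_date from the
sketches above (the export side here ignores the sub type, as the text
above allows):

    TYPE_MAP = {
        # type group -> (import routine, export routine)
        "date": (import_date,
                 lambda value, full_type: str(value.toordinal() + 1721425)),
        "char": (lambda raw, full_type: raw.decode("utf-8"),
                 lambda value, full_type: value.encode("utf-8")),
    }

    def import_value(raw, full_type):
        importer, _ = TYPE_MAP[main_type(full_type)]
        return importer(raw, full_type)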

There are reasons why this mapping structure should be at the module
level and inherited by the connection and cursor in turn.
Disturbingly, I even see worth in separately tying it to either the
statement or the results.  (I do implement a query object in my driver
that corresponds to the SQL statement and could thus separate them.)
I even see worth in tying it to specific columns, as exemplified by
the jpeg and pickle case.  Given the current structure of the API, I
think that a setting on the cursor should stay until reset.  If you
know enough about things to set it up, you should know when to reset
it.
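
One way to get that inheritance, sketched with ChainMap so that a
setting made on a cursor shadows the connection's and module's
defaults until it is explicitly removed:

    from collections import ChainMap

    MODULE_TYPE_MAP = dict(TYPE_MAP)    # module-level defaults, as above

    class Connection:
        def __init__(self):
            self.type_map = ChainMap({}, MODULE_TYPE_MAP)

        def cursor(self):
            return Cursor(self)

    class Cursor:
        def __init__(self, connection):
            self.type_map = ChainMap({}, connection.type_map)

A cursor-level assignment such as cur.type_map["jpeg"] = (import_jpeg,
export_jpeg) then affects only that cursor and stays until reset,
which matches what I argue above.  (import_jpeg and export_jpeg are
hypothetical.)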

I do not remember who commented on the distastefulness of messing with
the SQL queries that go through the interface, but I feel I need to
comment on it as well.  As ugly as such tampering is, my driver has no
recourse but to alter the SQL: it is the only way that I can support
SQL parameters.  That said, I strive to make the fewest and smallest
changes to the SQL.
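
To show the kind of alteration I mean, here is a toy version of
binding qmark-style parameters by rewriting the SQL (a real driver
must also skip markers inside quoted literals, which this does not):

    def bind(sql, params):
        parts = sql.split("?")
        if len(parts) != len(params) + 1:
            raise ValueError("parameter count mismatch")
        out = [parts[0]]
        for value, tail in zip(params, parts[1:]):
            if isinstance(value, str):
                literal = "'" + value.replace("'", "''") + "'"
            else:
                literal = str(value)
            out.append(literal + tail)
        return "".join(out)

    print(bind("SELECT * FROM t WHERE a = ? AND b = ?", ("x'y", 3)))
    # SELECT * FROM t WHERE a = 'x''y' AND b = 3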

Looking back on this much too long posting, I am struck by the
realization that:
1) there seems to be very little need to extend the support for
mapping SQL types to Python types (there are so few of them and they
don't change much);
2) the very hard part of supporting mapping from DBMS native types to
Python types in a generic manner is that DBMS native types aren't
generic;
and 3) I do not need to add public access to the raw data - people
debugging things will have to poke around under the hood or complain
to me about it.

I hope that this was worth whatever time you took reading it.

   Thank you all,
   Arthur Protin


