Dear folks,
   I just spent the morning reviewing the dialog on this thread without
completely abandoning thoughts on my other thread, "Fetch_raw".
On that other thread, in spite of this fragment of the conversation
(Carsten Haese responding to me):

>> 1) any of you could argue that it is a corruption of the API that will
>> lead to bad results or bad practices to include a mechanism for accessing
>> the "raw" data;  OR
>
> I'm not going to argue that. Allowing access to "raw" data may be
> beneficial for performance and meaningful to the developer.

That supportive feedback made me realize that my approach was just
another special case of this more general issue and should be discarded.

I have not yet found a solution to the need expressed in this thread,
but I believe I have identified a dozen key issues that define the
problem (although when I enumerate them below, I may combine or
subdivide them slightly differently).  Further, remember as you read
this that this is my view, and I tend to overstate my case.

First, I believe that nothing returned from a call to cursor.fetchXXX()
should be of a type bound to the API/interface.  I recognize that this
is a difficult constraint, but the values need to be usable by some
application written in Python and may even need to be written to a
different DBMS.  I see the role of the interface as making the data
available in pure Python form. (So this argues against the proposal by
Carsten Haese, I think.)
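
To make concrete what I mean by "pure Python form", here is a tiny
illustration (my_driver.DBString is a made-up name, not anything from
an actual driver):

    import datetime
    import decimal

    # A fetched row should contain only types that any Python application
    # (or the driver for some other DBMS) already understands:
    row = ("Smith", datetime.date(1999, 12, 31), decimal.Decimal("1234.50"))

    # ...and never something like my_driver.DBString("Smith"), whose
    # type is bound to one particular interface.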

This view then says that the types used by SQL and the various DBMSs
are "foreign types" (foreign to Python).  (I think that this
generalizes to a claim that the view of the API is from Python looking
out - queries go out and results come back.)

There is an issue with representing the identities of foreign types in Python.
It may not prove to be difficult in practice but I think we need to be aware
of the issue to keep from making a big problem out of it.  I use strings to
represent (the names/identifiers of) both SQL and native database types.

There is a further issue with types: SQL data types map fully neither
to Python types nor to the native types of some DBMSs.  My driver+DBMS
allows SQL queries to reference the native types, and all the results
are expressed in native types.  Default conversions are based not on
the SQL types reported in cursor.description but on the native types
reported in the metadata of the result.
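
A minimal sketch of what I mean, assuming the native type names are
reported per column in the result metadata (the names "text" and
"julian_date" are invented for illustration):

    import datetime

    # Default converters keyed on the native type from the result's own
    # metadata, not on the SQL type in cursor.description.
    DEFAULT_CONVERTERS = {
        "text": lambda raw: raw.decode("utf-8"),
        # Julian day number -> proleptic Gregorian date
        "julian_date": lambda raw: datetime.date.fromordinal(int(raw) - 1721425),
    }

    def convert_row(raw_row, native_types):
        return tuple(DEFAULT_CONVERTERS[t](raw)
                     for t, raw in zip(native_types, raw_row))

    print(convert_row((b"widget", b"2451545"), ("text", "julian_date")))
    # ('widget', datetime.date(2000, 1, 1))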

There seems to be a lot of confusion (which I especially see in our
JDBC driver and its applications) about the difference between the
tables (and columns) in the database and the results of a query.  My
DBMS in effect creates an anonymous (and temporary) table with the
results of a query, and the types of the columns in the result are not
necessarily the exact same types as the columns in the database that
went into the result.  Yet I find that some applications using JDBC
take the type from the description of the original table and not the
type given with the result.  Some of the comments in this thread seem
to reflect this confusion, although I may be mistaken.
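
The distinction is easy to demonstrate; this runnable example uses
sqlite3 only because it is in the standard library, not because it
resembles my DBMS:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (qty INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

    # The table column is declared INTEGER, but the column of this
    # result is not an integer at all - the result's type, not the
    # table's, is what matters:
    avg = con.execute("SELECT AVG(qty) FROM t").fetchone()[0]
    print(type(avg))    # <class 'float'>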

While my interface uses strings to represent the types, those strings
are formatted to express structure, specifically a major type and a
minor or sub type.  I doubt that my DBMS is unique in having a
hierarchy of type information, although it may be that none yet go
beyond two levels.  It may even be that someone has purely orthogonal
scales that define their type space.  We need to consider how to
encode the taxonomy of types with enough flexibility to handle most if
not all players.  I suspect that all the various type taxonomies can
be accommodated with merely two levels by defining an abstract main
type, with a method to derive it from the full type, and the full
type.  A mapping scheme would operate on the abstracted main type, and
the conversion routines could branch out as needed.  The selection of
the abstracted main types would be an implementer's trade-off between
grouping common functionality and the bushiness of special handling.
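
A minimal sketch of that derivation, assuming full types are encoded
as strings like "date(julian)" or "char(20)" (my actual encoding
differs in detail):

    def main_type(full_type):
        # Derive the abstracted main type from the full type string.
        return full_type.split("(", 1)[0]

    assert main_type("date(julian)") == "date"
    assert main_type("char(20)") == "char"
    assert main_type("int") == "int"    # no sub type at all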

The role of my sub types in my collection of conversion routines
varies widely, from specifying a trivial parameter, such as the
maximum length of a string, to specifying formats and reference
points, as the sub types of date do.  My DBMS even has types that
possibly should be the same major type but aren't.  Thus, in general,
I find that, rather than code up thousands of similar routines, I have
a handful of conversion routines that need to know the full type
representation (i.e., both major and sub types) in order to do the
proper conversion.  I suspect that this should be a common feature:
the conversion routines put into a "mapping" structure always have a
"signature" (argument list) that accepts the relevant type information
as well as the data to be converted.
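
For example, a single routine can cover the whole "date" major type,
with the full type selecting the behavior (the sub types "julian" and
"iso" here are hypothetical):

    import datetime

    def import_date(raw, full_type):
        # One conversion routine for all dates; the sub type picks the
        # format of the raw value.
        sub = full_type[5:-1] if "(" in full_type else ""
        if sub == "julian":
            return datetime.date.fromordinal(int(raw) - 1721425)
        if sub == "iso":
            return datetime.date.fromisoformat(raw.decode("ascii"))
        raise ValueError("unknown date sub type: %r" % full_type)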

With a good enough scheme for type groups and full types (what I was
calling abstracted main types and full types), the mapping for
conversions to and from foreign types ought to become fairly simple.
A device not unlike a dictionary would associate a type group with two
routines: one to export a Python value to the DBMS (for query
parameters, inserts, or updates) and one to import a value from the
DBMS.  The import routine would take a raw value and (the
representation of) a full foreign type and convert the value into a
pure Python value of the appropriate type (it should "know" what type
that is).  The export routine would take a Python value and (the
representation of) a full foreign type and again do the right thing.
A routine can be written to ignore the foreign type, relying on the
method that "abstracted" the type into a type grouping to sufficiently
distinguish the types and fully select the conversion, or at the other
extreme it could handle every type, sub type, and sub-sub type in the
DBMS, so that there is only the one pair of routines in the mapping
structure and one type group.
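
A sketch of that mapping, reusing main_type and import_date from the
sketches above (the export side here ignores the sub type, as the text
above allows):

    TYPE_MAP = {
        # type group -> (import routine, export routine)
        "date": (import_date,
                 lambda value, full_type: str(value.toordinal() + 1721425)),
        "char": (lambda raw, full_type: raw.decode("utf-8"),
                 lambda value, full_type: value.encode("utf-8")),
    }

    def import_value(raw, full_type):
        importer, _ = TYPE_MAP[main_type(full_type)]
        return importer(raw, full_type)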

There are reasons why this mapping structure should be at the module
level and inherited by the connection and cursor in turn.
Disturbingly, I even see worth in separately tying it to either the
statement or the results.  (I do implement a query object in my driver
that corresponds to the SQL statement and could thus separate them.)
I even see worth in tying it to specific columns, as exemplified by
the jpeg and pickle case.  Given the current structure of the API, I
think that a setting on the cursor should stay until reset.  If you
know enough about things to set it up, you should know when to reset
it.
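
One way to get that inheritance, sketched with ChainMap so that a
setting made on a cursor shadows the connection's and module's
defaults until it is explicitly removed:

    from collections import ChainMap

    MODULE_TYPE_MAP = dict(TYPE_MAP)    # module-level defaults, as above

    class Connection:
        def __init__(self):
            self.type_map = ChainMap({}, MODULE_TYPE_MAP)

        def cursor(self):
            return Cursor(self)

    class Cursor:
        def __init__(self, connection):
            self.type_map = ChainMap({}, connection.type_map)

A cursor-level assignment such as cur.type_map["jpeg"] = (import_jpeg,
export_jpeg) then affects only that cursor and stays until reset,
which matches what I argue above.  (import_jpeg and export_jpeg are
hypothetical.)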

I do not remember who commented on the distastefulness of messing with
the SQL queries that go through the interface, but I feel I need to
comment on it as well.  As ugly as such tampering is, my driver has no
recourse but to alter the SQL: it is the only way that I can support
SQL parameters.  That said, I strive to make the fewest and smallest
changes to the SQL.
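
To show the kind of alteration I mean, here is a toy version of
binding qmark-style parameters by rewriting the SQL (a real driver
must also skip markers inside quoted literals, which this does not):

    def bind(sql, params):
        parts = sql.split("?")
        if len(parts) != len(params) + 1:
            raise ValueError("parameter count mismatch")
        out = [parts[0]]
        for value, tail in zip(params, parts[1:]):
            if isinstance(value, str):
                literal = "'" + value.replace("'", "''") + "'"
            else:
                literal = str(value)
            out.append(literal + tail)
        return "".join(out)

    print(bind("SELECT * FROM t WHERE a = ? AND b = ?", ("x'y", 3)))
    # SELECT * FROM t WHERE a = 'x''y' AND b = 3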

Looking back on this much too long posting, I am struck by the
realization that:
1) there seems to be very little need to extend the support for
mapping SQL types to Python types (there are so few of them and they
don't change much);
2) the very hard part of supporting mapping from DBMS native types to
Python types in a generic manner is that DBMS native types aren't
generic;
and 3) I do not need to add public access to the raw data - people
debugging things will have to poke around under the hood or complain
to me about it.

I hope that this was worth whatever time you took reading it.

   Thank you all,
   Arthur Protin


