Hello Carsten,

> The important discussion on controlling return types has gone cold
> again, so I'd like to revive it. Revision 2 of my type mapping proposal
> was met with deafening silence except for valuable input by Jim
> Patterson, which I have incorporated into Revision 3. The result is
> available for your perusal once again at
> http://www.uniqsys.com/~carsten/typemap.html .
>
> I don't know whether the general silence indicates tacit agreement or if
> people are too busy to respond or even just to read my proposal in the
> first place. I'd appreciate some feedback to see how close we are to
> reaching consensus, even if it's just a "show of hands" in the form of
> +1/0/-1 responses.
I did read your proposal and was about to reply, but then got
side-tracked again by other things.

In general, I think that we shouldn't expose/impose such a low-level
interface for type mapping. These details should be left to the module
author, and the user shouldn't have to bother with them. Instead, we
should try to find an API with simple methods, e.g.
.setinputconverter() and .setoutputconverter(), to let the user define
new mappings. The module can then use an implementation like the one
you described internally.

What I like about the proposal is that it doesn't try to
overgeneralize things (a very common habit in Python design).

A few comments on the bullets in your proposal and a sketch of a
slightly modified solution:

> * Connection and cursor objects will have an attribute called
>   outputmap that maps objects from the database to the application and
>   an attribute called inputmap that maps objects from the application
>   to the database. Note that both mappings are between Python objects.
>   The Python objects on the database side are mapped from and to
>   actual database values by the canonical mapping.

See above: I think we should not expose these low-level mapping tables
to the user.

Note that each database will typically have its own way of defining
the "column type code". Since we already expose this type code in the
cursor.description field, we should probably use that as the basis. In
any case, the type codes will be database specific. It won't be
possible to generically say "map integers to Python floats", since
"integers" may refer to a whole set of database types for some
backends, but only to one type for others.

> * The default mappings are None for efficiency, which means that only
>   the canonical mapping is in effect. The same can be achieved by an
>   empty dictionary, but it's faster to check for None than to check
>   for an empty dictionary.

That's an implementation detail and should not be exposed.
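To make the suggestion concrete, here is a minimal sketch of what such
a high-level API could look like. The method names
.setinputconverter()/.setoutputconverter(), the stand-in Connection
class and the DATE type code are all illustrative assumptions, not an
agreed interface:

```python
# Sketch of the suggested high-level API. The mapping tables stay
# internal to the module; the user only sees the simple methods.
import datetime

class Connection:
    def __init__(self):
        self._outputmap = {}   # internal: db type code -> converter
        self._inputmap = {}    # internal: Python type -> converter

    def setoutputconverter(self, dbtype, converter):
        # Register a converter for values of the given column type code.
        self._outputmap[dbtype] = converter

    def setinputconverter(self, pytype, converter):
        # Register a converter for Python objects of the given type.
        self._inputmap[pytype] = converter

# Hypothetical database-specific type code for DATE columns.
DATE = 91

conn = Connection()
conn.setoutputconverter(DATE, datetime.date.fromisoformat)
conn.setinputconverter(datetime.date, lambda d: d.isoformat())
```

The point of the sketch is only the shape of the API: the user
registers converters through methods, and how the module stores and
applies them remains its own business.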
We could add .getinputconverter() and .getoutputconverter() to query
the currently active mappings in some way.

> * When a cursor is created, it inherits shallow copies of the
>   connection's mappings. This allows changing the type mapping per
>   cursor without affecting the connection-wide default.

+1

> * When a value is fetched from the database, if the value is not None,
>   its column type (as it would be indicated in cursor.description) is
>   looked up in outputmap, and the resulting callable object is called
>   upon to convert the fetched value, as illustrated by this
>   pseudo-code:
>
>       converter = cursor.outputmap.get(dbtype, None)
>       if converter is not None:
>           fetched_value = converter(fetched_value)

There's a problem here: since fetching a value from the database will
usually involve some more or less complicated C binding code, you
can't just pass the fetched_value to a converter, since this would
mean that you already have a Python object representing the value.

Normally, a database interface will have a set of different techniques
by which a value is fetched from the database, e.g. fetch a number
value as integer, long, string, float or decimal. To make full use of
converters, we'll have to be able to tell the database module: fetch
this number type as e.g. decimal using the internal fetch mechanisms
(phase 1) and then call this converter on the resulting value
(phase 2).

Hope I'm clear enough on this. If not, please let me know.

> * The mappings need not be actual mappings. They may be any object
>   that implements __getitem__ and copy. This allows specifying "batch"
>   mappings that map many different types with the same callable object
>   in a convenient fashion.

That again is an implementation detail. We should factor such a type
collection feature into the above methods.
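A small sketch of the two phases, in case the prose isn't clear. The
INTERNAL_FETCHERS table and fetch_value() are illustrative stand-ins
for what a real driver would do at the C level:

```python
# Phase 1 selects one of the module's internal fetch techniques;
# phase 2 applies the user-supplied converter to the result.
from decimal import Decimal

# Internal fetch techniques a driver might expose (simplified: real
# drivers do this in the C binding code, not with Python callables).
INTERNAL_FETCHERS = {
    'int': int,
    'float': float,
    'decimal': Decimal,
    'string': str,
}

def fetch_value(raw, technique, converter=None):
    # Phase 1: build the Python object using the chosen technique.
    value = INTERNAL_FETCHERS[technique](raw)
    # Phase 2: hand the result to the user's converter, if any.
    if converter is not None:
        value = converter(value)
    return value

# Example: fetch a NUMERIC column as Decimal, then format it.
print(fetch_value('12.50', 'decimal', lambda d: '%7.2f' % d))
```

The converter never sees the raw database value, only the Python
object produced by the internal fetch mechanism it asked for.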
My favorite would be to not set the converters per type and then have
a mapping, but to instead just have one function for implementing
phase 1, which then returns a converter function to the database
module to implement phase 2, e.g.:

    def mydecimalformat(value):
        return '%5.2f' % value

    def outputconverter(cursor, position):
        dbtype = cursor.description[position][1]
        if dbtype == SQL.DECIMAL:
            # Fetch decimals as floats and call mydecimalformat on these
            return (SQL.DECIMAL, mydecimalformat)

mxODBC has a converter function for defining phase 1 output
conversions and it works nicely. It doesn't have a phase 2
implementation; instead, it provides several attributes for tuning the
internal fetch mechanisms.

> * For convenience, the module.connect() and connection.cursor()
>   methods should accept outputmap and inputmap keyword arguments that
>   allow the application to specify non-default mappings at
>   connection/cursor creation time.

Not sure about this: the type mapping setup should be done explicitly
after a proper connect. It may have to rely on the connection already
being established.

> * In discussions on the db-sig mailing list, some concern was raised
>   that naming the directions of conversion as input and output is
>   ambiguous because input could mean into the database or into the
>   application. However, PEP 249 already uses input and output in the
>   naming of setinputsizes and setoutputsizes, and this proposal
>   follows the same semantics.

Right, let's use names similar to those. "Input" is always the
direction from Python to the database (the database gets input) and
"output" is the direction from the database to Python (Python receives
output).
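To show why the converter-function approach is cheap, here is a
runnable sketch of it. The SQL type codes, the toy Cursor class and
the use of Python callables as stand-ins for the internal fetch
mechanisms are all assumptions for illustration; a real driver would
do phase 1 in C:

```python
# The converter function runs once per column; the resulting
# (fetch technique, converter) plan is reused for every row.
class SQL:
    DECIMAL = 3   # hypothetical type codes
    VARCHAR = 12

def mydecimalformat(value):
    return '%5.2f' % value

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    if dbtype == SQL.DECIMAL:
        # Phase 1: fetch decimals as floats; phase 2: format them.
        return (float, mydecimalformat)
    # Default: fetch as string, no phase 2 conversion.
    return (str, None)

class Cursor:
    def __init__(self, description, rows):
        self.description = description
        self._rows = rows
        # Resolve the per-column plan once, not once per value.
        self._plan = [outputconverter(self, i)
                      for i in range(len(description))]

    def fetchall(self):
        result = []
        for raw_row in self._rows:
            row = []
            for (fetch, convert), raw in zip(self._plan, raw_row):
                value = fetch(raw)          # phase 1
                if convert is not None:
                    value = convert(value)  # phase 2
                row.append(value)
            result.append(tuple(row))
        return result

cur = Cursor([('price', SQL.DECIMAL), ('name', SQL.VARCHAR)],
             [('12.5', 'widget'), ('7.25', 'gadget')])
print(cur.fetchall())
# -> [('12.50', 'widget'), (' 7.25', 'gadget')]
```

Note how all per-type decisions happen in the constructor; the fetch
loop itself only applies the already-resolved plan.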
> * When input binding is performed and the cursor's inputmap is not
>   None, a converter function is looked up in the inputmap according to
>   the following pseudo-code:
>
>       for tp in type(in_param).__mro__:
>           converter = cursor.inputmap.get(tp, None)
>           if converter is not None:
>               break
>       if converter is not None:
>           in_param = converter(in_param)

This will cause a serious performance hit, since you have to do this
for every single value fetched from the database. Just think of a
result set with 20 columns and 10000 rows: you'd have to run through
the above for-loop 200000 times, even though it's really only needed
once per column (since the types won't change within the result set).
The above two-phase approach avoids this.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 19 2007)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

_______________________________________________
DB-SIG maillist  -  DB-SIG@python.org
http://mail.python.org/mailman/listinfo/db-sig