Hmm, your reply was a bit hard to read. I've added some more
quote chars to make it easier...

On 2007-05-19 19:55, Carsten Haese wrote:
> 
>> quote="""
>> In general, I think that we shouldn't expose/impose such a
>> low-level interface for type mapping. These details should be
>> left to the module author and the user shouldn't have bother
>> with them.
>> """
> 
> For the common cases, the user won't have to bother with the low-level
> details. The module author will provide standard maps for the common use
> cases, and they're free to provide a library of nonstandard maps for
> "uncommon" use cases specific to their particular database, too.
> 
> In my opinion, making the low-level details available is the only thing
> that *guarantees* that the application developer can use this mapping
> facility for *any* use case they can think of. If we try to hide the
> low-level details, we might take away a crucial feature the application
> developer needs to get their job done.

The DB-API cannot per se address low-level details. These are far
too specific to the interface requirements imposed by the database
backend.

That's the reason why we have type code objects that allow
many-to-one equality comparisons.

Unfortunately, these don't work well with dictionaries since
an object can only have one hash value.
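
Roughly what I mean, following the type object sketch in the PEP 249
implementation hints (the numeric backend codes below are made up
for illustration):

class DBAPITypeObject:
    # Compares equal to any one of a set of backend-specific type codes.
    def __init__(self, *values):
        self.values = values
    def __eq__(self, other):
        return other in self.values
    def __ne__(self, other):
        return other not in self.values

NUMBER = DBAPITypeObject(3, 8, 12)   # fictitious INTEGER/FLOAT/DECIMAL codes

NUMBER == 8              # -> True: the many-to-one equality works fine
typemap = {NUMBER: float}
typemap.get(8)           # -> None: the dict goes by hash(8), which has
                         #    nothing to do with hash(NUMBER), so the
                         #    equality above never gets a chance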

>> quote="""
>> Instead, we should try to find an API with simple methods like
>> e.g. .setinputconverter(), .setoutputconverter() to let the user
>> define new mappings.
>> 
>> The module can then use an implementation like the one you
>> described internally.
>> """
> 
> And how do you propose those simple methods are actually invoked so that
> they cover all common use cases in a database independent way without
> making them unusable for database-specific features?

See below: you pass in a single function which takes care of this.

>> quote="""
>> What I like about the proposal is that it doesn't try to
>> overgeneralize things (a very common habit in Python design).
>> """
> 
> Interesting observation considering that I have tried to find the most
> general solution to the problem at hand. That would mean that my
> proposal is exactly as general as it needs to be ;)

Well, let's put it this way: you could have started defining
a new class structure, using abstract classes, a type registry,
special object methods, various introspection APIs, etc.

Luckily, you avoided all that :-)

>> quote="""
>> A few comments on the bullets in your proposal and a sketch
>> of a slightly modified solution:
>> 
>>     *  Connection and cursor objects will have an attribute called outputmap 
>> that maps objects from the database to the
>> application and an attribute called inputmap that maps objects from the 
>> application to the database. Note that both
>> mappings are between Python objects. The Python objects on the database side 
>> are mapped from and to actual database
>> values by the canonical mapping.
>> 
>> See above. I think we should not expose these low level mapping
>> tables to the user.
>> 
>> Note that the database will typically have different ways of
>> defining the "column type code". Since we already expose this
>> type code in the cursor.description field, we should probably
>> use that as basis.
>> 
>> In any case, the type codes will be database specific. It won't
>> be possible to generically say: map integers to Python floats,
>> since "integers" may refer to a whole set of database types for
>> some backends or only to one type for others.
>> """
> 
> Right. Hence my proposal to use dictionary-like objects to perform the
> adapter function lookup.
> 
>> quote="""
>>     * The default mappings are None for efficiency, which means that only 
>> the canonical mapping is in effect. The same
>> can be achieved by an empty dictionary, but it's faster to check for None
>> than to check for an empty dictionary.
>> 
>> That's implementation detail and should not be exposed.
>> """
> 
> Well, it's an implementation *hint*. 
> 
>> quote="""
>> We could add .getinputconverter() and .getoutputconverter()
>> to query the currently active mappings in some way.
>> """
> 
> True, but unfortunately, "in some way" makes this suggestion uselessly
> vague.

It was only a sketch. I think I'll write up a formal definition of the
idea and post it here.

>> quote="""
>>     * When a value is fetched from the database, if the value is not None, 
>> its column type (as it would be indicated in
>> cursor.description) is looked up in outputmap, and the resulting callable 
>> object is called upon to convert the fetched
>> value, as illustrated by this pseudo-code:
>> 
>>           converter = cursor.outputmap.get(dbtype, None)
>>           if converter is not None:
>>              fetched_value = converter(fetched_value)
>> 
>> There's a problem here: since fetching the database value from
>> the database will usually involve some more or less complicated
>> C binding code, you can't just pass the fetched_value to a
>> converter since this would mean that you already have a Python
>> object representing the value.
>> """
> 
> Right, and in this step, I do. This step happens after the
> database-dependent canonical mapping, which is informally defined as
> "Whatever the respective API module currently does."

Ah, but that's not necessarily enough to convert the value
into a different type or format.

E.g. say you have a decimal column and the canonical method of
retrieving the value is by using floats. Now say you want to return
these values as decimals. Floats don't give you enough information
to do this properly.
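
A quick illustration using the stdlib decimal module (the column
value here is made up):

from decimal import Decimal

# The column holds the exact DECIMAL value 1234567.89; the canonical
# fetch hands it over as a float, i.e. as a binary approximation:
fetched = 1234567.89
repr(fetched)            # something like '1234567.8899999999' -- the
                         # rounding noise is already baked in
Decimal(str(fetched))    # Decimal("1234567.89") looks right, but only
                         # because str() re-rounds; the exact scale and
                         # precision of the database value are gone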

Another example: say your database provides a way of fetching
BLOBs in chunks. The database module will likely retrieve
the data in chunks, but still return the string in one piece
as the canonical representation. What you'd really want is an
iterator with which you could retrieve the data in chunks as
well.
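
A rough sketch of what such a converter could hand back instead;
blob_handle and its read_chunk() method are hypothetical stand-ins
for whatever chunked read interface a given backend offers:

def blob_as_iterator(blob_handle, chunksize=64*1024):
    # Phase 2 converter: instead of materializing the whole BLOB as one
    # string (the canonical representation), return an iterator that
    # fetches the chunks lazily from the backend.
    def iterchunks():
        while True:
            chunk = blob_handle.read_chunk(chunksize)  # hypothetical call
            if not chunk:
                break
            yield chunk
    return iterchunks()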

>> quote="""
>> Normally, a database interface will have a set of different
>> techniques by which a value is fetched from the database, e.g.
>> fetch a number value as integer, long, string, float, decimal.
>> 
>> To make full use of converters, we'll have to be able to tell
>> the database module: fetch this number type as e.g. decimal
>> using the internal fetch mechanisms (phase 1) and then call
>> this converter on the resulting value (phase 2).
>> 
>> Hope I'm clear enough on this. If not, please let me know.
>> """
> 
> Yes, you're perfectly clear, and my proposal already addresses this.
> What you're calling phase 1 is what I call the canonical mapping, and I
> am completely open to allowing database-dependent mechanisms for
> "guiding" or "tweaking" the behavior of this phase 1 mapping. I am even
> suggesting a way involving custom attributes on the adapter function.
> 
>> quote = """
>>     * The mappings need not be actual mappings. They may be any object that 
>> implements __getitem__ and copy. This allows
>> specifying "batch" mappings that map many different types with the same 
>> callable object in a convenient fashion.
>> 
>> That again is an implementation detail. We should factor such
>> a type collection feature into the above methods.
>> """
> 
> I won't stop you from trying. Please feel free to suggest a concrete
> mechanism.

Will do :-)

>> quote="""
>> My favorite would be to not set the converters per type and then
>> have a mapping, but to instead just have one function for
>> implementing phase 1 which then returns a converter function
>> to the database module to implement phase 2, e.g.
>> 
>> def mydecimalformat(value):
>>     return '%5.2f' % value
>> 
>> def outputconverter(cursor, position):
>>     dbtype = cursor.description[position][1]
>>     if dbtype == SQL.DECIMAL:
>>         # Fetch decimals as floats and call mydecimalformat on these
>>         return (SQL.DECIMAL, mydecimalformat)
>> """
> 
> Maybe I misunderstand, but doesn't this force every application
> developer to reinvent the wheel? How would they influence the mappings
> of two different types such as DECIMAL and CHAR except by writing one
> output converter for each possible combination they need?

If they need to modify the mappings for DECIMAL and CHAR,
then they'd put those two in the converter:

def mycharconverter(value):
    return unicode(value, 'utf-8')

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    if dbtype == SQL.DECIMAL:
        # Fetch decimals as floats and call mydecimalformat on these
        return (SQL.DECIMAL, mydecimalformat)
    elif dbtype == SQL.CHAR:
        # Fetch chars as Unicode objects
        return (SQL.CHAR, mycharconverter)

The advantage is that you can also do more complicated mappings,
e.g. by position of the column in a query:

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    if dbtype == SQL.DECIMAL and position == 2:
        # Fetch decimals as floats and call mydecimalformat on these,
        # but only on column 3 in the result set
        return (SQL.DECIMAL, mydecimalformat)

or use a dictionary mapping:

mytypemap = {
    SQL.DECIMAL: (SQL.DECIMAL, mydecimalformat),
}

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    fetchastype, formatter = mytypemap.get(dbtype, (None, None))
    if formatter is None:
        # Use the default mapping
        return None
    return (fetchastype, formatter)

It's also possible to chain converters:

existing_converter = cursor.getoutputconverter()

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    fetchastype, converter = mytypemap.get(dbtype, (None, None))
    if converter is None:
        # Revert to existing_converter
        return existing_converter(cursor, position)
    return (fetchastype, converter)

> Also, I don't see how this helps in getting to a set of
> database-independent solutions for common use cases.

Depends on the common use cases. Do you have some?

>> quote="""
>>     * For convenience, the module.connect() and connection.cursor() methods 
>> should accept outputmap and inputmap keyword
>> arguments that allow the application to specify non-default mappings at 
>> connection/cursor creation time.
>> 
>> Not sure about this: the type mapping setup should be explicitly done
>> after a proper connect. It may have to rely on the connection already
>> being established.
>> """
> 
> That's a good point. I agree.
> 
>> quote="""
>>     * When input binding is performed and the cursor's inputmap is not None, 
>> a converter function is looked up in the
>> inputmap according to the following pseudo-code:
>> 
>>           for tp in type(in_param).__mro__:
>>             converter = cursor.inputmap.get(tp, None)
>>             if converter is not None: break
>>           if converter is not None:
>>             in_param = converter(in_param)
>> 
>> This will cause a serious performance hit since you have to do
>> this for every single value fetched from the database. Just think
>> of a result set with 20 columns and 10000 rows.
>> 
>> You'd have to run through the above for-loop 200000 times, even
>> though it's really only needed once per column (since the types
>> won't change within the result set).
>> 
>> The above two-phase approach avoids this.
>> """
> 
> I think you misunderstand. The code you're quoting is for input binding,
> not for output binding.

Sorry, I meant the output binding. To some extent the above also
applies to input binding, but the situation is different there
since the database module cannot assume that all objects in
a parameter list passed to .executemany() are of the same type.
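
For example, nothing stops the application from passing rows like
these (made-up data, cursor and qmark paramstyle assumed), where the
same parameter position carries a different Python type in every row,
so any input conversion lookup has to happen per value:

from decimal import Decimal

params = [
    (1,  u'abc', 1.5),
    (2L, 'def',  None),             # long instead of int, str instead of unicode
    (3,  u'ghi', Decimal('2.25')),  # Decimal where the first row had a float
]
cursor.executemany("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", params)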

> True, it would have to be done for every value
> passed as a parameter, but most python objects that a database is likely
> to see will have a rather short MRO, and the pseudocode is just a
> suggestion. The cursor could memoize the results of the lookup in case
> the same query gets executed again with input parameters of the same
> types. (And of course, memoization could also be done in the lookup for
> output adapters.)

True.
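
For completeness, a minimal sketch of how such memoization could wrap
an output converter function under the two-phase interface above (the
wrapper class and its reset() hook are hypothetical, not part of any
proposal):

class MemoizedOutputConverter:
    # Caches the (fetch-as type, converter) result per column position,
    # so the lookup runs once per column instead of once per value.
    def __init__(self, outputconverter):
        self.outputconverter = outputconverter
        self.cache = {}
    def __call__(self, cursor, position):
        try:
            return self.cache[position]
        except KeyError:
            result = self.outputconverter(cursor, position)
            self.cache[position] = result
            return result
    def reset(self):
        # Would have to be called whenever a new statement is executed,
        # since the result set description may change.
        self.cache = {}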

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 21 2007)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
