Re: [sqlalchemy] Can python objects be used in join logic?

Jennifer Rodriguez-Mueller Thu, 28 Apr 2011 16:13:19 -0700

Thank you Michael, that makes things *much* clearer!

I had been thinking of sqlalchemy as a database manipulation system that
took inspiration from my source code, rather than thinking of a library that
uses compiled but introspectively accessible bytecode with variable names
preserved, so that python's introspective awareness of attempted runtime
attribute access can be directed to drive the logic of database access.


If I understand correctly:

    1. I saw no errors because sqlalchemy never even saw the code that I
thought I was passing it (which python fully evaluated before sqlalchemy
ever touched it).

    2. This is why functions like sqlalchemy.and_(), sqlalchemy.or_(), and
sqlalchemy.not_() are necessary... they are fundamentally for delaying
evaluation of statements until everything can be evaluated at the same time.

    3. This is also why there are no *python* methods or attributes
corresponding to columns when you try to inspect a table object with
dir(selectable_table_instance).  If there were python attributes for columns
that dir() could see, then references to those column/attributes would be
resolved by python before sqlalchemy could do anything clever.

In any case, I re-wrote my UNION logic with or_()'s connecting subclauses
made of and_()'s and everything worked properly as far as join logic goes.
I'll probably come back to this email in the future to make sure I remember
all the rest of your advice about str(), union(), and so on.

Also, for anyone dealing with a similar task in the future, I had a related
problem with SqlSoup table instances that had identical column names.  I was
seeing warnings like this:

/Library/Python/2.6/site-packages/SQLAlchemy-0.6.7-py2.6.egg/sqlalchemy/orm/mapper.py:585:
SAWarning: Implicitly combining column aliquot_table.id with column
specimen_table.id under attribute 'id'.  This usage will be prohibited in
0.7.  Please configure one or more attributes for these same-named columns
explicitly.

To make the warnings go away I ended up spelling out how to resolve column
ambiguities (possibly a violation of DRY?) by repeating the content of one
of the tables I was working with, with slight variations in the names.  The
final working code looked like this:

def getSpecimenAliquots(dbStr):
    from sqlalchemy.ext.sqlsoup import SqlSoup
    from sqlalchemy import select, and_, or_
    soupDB = SqlSoup(dbStr)
    aliq = soupDB.aliquot_table
    spec = soupDB.specimen_table
    spec = select([
        # Fix labels that clash with aliquot column names...
             spec.c.id.label("s_id")
            ,spec.c.random_id.label("s_random_id")
        # Things to pass through with their own names...
            ,spec.c.study_code
            ,spec.c.patient_id
            ,spec.c.s_type
            ]).correlate(None).alias()
    join1 = soupDB.join(aliq,spec,
            and_(aliq.random_id == spec.c.s_random_id,
            or_(and_(aliq.a_type ==   3, spec.c.s_type ==  12), #Serum/Serum
            or_(and_(aliq.a_type ==  15, spec.c.s_type ==   7), #Gut/Gut
            or_(and_(aliq.a_type ==   7, spec.c.s_type ==   5),
#Saliva/Saliva
               (and_(aliq.a_type ==  23, spec.c.s_type ==  13)
#NO_CODE/Swab
               ))))),
            isouter=False)
    join1 = join1.filter_by(test_id=12345)
    return join1

Hopefully that helps someone in the future who is dealing with SqlSoup for
the first time and finds this email.  Thank you again for the help, Michael
:-)

-Jennifer

On Thu, Apr 28, 2011 at 1:15 PM, Michael Bayer <mike...@zzzcomputing.com>wrote:

>
> On Apr 28, 2011, at 2:24 PM, Jennifer Rodriguez-Mueller wrote:
>
>
> Here is the core SQL (lots of distracting content trimmed):
>
> SELECT s.*, a.*
>   FROM aliquot_table a
>     JOIN ( -- Biologically (p)lausible aliquot/specimen type logic
>       SELECT  3 AS a_type, 12 AS s_type UNION  -- Serum/Serum
>       SELECT 15 AS a_type,  7 AS s_type UNION  -- Gut Biopsy/Gut Biopsy
>       SELECT  7 AS a_type,  5 AS s_type UNION  -- Saliva/Saliva
>       SELECT 23 AS a_type, 13 AS s_type        -- NO_CODE/Swab
>       ) p ON a.a_type = p.a_type
>     JOIN specimen_table s ON (s.random_id = a.random_id AND s.s_type =
> p.s_type)
>   WHERE a.test_id = 12345
>
> Here is the (assumed) equivalent python:
>
> def getSpecimenAliquots(fromDB):
>     from sqlalchemy.ext.sqlsoup import SqlSoup
>     plausibleTypes = set([( 3,12) #Serum/Serum
>                          ,(15, 7) #Gut Biopsy/Gut Biopsy
>                          ,( 7, 5) #Saliva/Saliva
>                          ,(23,13) #NO_CODE/Swab
>                          ])
>     soupDB = SqlSoup(fromDB)
>     aliq = soupDB.aliquot_table
>     spec = soupDB.specimen_table
>     join1 = soupDB.join(aliq, spec,
>         aliq.random_id == spec.random_id and (aliq.a_type, spec.s_type) in
> plausibleTypes,
>         isouter=False)
>     join1 = join1.filter_by(aliq.test_id=12345)
>     return join1.all()
>
> The code works in SQL, but when I try to translate it into sqlalchemy's
> SqlSoup the number of rows is different (specifically the plausibility logic
> is being silently ignored for some reason because there are 3 MappedJoin
> objects instead of 1 row from the SQL and the two of the MappedObjects have
> incompatible specimen/aliquot types).
>
> I'm wondering if I expected too much integration between python and SQL
> when I used a python set object to drive the join?  When I imagine what
> sqlalchemy "should ideally do" with something like this, it would be to
> understand how python types and comparisons over them correspond to database
> table/index idioms and (perhaps?) create a temporary table in the DB with
> optimized indexes based on the content of the set, write logically
> equivalent SQL to use that temp table, get the results, and then clean up
> after itself.   This would be much better than my SQL, which is logically
> sound but not efficiently implemented, because (after all) it is only
> exploratory SQL, not a real programming language.
>
>
> Well there's a leap here, which is that the original SQL uses UNION.   The
> "assumed" Python doesn't make any mention of unions.    There are also
> several elements here of Python that go above what the language can offer us
> in terms of operator overloading (that is, SQLAlchemy overloads things like
> ==, !=, but can only do this so much) - the Python "and" operator can't be
> intercepted (use and_(), or alternatively &&), the "(x, y) in set()" can't
> be intercepted ('tuple' and 'set' are both Python builtins that aren't
> modified).
>
> As a rule, when building things up you can always test individual
> expressions by calling str() on them.    Such as, if you were to say
> str(aliq.a_type == 5) you'd get a SQL expression.   If you call it on
> "aliq.a_type == 5 and spec.s_type==7" you'll get a string like "true" or
> "false" since the "and" operator isn't overridden, and the boundary of
> SQLAlchemy expression generation is crossed into plain Python.    Turning on
> SQL logging is also a good habit when building up queries to see what is
> being emitted as well as what's being returned (the docs describe how to do
> this).
>
> The "(x, y) in set()" is not something that has a universally supported SQL
> representation.   Postgresql and MySQL for example could state an expression
> like that as "(x, y, z) IN ((1, 2, 3), (4, 5, 6), ...)", but SQLite and
> others cannot.   The approach you have uses UNIONs which is more verbose but
> probably better supported.   It doesn't seem like it would present a major
> inefficiency, if you wanted to note the difference between using temp tables
> and the inline expression you'd work with the databases EXPLAIN services,
> i.e. features that illustrate the query plan.
>
> The best approach when translating from literal SQL is to build up as much
> of that exact statement as possible (which is typically the whole thing).
>  Your table of unions can be achieved as:
>
> u = union(*[
>     select([
>         literal_column(str(tup[0])).label("a_type"),
>         literal_column(str(tup[1])).label("s_type")
>     ])
>         for tup in plausibleTypes
> ]).alias()
>
> 'u' is then a selectable that represents the thing you have as the second
> element of your join - its columns are available as "u.c.<name>".      The
> join can then be constructed exactly as the SQL version is, substituting
> prefix conjunctions instead of infix (i.e. "and_(x, y)" instead of "x and
> y"):
>
> soupDB.join(aliq, u, u.c.a_type==aliq.a_type).join(spec,
> and_(spec.random_id==aliq.random_id, spec.s_type==u.c.s_type))
>
>
> How am I confused given that I'm surprised by the different number of
> rows/objects?
>
>
> well you should do the SQL logging to see the whole conversation as it
> happens
>
> Why was no error thrown?
>
>
> you didn't actually make an "error", you essentially used Python to
> comprise an expression "aliq.random_id == True" - no errors there other than
> possible type conflicts.
>
> Also, assuming I've done something amazingly wrong, what is the *right* way
> to specify join logic of this sort?  Perhaps I could translate the specimen
> type logic into a giant python expression with lots of ((foo=3 and bar=12)
> or (foo=15 and bar=7) or...)?  But that seems unlikely to be idiomatic with
> sqlalchemy...
>
>
> well there is not an idiomatic system for (x, y) in set() right now, so the
> union() approach above is one way, or if you wanted to use SQL tuples there
> is a tuple_() construct - you'd need to wrap two levels of tuple_() on the
> right hand side, the "IN" is accomplished by using the "in_()" operator:
>
> from sqlalchemy import tuple_, literal_column
> from sqlalchemy.sql import column
>
> u = tuple_(column('x'), column('y')).in_(
>     tuple_(*[tuple_(*[literal_column(str(v)) for v in tup])
>     for tup in plausibleTypes])
> )
> print str(u)
>
> returns:
>
> (x, y) IN ((7, 5), (3, 12), (15, 7), (23, 13))
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Can python objects be used in join logic?

Reply via email to