Here we are again, trying to create a better SQL support for IMDbPY. :-)

The situation:
- we're using SQLObject to abstract database accesses.
- the imdb.parser.sql package is more or less OK: I would like to split
  information retrieved by get_movie and get_person (these methods
  actually retrieve every kind of information) into separated methods,
  but this is quite easy.
- imdbpy2sql.py works, but it's really slow.

The problems:
#1 at insert time we use SQLObject only to abstract tables/cols names,
   while real insertions are done calling the 'executemany' method of
   the 'cursor' object.
#2 not every DBAPI2-compliant modules use the same style to handle
   parameters for the 'executemany' method, forcing us to convert 
   the list of parameters (if needed) before the call to executemany.
#3 SQLObject automatically add an "id" column to every table in the
   database, even if we don't need it for some tables (like MovieInfo).
#4 every time a Primary Key or a Foreign Key is used, SQLObject (correctly)
   adds a index in the given table, slowing down the insertion of huge
   amounts of rows.

Possible solutions/questions/thoughts about the above problems:
#1 and #2 SQLObject doesn't use or support executemany at all.
   I've done some tests even with SQLAlchemy: it was faster, but not
   as fast as it can be.  The explanation is simple: it creates a
   lot of intermediate/temporary objects to abstract the executemany
   method; they are certainly useful in a complex environment like
   the SQLAlchemy package, but totally oversized for our purposes.
   Possible solution: whatever the tool we'll use to abstract connection,
   we'll be probably forced to do parameters conversions by ourselves.
#3 it's clearly stated in the documentation that SQLObject will always
   insert an "id" primary key column.
   I can't see any solution, outside switching to another ORM.
   The fact is that PyDO seems not much developed, and SQLAlchemy is...
   well, somewhat _too_ developed. :-)
   The 0.2 (still beta) looks promising, but I can't tell how much it
   will be stable at the API level.
   Anyway: at a very first look, SQLAlchemy appears to be more "layered"
   than SQLObject: the user is not forced to use it like a ORM, and
   it can be used just to abstract a db connection.
   Obviously I can be wrong, and it can be the wrong tool to use.
#4 It seems that you can't tell SQLObject to _not_ create such indexes,
   and - anyway - there is for sure no way to create the indexes later,
   after every row was inserted (this would help al lot, improving
   performances).
   I don't know if SQLAlchemy is any better, on this aspect.

Right now I'm reading SQLAlchemy 0.2 documentation [1], trying to
understand if it can be a useful tool.
I'm also wondering if a threaded approach can be convenient, but
I fear it will add too much complexity.


[1] http://www.sqlalchemy.org/docs_02/
-- 
Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47]
http://erlug.linux.it/~da/


-------------------------------------------------------
All the advantages of Linux Managed Hosting--Without the Cost and Risk!
Fully trained technicians. The highest number of Red Hat certifications in
the hosting industry. Fanatical Support. Click to learn more
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel

Reply via email to