Here we are again, trying to create a better SQL support for IMDbPY. :-) The situation: - we're using SQLObject to abstract database accesses. - the imdb.parser.sql package is more or less OK: I would like to split information retrieved by get_movie and get_person (these methods actually retrieve every kind of information) into separated methods, but this is quite easy. - imdbpy2sql.py works, but it's really slow.
The problems: #1 at insert time we use SQLObject only to abstract tables/cols names, while real insertions are done calling the 'executemany' method of the 'cursor' object. #2 not every DBAPI2-compliant modules use the same style to handle parameters for the 'executemany' method, forcing us to convert the list of parameters (if needed) before the call to executemany. #3 SQLObject automatically add an "id" column to every table in the database, even if we don't need it for some tables (like MovieInfo). #4 every time a Primary Key or a Foreign Key is used, SQLObject (correctly) adds a index in the given table, slowing down the insertion of huge amounts of rows. Possible solutions/questions/thoughts about the above problems: #1 and #2 SQLObject doesn't use or support executemany at all. I've done some tests even with SQLAlchemy: it was faster, but not as fast as it can be. The explanation is simple: it creates a lot of intermediate/temporary objects to abstract the executemany method; they are certainly useful in a complex environment like the SQLAlchemy package, but totally oversized for our purposes. Possible solution: whatever the tool we'll use to abstract connection, we'll be probably forced to do parameters conversions by ourselves. #3 it's clearly stated in the documentation that SQLObject will always insert an "id" primary key column. I can't see any solution, outside switching to another ORM. The fact is that PyDO seems not much developed, and SQLAlchemy is... well, somewhat _too_ developed. :-) The 0.2 (still beta) looks promising, but I can't tell how much it will be stable at the API level. Anyway: at a very first look, SQLAlchemy appears to be more "layered" than SQLObject: the user is not forced to use it like a ORM, and it can be used just to abstract a db connection. Obviously I can be wrong, and it can be the wrong tool to use. #4 It seems that you can't tell SQLObject to _not_ create such indexes, and - anyway - there is for sure no way to create the indexes later, after every row was inserted (this would help al lot, improving performances). I don't know if SQLAlchemy is any better, on this aspect. Right now I'm reading SQLAlchemy 0.2 documentation [1], trying to understand if it can be a useful tool. I'm also wondering if a threaded approach can be convenient, but I fear it will add too much complexity. [1] http://www.sqlalchemy.org/docs_02/ -- Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47] http://erlug.linux.it/~da/ ------------------------------------------------------- All the advantages of Linux Managed Hosting--Without the Cost and Risk! Fully trained technicians. The highest number of Red Hat certifications in the hosting industry. Fanatical Support. Click to learn more http://sel.as-us.falkag.net/sel?cmd=lnk&kid=107521&bid=248729&dat=121642 _______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel