Hi all, in the previous days, Giuseppe and me have worked on the sql data access system, to make it db-independent, using SQLObject.
Unfortunately (after a lot of work was already done, sigh!) we are now facing a problem: performances (at insert-time only [1]). The fact is: we can completely rely on SQLObject to manage data insertion, but it doesn't use the executemany() method of the "cursor" object of the underling database's back end, making it _really slow_ for every database server (mysql, postgresql, sqlite, ...) This is a pity because, at least for some database engines (MySQL, for instance), the use of the executemany() method can speed up the imdbpy2sql script by a factor of 4 or 5. Another option is to extract the "cursor" object by the SQLObject's connection, and directly use its executemany() method. This would take advantage of db back ends with a fast executemany method, while other database will still be slow (this is inevitable, so we don't care). At this point another problem arises: the style of the parameters passed to the executemany() is somewhat free, and different modules use different conventions. :-/ I fear that now we have only three options: 1. ignore performances, use only SQLObject and ignore the fact that MySQL (and maybe others DBs) can run in just 2/3 hours [2] instead of 8/12 hours. 2. go on using the cursor.executemany() method as described above, trying to (more or less manually) support every different parameters style. 3. drop SQLObject, and find a different tool. I've looked at SQLAlchemy, and it has a different, more layered, architecture: it can be used to abstract just the connection and the cursor, guaranteeing that the underling executemany() method is called. At an higher level it can mimic almost everything SQLObject do (I think), but _if_ we choose to switch to SQLAlchemy, I'll probably prefer to stay at the lower possible level. Except the facts that we're using SQLAlchemy and that we don't need to handle executemany's parameters by ourself, options 2 and 3 are architecturally very similar (both are low-level, using cursor.executemany) To summarize, I'm asking your opinion (especially to Giuseppe, having he already written a lot of code). To _me_ the first one sounds really bad; if you have had asked me yesterday, I would have chosen the 2nd, but it seems prone to errors; today I've given a first glance to SQLAlchemy at it _seems_ suitable for the work [3], but I can be wrong: I never used it. Either case, some (or a lot of) work will be lost and the release of 2.5 will be delayed. That was a fault of me and I'm really sorry for this: either I've chosen the wrong too or I've not fully understood it before starting the new implementation. Given my knowledge of SQLAlchemy - and databases and ORMs in general - this situation can repeat itself, so I'm asking your opinions [4]. +++ [1] we have to insert ~30mln of row. And the db continuously grows... [2] just an estimate, it can be longer. [3] it uses a single parameter style, hiding the underling implementation. I don't know a thing about performances. [4] speak freely: I'll reserve to myself the right to kindly ignore your ideas. ;-) -- Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47] http://erlug.linux.it/~da/ ------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel