Re: [SQLObject] SQLObject mass insertion (was: Speed comparison)
(Sorry, little mistake on the To: field and this didn't go to the list. My apologies for the one who received this :P) Hi everybody again, sorry for the delay in answering. I've been doing some more testing, with both ORMs (SQLO and SQLA) in debug mode, and seeing the SQL that gets sent to the db. In the meantime I also added somewhat of a counter for my app, which counts in 10ms multiples, the time that it takes to scan the directory. Pretty rough but it serves its purpose. So I started comparing both ORMs on different directories. An since SQLA almost halved the time it took SQLO to scan a directory full of big (1920x1200) wallpapers, I decided to debug it by scanning a directory with only one image. First of all, the structure of the class that maps the filesystem images onto my app, looks something like this: class Photo(File): hasthumb = BoolCol(default = None) author = StringCol(default = None) res = StringCol(default = None) date_taken = StringCol(default = None) soft = StringCol(default = None) _thumb = PickleCol(default = None) _icon = PickleCol(default = None) def get_thumb(self): return gtk.gdk.pixbuf_new_from_array(self._thumb, gtk.gdk.COLORSPACE_RGB, 8) def set_thumb(self, value): if isinstance(value, gtk.gdk.Pixbuf): self._thumb = value.get_pixels_array() def get_icon(self): return gtk.gdk.pixbuf_new_from_array(self._icon, gtk.gdk.COLORSPACE_RGB, 8) def set_icon(self, value): if isinstance(value, gtk.gdk.Pixbuf): self._icon = value.get_pixels_array() thumb = property(get_thumb, set_thumb) icon = property(get_icon, set_icon) Those properties are there because in order to pickle an image and be able to successfully unpickle it, you have to convert it to its pixels' array. Otherwise, you can't unpickle it. Here is the code that loads both thumb and icon into the object: photo.icon = \ gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs, SETTINGS.thumblistsize, SETTINGS.thumblistsize) photo.thumb = \ gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs, SETTINGS.thumbpanesize, SETTINGS.thumbpanesize) photo.hasthumb = True So, in my app, the thumb and icon for the Photo class are only loaded once, on the original scan. Then, by looking at SQLO's output I found out this: http://pastebin.com/raw.php?i=Xp1TjmZ0 Basically, if you look in the previous link for Setting icon or Setting thumb you will see that both the icon and the thumb are being set TWICE! with the same data. Now I honestly don't know where this comes from. SQLA apparently doesn't do this (hence, its better scanning time), and indeed it is an odd thing to do. I debugged a little more and found that indeed the properties for both icon and thumb are being called twice, though I don't know why. Is there maybe something in the code of the PickleCol that triggers this behaviour? Thanks for your help and time :) Cheers Juan Manuel Santos From: Oleg Broytman p...@phd.pp.ru To: sqlobject-discuss@lists.sourceforge.net Date: Tuesday 23 March 2010 On Tue, Mar 23, 2010 at 12:57:19AM -0300, Juan Manuel Santos wrote: Are there any ideas on why would SQLO be a bit slower When you declare a table class MyTable(SQLObject): ...columns... and do an INSERT by calling MyTable(columns) SQLObject immediately does a SELECT to get back autogenerated fields (timestamps and the like) This is slow. It's ok to create rows like this occasionally but it is certainly bad for mass-insertion. For mass-insertion use SQLBuilder. Alas, it's underdocumented. Go by example: record = {'column1': value1, 'column2': value2, ...} connection.query(connection.sqlrepr( sqlbuilder.Insert(MyTable.sqlmeta.table, values=record))) These are simple straightforward INSERTs without any additional high-level burden - no SELECT, no caching, nothing. Fire and forget. It is not as high-level as calling MyTable() but it is still high enough - sqlrepr() does value conversion and quoting, e.g. Oleg. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] SQLObject mass insertion
On Thu, Apr 01, 2010 at 05:47:06AM -0300, Juan Manuel Santos wrote: http://pastebin.com/raw.php?i=Xp1TjmZ0 Basically, if you look in the previous link for Setting icon or Setting thumb you will see that both the icon and the thumb are being set TWICE! with the same data. There are two debugging entries for every query (see CREATE TABLE for examples) but every query runs only once. Two debugging output are from different methods. Oleg. -- Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru Programmers don't die, they just GOSUB without RETURN. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] SQLObject mass insertion
Yes, you're right and that's a bit odd. I've set the debug variable in the usual way: self._conn = connectionForURI(con_str) self._conn.debug = True And I noticed this also: 1/Query : CREATE TABLE meta_dir ( [...] 1/QueryR : CREATE TABLE meta_dir ( [...] Any idea why debug gets printed twice, but the second time with a capital 'r' after Query? :P From: Oleg Broytman p...@phd.pp.ru To: sqlobject-discuss@lists.sourceforge.net Date: Thursday 01 April 2010 On Thu, Apr 01, 2010 at 05:47:06AM -0300, Juan Manuel Santos wrote: http://pastebin.com/raw.php?i=Xp1TjmZ0 Basically, if you look in the previous link for Setting icon or Setting thumb you will see that both the icon and the thumb are being set TWICE! with the same data. There are two debugging entries for every query (see CREATE TABLE for examples) but every query runs only once. Two debugging output are from different methods. Oleg. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] SQLObject mass insertion
On Thu, Apr 01, 2010 at 06:18:35AM -0300, Juan Manuel Santos wrote: 1/Query : CREATE TABLE meta_dir ( [...] 1/QueryR : CREATE TABLE meta_dir ( [...] Any idea why debug gets printed twice, but the second time with a capital 'r' after Query? :P Two debugging output are from different methods. $ grep -F printDebug sqlobject/dbconnection.py def _executeRetry(self, conn, cursor, query): if self.debug: self.printDebug(conn, query, 'QueryR') return cursor.execute(query) def _query(self, conn, s): if self.debug: self.printDebug(conn, s, 'Query') self._executeRetry(conn, conn.cursor(), s) def _queryAll(self, conn, s): if self.debug: self.printDebug(conn, s, 'QueryAll') [skip] if self.debugOutput: self.printDebug(conn, value, 'QueryAll', 'result') ._query(), ._executeRetry(), etc can be called from different methods in a different order so these extra debugging allows to trace the call stack. Oleg. -- Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru Programmers don't die, they just GOSUB without RETURN. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] SQLObject mass insertion
Got it, thanks One last thing and maybe a bit offtopic: how do I set PRAGMA asyncrhonous = OFF for the SQLite connection? I'm trying to see if the speed difference is due to this (as I suspect that SQLA uses PRAGMA), but this: self._conn.queryAll(PRAGMA synchronous=OFF;) or this: self._conn.query(PRAGMA synchronous=OFF;) Doesn't change anything. Any ideas? Thanks again, Juan Manuel From: Oleg Broytman p...@phd.pp.ru To: sqlobject-discuss@lists.sourceforge.net Date: Thursday 01 April 2010 On Thu, Apr 01, 2010 at 06:18:35AM -0300, Juan Manuel Santos wrote: 1/Query : CREATE TABLE meta_dir ( [...] 1/QueryR : CREATE TABLE meta_dir ( [...] Any idea why debug gets printed twice, but the second time with a capital 'r' after Query? :P Two debugging output are from different methods. $ grep -F printDebug sqlobject/dbconnection.py def _executeRetry(self, conn, cursor, query): if self.debug: self.printDebug(conn, query, 'QueryR') return cursor.execute(query) def _query(self, conn, s): if self.debug: self.printDebug(conn, s, 'Query') self._executeRetry(conn, conn.cursor(), s) def _queryAll(self, conn, s): if self.debug: self.printDebug(conn, s, 'QueryAll') [skip] if self.debugOutput: self.printDebug(conn, value, 'QueryAll', 'result') ._query(), ._executeRetry(), etc can be called from different methods in a different order so these extra debugging allows to trace the call stack. Oleg. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
[SQLObject] PRAGMA (SQLite)
One last thing and maybe a bit offtopic: Offtopic for that thread, so I started a new thread on this new topic. how do I set PRAGMA asyncrhonous = OFF for the SQLite connection? I'm trying to see if the speed difference is due to this (as I suspect that SQLA uses PRAGMA), but this: self._conn.queryAll(PRAGMA synchronous=OFF;) or this: self._conn.query(PRAGMA synchronous=OFF;) Doesn't change anything. Any ideas? I'd recommend you first to try this from the command line. Does it really matter for a program? If there is a difference between these modes then I can guess the problem with SQLObject not setting the asynchronous mode is that it sets it on a different connection. _connection is not a DB API connection but an instance of SQLObject's DBConnection that handles a lot of low-level connections itself. Usually .query() opens a connection, executes the query and closes the connection; or it maintains a pool of open connections and uses a random one from the pool. So two different queries (PRAGMA and INSERT) could be run via two different low-level connections. The simplest way to narrow the range of low-level connections is to open a transaction and run all queries through the transaction. A transaction, obviously, uses one DB API connection. The more complex way would be to override SQLite's connection factory, but that's a long and ugly story, though it is possible and sometimes required in SQLObject. Oleg. -- Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru Programmers don't die, they just GOSUB without RETURN. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] Speed comparison
I believe I fixed the speed issue. I wrapped the most expensive functions/calls to SQLObject (expensive in terms of I/O) in a transaction. I didn't realize that that's the way SQLAlchemy works, and maybe that's why it was being faster (I had some functions which modified several attributes one at a time, which resulted in several UPDATE statements). Anyway in the end, and for the record, when wrapping the most expensive calls in a transaction (so they get executed all at once), there is little to no speed difference between SQLO and SQLA (and even SQLO turns out to be the faster when there's a difference). Nice tip to keep in mind :) Thanks everybody for their help, it was truly priceless! Cheers Juan Manuel Santos -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] Speed comparison
Some examples of the code, so we all can learn from your experiences, will be nice. Regards Petr On 1 April 2010 21:30, Juan Manuel Santos vicariou...@gmail.com wrote: I believe I fixed the speed issue. I wrapped the most expensive functions/calls to SQLObject (expensive in terms of I/O) in a transaction. I didn't realize that that's the way SQLAlchemy works, and maybe that's why it was being faster (I had some functions which modified several attributes one at a time, which resulted in several UPDATE statements). Anyway in the end, and for the record, when wrapping the most expensive calls in a transaction (so they get executed all at once), there is little to no speed difference between SQLO and SQLA (and even SQLO turns out to be the faster when there's a difference). Nice tip to keep in mind :) Thanks everybody for their help, it was truly priceless! Cheers Juan Manuel Santos -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
Re: [SQLObject] Speed comparison
(grr it happened again ., my apologies to the guy who received this mail, it was intended for the list) From: Petr Jakeš petr.ja...@tpc.cz To: vicariou...@gmail.com Date: Thursday 01 April 2010 Some examples of the code, so we all can learn from your experiences, will be nice. Regards Petr Well, for this particular case, I was using a factory class that was in charge of creating the several objects that were required. It worked more or less like so: there is an indexer class, which takes care of crawling the filesystem. For every file or directory it encounters, it passes a series of parameters to the factory class (or object ;)), which the factory uses to instantiate the object that represents said file/dir. Yes, you may say that has too much coupling, but given the handling that one must do regarding connections and transactions (and taking care not to use them in a different thread than the one they were created on), it seemed the easiest and more or less right way to do it. So, let's say that there's one method in this Factory class/object to instantiate each of the objects that can be scanned from the filesystem (regular file, directory, image/video/audio file). Instantiation in itself isn't too expensive (or at least there isn't much that you can do about it). But when it comes to extracting the metadata for some particular files (namely, audio, video and image files) I found that I was setting fields on the object one at a time. This caused several UPDATE statements, which slowed everything down. So, in order to alleviate this, I did the following: def new_photo(self, parent, name, relpath, mimetype, atime, mtime, root, size, strsize, strabs): trans = self._conn.transaction() atime, mtime = self._check_stat(atime, mtime) mimetype = mimetype[0] photo = Photo(parent = parent, name = name, relpath = relpath, mimetype = mimetype, atime = atime, mtime = mtime, size = size, strsize = strsize, root = root, isdir = False, strabs = strabs, connection = trans) self._get_photo_metadata(photo) trans.commit(close = True) return photo The photo (I didn't use Image as a name so I wouldn't overwrite some other class already defined within Python) was the heaviest class in this regard, since it extracted two scaled images from the original (that's just something the pogram does, you can try it out if you want to). This extraction is done in the _get_photo_metadata method. So I got a transaction from the regular SQLObject connection, and used it as a parameter at object creation time, instead of the connection. After that I commit it with the close parameter on True, since then the method ends and I don't want that transaction lying around so that after working I still have a journal file for the SQLite db :P. Hope it is useful for somebody in the future. Don't forget that the transaction object you get from the connection.transaction() method, is already begun. You just have to work directly with it and DON'T FORGET to close it :) (via commit or rollback). Thanks for the help people. Cheers Juan Manuel Santos -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ sqlobject-discuss mailing list sqlobject-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss