Re: [SQLObject] SQLObject mass insertion (was: Speed comparison)

2010-04-01 Thread Juan Manuel Santos
(Sorry, little mistake on the To: field and this didn't go to the list. My 
apologies for the one who received this :P)

Hi everybody again, sorry for the delay in answering.

I've been doing some more testing, with both ORMs (SQLO and SQLA) in debug 
mode, and seeing the SQL that gets sent to the db. In the meantime I also 
added somewhat of a counter for my app, which counts in 10ms multiples, the 
time that it takes to scan the directory. Pretty rough but it serves its 
purpose.

So I started comparing both ORMs on different directories. An since SQLA 
almost halved the time it took SQLO to scan a directory full of big 
(1920x1200) wallpapers, I decided to debug it by scanning a directory with 
only one image. First of all, the structure of the class that maps the 
filesystem images onto my app, looks something like this:

class Photo(File):

hasthumb = BoolCol(default = None)
author = StringCol(default = None)
res = StringCol(default = None)
date_taken = StringCol(default = None)
soft = StringCol(default = None)
_thumb = PickleCol(default = None)
_icon = PickleCol(default = None)

def get_thumb(self):
return gtk.gdk.pixbuf_new_from_array(self._thumb,
   gtk.gdk.COLORSPACE_RGB, 8)

def set_thumb(self, value):
if isinstance(value, gtk.gdk.Pixbuf):
self._thumb = value.get_pixels_array()

def get_icon(self):
return gtk.gdk.pixbuf_new_from_array(self._icon,
  gtk.gdk.COLORSPACE_RGB, 8)

def set_icon(self, value):
if isinstance(value, gtk.gdk.Pixbuf):
self._icon = value.get_pixels_array()

thumb = property(get_thumb, set_thumb)
icon = property(get_icon, set_icon)

Those properties are there because in order to pickle an image and be able to 
successfully unpickle it, you have to convert it to its pixels' array. 
Otherwise, you can't unpickle it. Here is the code that loads both thumb and 
icon into the object:

photo.icon = \
gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs,
 SETTINGS.thumblistsize,
 SETTINGS.thumblistsize)
photo.thumb = \
gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs,
 SETTINGS.thumbpanesize,
 SETTINGS.thumbpanesize)
photo.hasthumb = True

So, in my app, the thumb and icon for the Photo class are only loaded once, on 
the original scan. Then, by looking at SQLO's output I found out this: 
http://pastebin.com/raw.php?i=Xp1TjmZ0

Basically, if you look in the previous link for Setting icon or Setting 
thumb you will see that both the icon and the thumb are being set TWICE! with 
the same data.

Now I honestly don't know where this comes from. SQLA apparently doesn't do 
this (hence, its better scanning time), and indeed it is an odd thing to do. I 
debugged a little more and found that indeed the properties for both icon and 
thumb are being called twice, though I don't know why.

Is there maybe something in the code of the PickleCol that triggers this 
behaviour?

Thanks for your help and time :)

Cheers
Juan Manuel Santos

From: Oleg Broytman p...@phd.pp.ru
To: sqlobject-discuss@lists.sourceforge.net
Date: Tuesday 23 March 2010
 On Tue, Mar 23, 2010 at 12:57:19AM -0300, Juan Manuel Santos wrote:
  Are there any ideas on why would SQLO
  be a bit slower
 
When you declare a table
 
 class MyTable(SQLObject):
...columns...
 
 and do an INSERT by calling MyTable(columns) SQLObject immediately does a
 SELECT to get back autogenerated fields (timestamps and the like) This is
 slow. It's ok to create rows like this occasionally but it is certainly bad
 for mass-insertion.
For mass-insertion use SQLBuilder. Alas, it's underdocumented. Go by
 example:
 
 record = {'column1': value1, 'column2': value2, ...}
 
 connection.query(connection.sqlrepr(
sqlbuilder.Insert(MyTable.sqlmeta.table, values=record)))
 
These are simple straightforward INSERTs without any additional
 high-level burden - no SELECT, no caching, nothing. Fire and forget.
It is not as high-level as calling MyTable() but it is still high
 enough - sqlrepr() does value conversion and quoting, e.g.
 
 Oleg.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] SQLObject mass insertion

2010-04-01 Thread Oleg Broytman
On Thu, Apr 01, 2010 at 05:47:06AM -0300, Juan Manuel Santos wrote:
 http://pastebin.com/raw.php?i=Xp1TjmZ0
 
 Basically, if you look in the previous link for Setting icon or Setting 
 thumb you will see that both the icon and the thumb are being set TWICE! 
 with 
 the same data.

   There are two debugging entries for every query (see CREATE TABLE for
examples) but every query runs only once. Two debugging output are from
different methods.

Oleg.
-- 
 Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] SQLObject mass insertion

2010-04-01 Thread Juan Manuel Santos
Yes, you're right and that's a bit odd. I've set the debug variable in the 
usual way:

self._conn = connectionForURI(con_str)
self._conn.debug = True

And I noticed this also:

 1/Query   :  CREATE TABLE meta_dir (
 [...]
 1/QueryR  :  CREATE TABLE meta_dir (
 [...]

Any idea why debug gets printed twice, but the second time with a capital 'r' 
after Query? :P

From: Oleg Broytman p...@phd.pp.ru
To: sqlobject-discuss@lists.sourceforge.net
Date: Thursday 01 April 2010
 On Thu, Apr 01, 2010 at 05:47:06AM -0300, Juan Manuel Santos wrote:
  http://pastebin.com/raw.php?i=Xp1TjmZ0
  
  Basically, if you look in the previous link for Setting icon or
  Setting thumb you will see that both the icon and the thumb are being
  set TWICE! with the same data.
 
There are two debugging entries for every query (see CREATE TABLE for
 examples) but every query runs only once. Two debugging output are from
 different methods.
 
 Oleg.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] SQLObject mass insertion

2010-04-01 Thread Oleg Broytman
On Thu, Apr 01, 2010 at 06:18:35AM -0300, Juan Manuel Santos wrote:
  1/Query   :  CREATE TABLE meta_dir (
  [...]
  1/QueryR  :  CREATE TABLE meta_dir (
  [...]
 
 Any idea why debug gets printed twice, but the second time with a capital 'r' 
 after Query? :P
  
 Two debugging output are from
  different methods.

$ grep -F printDebug sqlobject/dbconnection.py

def _executeRetry(self, conn, cursor, query):
if self.debug:
self.printDebug(conn, query, 'QueryR')
return cursor.execute(query)

def _query(self, conn, s):
if self.debug:
self.printDebug(conn, s, 'Query')
self._executeRetry(conn, conn.cursor(), s)

def _queryAll(self, conn, s):
if self.debug:
self.printDebug(conn, s, 'QueryAll')
[skip]
if self.debugOutput:
self.printDebug(conn, value, 'QueryAll', 'result')

   ._query(), ._executeRetry(), etc can be called from different methods in
a different order so these extra debugging allows to trace the call stack.

Oleg.
-- 
 Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] SQLObject mass insertion

2010-04-01 Thread Juan Manuel Santos
Got it, thanks

One last thing and maybe a bit offtopic: how do I set PRAGMA asyncrhonous = 
OFF for the SQLite connection? I'm trying to see if the speed difference is 
due to this (as I suspect that SQLA uses PRAGMA), but this:

self._conn.queryAll(PRAGMA synchronous=OFF;)

or this:

self._conn.query(PRAGMA synchronous=OFF;)

Doesn't change anything. Any ideas?

Thanks again,

Juan Manuel

From: Oleg Broytman p...@phd.pp.ru
To: sqlobject-discuss@lists.sourceforge.net
Date: Thursday 01 April 2010
 On Thu, Apr 01, 2010 at 06:18:35AM -0300, Juan Manuel Santos wrote:
   1/Query   :  CREATE TABLE meta_dir (
   [...]
   1/QueryR  :  CREATE TABLE meta_dir (
   [...]
  
  Any idea why debug gets printed twice, but the second time with a capital
  'r' after Query? :P
  
  Two debugging output are from
  
   different methods.
 
 $ grep -F printDebug sqlobject/dbconnection.py
 
 def _executeRetry(self, conn, cursor, query):
 if self.debug:
 self.printDebug(conn, query, 'QueryR')
 return cursor.execute(query)
 
 def _query(self, conn, s):
 if self.debug:
 self.printDebug(conn, s, 'Query')
 self._executeRetry(conn, conn.cursor(), s)
 
 def _queryAll(self, conn, s):
 if self.debug:
 self.printDebug(conn, s, 'QueryAll')
 [skip]
 if self.debugOutput:
 self.printDebug(conn, value, 'QueryAll', 'result')
 
._query(), ._executeRetry(), etc can be called from different methods in
 a different order so these extra debugging allows to trace the call stack.
 
 Oleg.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


[SQLObject] PRAGMA (SQLite)

2010-04-01 Thread Oleg Broytman
 One last thing and maybe a bit offtopic:

   Offtopic for that thread, so I started a new thread on this new topic.

 how do I set PRAGMA asyncrhonous = 
 OFF for the SQLite connection? I'm trying to see if the speed difference is 
 due to this (as I suspect that SQLA uses PRAGMA), but this:
 
 self._conn.queryAll(PRAGMA synchronous=OFF;)
 
 or this:
 
 self._conn.query(PRAGMA synchronous=OFF;)
 
 Doesn't change anything. Any ideas?

   I'd recommend you first to try this from the command line. Does it
really matter for a program? If there is a difference between these modes
then I can guess the problem with SQLObject not setting the asynchronous
mode is that it sets it on a different connection. _connection is not a DB
API connection but an instance of SQLObject's DBConnection that handles a
lot of low-level connections itself. Usually .query() opens a connection,
executes the query and closes the connection; or it maintains a pool of
open connections and uses a random one from the pool. So two different
queries (PRAGMA and INSERT) could be run via two different low-level
connections.
   The simplest way to narrow the range of low-level connections is to open
a transaction and run all queries through the transaction. A transaction,
obviously, uses one DB API connection.
   The more complex way would be to override SQLite's connection factory,
but that's a long and ugly story, though it is possible and sometimes
required in SQLObject.

Oleg.
-- 
 Oleg Broytmanhttp://phd.pp.ru/p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] Speed comparison

2010-04-01 Thread Juan Manuel Santos
I believe I fixed the speed issue. I wrapped the most expensive 
functions/calls to SQLObject (expensive in terms of I/O) in a transaction. I 
didn't realize that that's the way SQLAlchemy works, and maybe that's why it 
was being faster (I had some functions which modified several attributes one 
at a time, which resulted in several UPDATE statements).

Anyway in the end, and for the record, when wrapping the most expensive calls 
in a transaction (so they get executed all at once), there is little to no 
speed difference between SQLO and SQLA (and even SQLO turns out to be the 
faster when there's a difference). Nice tip to keep in mind :)

Thanks everybody for their help, it was truly priceless!

Cheers
Juan Manuel Santos

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] Speed comparison

2010-04-01 Thread Petr Jakeš
Some examples of the code, so we all can learn from your experiences, will
be nice.

Regards

Petr

On 1 April 2010 21:30, Juan Manuel Santos vicariou...@gmail.com wrote:

 I believe I fixed the speed issue. I wrapped the most expensive
 functions/calls to SQLObject (expensive in terms of I/O) in a transaction.
 I
 didn't realize that that's the way SQLAlchemy works, and maybe that's why
 it
 was being faster (I had some functions which modified several attributes
 one
 at a time, which resulted in several UPDATE statements).

 Anyway in the end, and for the record, when wrapping the most expensive
 calls
 in a transaction (so they get executed all at once), there is little to no
 speed difference between SQLO and SQLA (and even SQLO turns out to be the
 faster when there's a difference). Nice tip to keep in mind :)

 Thanks everybody for their help, it was truly priceless!

 Cheers
 Juan Manuel Santos


 --
 Download Intel#174; Parallel Studio Eval
 Try the new software tools for yourself. Speed compiling, find bugs
 proactively, and fine-tune applications for parallel performance.
 See why Intel Parallel Studio got high marks during beta.
 http://p.sf.net/sfu/intel-sw-dev
 ___
 sqlobject-discuss mailing list
 sqlobject-discuss@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss


Re: [SQLObject] Speed comparison

2010-04-01 Thread Juan Manuel Santos
(grr it happened again ., my apologies to the guy who received this mail, it 
was intended for the list)

From: Petr Jakeš petr.ja...@tpc.cz
To: vicariou...@gmail.com
Date: Thursday 01 April 2010
 Some examples of the code, so we all can learn from your experiences, will
 be nice.
 
 Regards
 
 Petr
 

Well, for this particular case, I was using a factory class that was in charge 
of creating the several objects that were required. It worked more or less 
like so: there is an indexer class, which takes care of crawling the 
filesystem. For every file or directory it encounters, it passes a series of 
parameters to the factory class (or object ;)), which the factory uses to 
instantiate the object that represents said file/dir. Yes, you may say that 
has too much coupling, but given the handling that one must do regarding 
connections and transactions (and taking care not to use them in a different 
thread than the one they were created on), it seemed the easiest and more or 
less right way to do it.

So, let's say that there's one method in this Factory class/object to 
instantiate each of the objects that can be scanned from the filesystem 
(regular file, directory, image/video/audio file). Instantiation in itself 
isn't too expensive (or at least there isn't much that you can do about it). 
But when it comes to extracting the metadata for some particular files 
(namely, audio, video and image files) I found that I was setting fields on 
the object one at a time. This caused several UPDATE statements, which slowed 
everything down.

So, in order to alleviate this, I did the following:

def new_photo(self, parent, name, relpath, mimetype, atime, mtime,
  root, size, strsize, strabs):
trans = self._conn.transaction()
atime, mtime = self._check_stat(atime, mtime)
mimetype = mimetype[0]
photo = Photo(parent = parent, name = name, relpath = relpath,
  mimetype = mimetype, atime = atime, mtime = mtime,
  size = size, strsize = strsize, root = root,
  isdir = False, strabs = strabs,
  connection = trans)
self._get_photo_metadata(photo)
trans.commit(close = True)
return photo

The photo (I didn't use Image as a name so I wouldn't overwrite some other 
class already defined within Python) was the heaviest class in this regard, 
since it extracted two scaled images from the original (that's just something 
the pogram does, you can try it out if you want to). This extraction is done 
in the _get_photo_metadata method. So I got a transaction from the regular 
SQLObject connection, and used it as a parameter at object creation time, 
instead of the connection. After that I commit it with the close parameter on 
True, since then the method ends and I don't want that transaction lying 
around so that after working I still have a journal file for the SQLite db :P.

Hope it is useful for somebody in the future. Don't forget that the 
transaction object you get from the connection.transaction() method, is 
already begun. You just have to work directly with it and DON'T FORGET to 
close it :) (via commit or rollback).

Thanks for the help people.

Cheers
Juan Manuel Santos

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss