Tim Hochberg wrote: > John Hunter wrote: > >>>>>>> "Erin" == Erin Sheldon <[EMAIL PROTECTED]> writes: >>>>>>> >>>>>>> >> Erin> The question I have been asking myself is "what is the >> Erin> advantage of such an approach?". It would be faster, but by >> >> In the use case that prompted this message, the pull from mysql took >> almost 3 seconds, and the conversion from lists to numpy arrays took >> more that 4 seconds. We have a list of about 500000 2 tuples of >> floats. >> >> > I'm no database user, but a glance at the at the docs seems to indicate > that you can get your data via an iterator (by iterating over the cursor > or some such db mumbo jumbo) rather than slurping up the whole list up > at once. If so, then you'll save a lot of memory by passing the iterator > straight to fromiter. It may even be faster, who knows. > > Accessing the db via the iterator could be a performance killer, but > it's almost certainly worth trying as it could a few megabytes of > storage and that in turn might speed things up. > > Assuming that I didn't mess this up too badly, it appears that using the iterator directly with fromiter is significantly faster than the next best solution (about 45%). The fromiter wrapping a list solution come in second, followed by numarray.array and finally way in the back, numpy.array. Here's the numbers:
retrieve1 took 0.902922857514 seconds retrieve2 took 1.31245870634 seconds retrieve3 took 1.51207569677 seconds retrieve4 took 8.71539930354 seconds And here is the code: import sqlite3, numpy as np, numarray as na, time N = 500000 def setup(conn): c = conn.cursor() c.execute('''create table demo (x real, y real)''') data = np.random.rand(N, 2) c.executemany("""insert into demo values (?, ?)""", data) def retrieve1(conn): c = conn.cursor() c.execute('select * from demo') y = np.fromiter(c, dtype=[('a',float), ('b', float)]) return y def retrieve2(conn): c = conn.cursor() c.execute('select * from demo') y = np.fromiter(c.fetchall(), dtype=[('a',float), ('b', float)]) return y def retrieve3(conn): c = conn.cursor() c.execute('select * from demo') y = na.array(c.fetchall()) return y def retrieve4(conn): c = conn.cursor() c.execute('select * from demo') y = np.array(c.fetchall()) return y conn = sqlite3.connect(':memory:') setup(conn) t0 = time.clock() y1 = retrieve1(conn) t1 = time.clock() y2 = retrieve2(conn) t2 = time.clock() y3 = retrieve3(conn) t3 = time.clock() y4 = retrieve4(conn) t4 = time.clock() assert y1.shape == y2.shape == y3.shape[:1] == y4.shape[:1] == (N,) assert np.alltrue(y1 == y2) print "retrieve1 took", t1-t0, "seconds" print "retrieve2 took", t2-t1, "seconds" print "retrieve3 took", t3-t2, "seconds" print "retrieve4 took", t4-t3, "seconds" ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion