On 11/14/06, Tim Hochberg <[EMAIL PROTECTED]> wrote:
> Tim Hochberg wrote:
> > John Hunter wrote:
> > >>>>>> "Erin" == Erin Sheldon <[EMAIL PROTECTED]> writes:
> > >
> > > Erin> The question I have been asking myself is "what is the
> > > Erin> advantage of such an approach?". It would be faster, but by
> > >
> > > In the use case that prompted this message, the pull from mysql took
> > > almost 3 seconds, and the conversion from lists to numpy arrays took
> > > more than 4 seconds. We have a list of about 500000 2-tuples of
> > > floats.
> > >
> > I'm no database user, but a glance at the docs seems to indicate
> > that you can get your data via an iterator (by iterating over the
> > cursor or some such db mumbo jumbo) rather than slurping the whole
> > list up at once. If so, then you'll save a lot of memory by passing
> > the iterator straight to fromiter. It may even be faster, who knows.
> >
> > Accessing the db via the iterator could be a performance killer, but
> > it's almost certainly worth trying, as it could save a few megabytes
> > of storage and that in turn might speed things up.
> >
> Assuming that I didn't mess this up too badly, it appears that using the
> iterator directly with fromiter is significantly faster than the next
> best solution (by about 45%). The fromiter-wrapping-a-list solution comes
> in second, followed by numarray.array and finally, way in the back,
> numpy.array. Here are the numbers:
>
>     retrieve1 took 0.902922857514 seconds
>     retrieve2 took 1.31245870634 seconds
>     retrieve3 took 1.51207569677 seconds
>     retrieve4 took 8.71539930354 seconds
>
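[For readers of the archive: the original benchmark code was not posted, but the two approaches under discussion can be sketched as below. The schema, table name, and row count are invented for illustration; note that fromiter wants a flat 1-D stream of scalars, so the row tuples are flattened before conversion.]

```python
import sqlite3
import numpy as np

# Build a throwaway in-memory database (names invented for illustration;
# the original benchmark's schema was not posted).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE data (x REAL, y REAL)")
cur.executemany("INSERT INTO data VALUES (?, ?)",
                [(float(i), float(i) * 0.5) for i in range(1000)])
conn.commit()

def retrieve_iter(conn, n):
    # Feed the cursor straight to fromiter: no intermediate list of tuples.
    cur = conn.execute("SELECT x, y FROM data")
    # fromiter needs a flat stream of scalars, so flatten the row tuples.
    flat = (v for row in cur for v in row)
    # Passing count lets fromiter preallocate the output in one shot.
    return np.fromiter(flat, dtype=np.float64, count=2 * n).reshape(n, 2)

def retrieve_fetchall(conn):
    # fetchall() materializes the full list of tuples before conversion,
    # which is the extra in-memory copy being discussed in this thread.
    return np.array(conn.execute("SELECT x, y FROM data").fetchall(),
                    dtype=np.float64)

a = retrieve_iter(conn, 1000)
b = retrieve_fetchall(conn)
assert np.array_equal(a, b)
```

Both variants produce the same (n, 2) float array; the difference is only in whether an intermediate Python list of tuples is built first.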
Interesting results, Tim. From Pierre's results we saw that fromiter is the fastest way to get data into arrays. Your results show that there is also a difference between iterating over the cursor and doing a fetchall(). Surprisingly, iterating over the cursor is faster. That difference must come not from the data retrieval rate but from the extra copies created in memory.

Just in case, though, I think there is one more thing to check. I haven't used sqlite, but with other databases I have used there is often a large variance in times from one select to the next. Can you repeat these tests with a timeit().repeat and give the minimum?

As an aside, your database is running on a local disk, right, so the overhead of retrieving data is minimized here? For my tests I think I am data-retrieval limited, because I get exactly the same time for the equivalent of retrieve1 and retrieve2.

Erin

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
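[Archive note: the timeit().repeat suggestion above looks roughly like this; retrieve1 here is a stand-in for the real benchmark body, which was not posted.]

```python
import timeit

def retrieve1():
    # Placeholder for the real benchmark body.
    return sum(range(1000))

# repeat() runs the whole timing loop several times; taking the minimum
# gives the least noisy estimate, since sources of variance (disk cache
# misses, other processes) only ever add time, never subtract it.
times = timeit.Timer(retrieve1).repeat(repeat=5, number=10)
best = min(times) / 10  # per-call time from the best run
print("best per-call time: %.6f seconds" % best)
```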