Hi all,
I haven't been on the list very long but I see this question keeps coming
up. I just thought I put some numbers out so people know. Graph tool is
super fast at doing the c++ stuff it does, and super convenient for picking
through the data on a python level also, but it doesn't make python fast.
So, here are some examples of working with a mask in Python. Notice that
numpy is not fast just because it is numpy. I chose an Nx3 array because
numpy masking is extra slow with shapes other than Nx1.
================
import numpy as np
a_generator = ((x,x+2,x*x) for x in range(1000))
a_list =[(x,x+2,x*x) for x in range(1000)]
a_array = np.array(a_list)
mask = np.ones((len(a_array)),dtype=np.bool)
mask[::3] = False
def tupled():
for x,b in zip(a_generator,mask):
if b:
c,d,e =x
def looped():
for x,b in zip(a_list,mask):
if b:
c,d,e =x
def masked():
for x in a_array[mask]:
c,d,e = x
#IPython magic function: timeit
%timeit tupled()
1000000 loops, best of 3: 510 ns per loop
%timeit looped()
10000 loops, best of 3: 76.6 µs per loop
%timeit masked()
1000 loops, best of 3: 445 µs per loop
================
Notice the nano seconds vs micro seconds. The generator is the clear
winner.
Now, here are some graph tool specific examples. I left out the mask, but
clearly you can create and manipulate a mask as you see fit.
=================
>>>graph
<Graph object, directed, reversed, with 32183 vertices and 199381 edges
at 0x7f62842ee9d0>
>>>def graph_loop0():
for i in range(10): #Only 10 times because this is soooo slooow
for e in graph.edges():
v1,v2 = e.source(),e.target()
>>>def graph_loop1():
edges = [[e.source(),e.target()] for e in graph.edges()]
for i in range(1000):
for e in edges:
v1,v2 = e[0],e[1]
>>>def graph_loop2():
edges = [[e.source(),e.target()] for e in graph.edges()]
gen = (e for e in edges)
for x in range(1000):
for e in gen:
v1,v2 = e[0],e[1]
>>>from time import time
>>>a=time(); graph_loop0(); print time()-a
23.1095559597
>>>a=time(); graph_loop1(); print time()-a
15.721350193
>>>a=time(); graph_loop2(); print time()-a
2.80044198036
=======================
The loop1 and loop2 are doing 100x more, so the generator is 1000x faster.
If you're just looping through once, it makes sense to use the convenience
graph_tool provides. But if you are implementing a graph algorithm, just
grab what you need from the graph_tool graph into a list or whatever python
object makes sense for what you're doing.
-Elliot
_______________________________________________
graph-tool mailing list
[email protected]
http://lists.skewed.de/mailman/listinfo/graph-tool