On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <[email protected]> wrote:
> I think we need to know something about how often the Mat @ Mat @ vec
> type cases arise in practice. How often do non-scalar * and np.dot show
> up in the same expression? How often does it look like a * np.dot(b, c),
> and how often does it look like np.dot(a * b, c)? How often do we see
> expressions like np.dot(np.dot(a, b), c), and how often do we see
> expressions like np.dot(a, np.dot(b, c))? This would really help guide
> the debate. I don't have this data, and I'm not sure the best way to get
> it. A super-fancy approach would be to write a little script that uses
> the 'ast' module to count things automatically. A less fancy approach
> would be to just pick some code you've written, or a well-known package,
> grep through for calls to 'dot', and make notes on what you see. (An
> advantage of the less-fancy approach is that as a human you might be
> able to tell the difference between scalar and non-scalar *, or check
> whether it actually matters what order the 'dot' calls are done in.)
Okay, I wrote a little script [1] to scan Python source files and look
for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the
ndarray.dot method equivalents (a rough sketch of the scanning logic is
appended below). So what we get out is:

- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings:
  dot(dot(a, b), c)
- a count of how often we see right-associative nestings:
  dot(a, dot(b, c))

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |

(Any other projects worth trying? This is something that could vary a
lot between different projects, so it seems more important to get lots
of projects here than to get a few giant projects. Or if anyone wants to
run the script on their own private code, please do! Running it on my
personal pile of random junk finds 3 left-associative nestings and 1
right-associative one.)

Two flaws with this approach:

1) Probably some proportion of those nested dot calls are places where
it doesn't actually matter which evaluation order one uses -- dot()
forces you to pick one, so you have to. If people prefer to, say, use
the "left" form in cases where it doesn't matter, then this could bias
the left-vs-right results -- it's hard to say. (Somewhere in this thread
it was suggested that use of the .dot method could create such a bias,
because a.dot(b).dot(c) is more natural than a.dot(b.dot(c)), but only
something like 6% of the dot calls here use the method form, so this
probably doesn't matter much.) OTOH, this also means that the total
frequency of @ expressions where associativity matters at all is
probably *over*-estimated by the above.

2) This approach misses cases where the cumbersomeness of dot has caused
people to introduce temporary variables, like
'foo = np.dot(a, b); bar = np.dot(foo, c)', so it causes us to
*under*-estimate how often associativity matters. I did read through the
'dot' uses in scikit-learn and nipy, though, and only caught a handful
of such cases, so I doubt it changes anything much.

-n

[1] https://gist.github.com/njsmith/9157645#file-grep-dot-dot-py

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
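P.S. For reference, here's a rough, self-contained sketch of the kind of
ast-based scan described above. It's illustrative only -- the actual
script is at [1] -- and it bakes in a couple of simplifying assumptions:
it treats 'np' and 'numpy' as the only spellings of the numpy module,
and it counts any '.dot(...)' attribute call as a dot.

import ast
import sys

# Assumption: an attribute call like np.dot(...) / numpy.dot(...) is the
# function form; any other x.dot(...) is treated as the ndarray method form.
NUMPY_NAMES = {"np", "numpy"}

def is_dot_call(node):
    """True if node is a call spelled dot(...), np.dot(...), or x.dot(...)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "dot"
    if isinstance(func, ast.Attribute):
        return func.attr == "dot"
    return False

def operands(node):
    """Return the operand expressions of a recognized dot call, in order."""
    func = node.func
    args = list(node.args)
    if isinstance(func, ast.Attribute):
        value = func.value
        if isinstance(value, ast.Name) and value.id in NUMPY_NAMES:
            # np.dot(a, b): the operands are just the call arguments.
            pass
        else:
            # a.dot(b): the object itself is the left-hand operand.
            args = [value] + args
    return args

def count_dots(tree):
    """Count dot calls and left-/right-associative nestings in an AST."""
    dots = left = right = 0
    for node in ast.walk(tree):
        if not is_dot_call(node):
            continue
        dots += 1
        args = operands(node)
        if len(args) >= 2:
            if is_dot_call(args[0]):
                left += 1   # dot(dot(a, b), c) or a.dot(b).dot(c)
            if is_dot_call(args[1]):
                right += 1  # dot(a, dot(b, c)) or a.dot(b.dot(c))
    return dots, left, right

if __name__ == "__main__":
    total = total_left = total_right = 0
    for path in sys.argv[1:]:
        with open(path) as f:
            source = f.read()
        try:
            tree = ast.parse(source, filename=path)
        except SyntaxError:
            continue
        d, l, r = count_dots(tree)
        total += d
        total_left += l
        total_right += r
    print("dots: %d  left: %d  right: %d" % (total, total_left, total_right))

You'd run it over a project with something like
'python sketch.py $(find scipy -name "*.py")'.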
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
