David Goldsmith wrote:
> Tim Hochberg wrote:
>
>> I periodically need to perform various linalg operations on a large
>> number of small matrices. The normal numpy approach of stacking up the
>> data into an extra dimension and doing the operations in parallel
>> doesn't work, because the linalg functions are only designed to work on
>> one matrix at a time. At first glance, this restriction seems harmless
>> enough, since the underlying LAPACK functions are also designed to work
>> on one matrix at a time and it seems that there would be little to be
>> gained by parallelizing the Python-level code. However, this turns out
>> not to be the case.
>>
>> I have some code, originally written for numarray but recently ported
>> to numpy, that rewrites the linalg functions determinant, inverse
>> and solve_linear_equations[1] to operate on 2-D or 3-D arrays, treating
>> 3-D arrays as stacks of 2-D arrays. For operating on large numbers of
>> small arrays (for example, 1000 2x2 arrays), this approach is over an
>> order of magnitude faster than the obvious map(inverse, threeDArray)
>> approach.
>>
>> So, some questions:
>>
>> 1. Is there any interest in this code for numpy.linalg? I'd be willing
>> to clean it up and work on the other functions in linalg so that they
>> are all consistent. Since these changes should all be backwards
>> compatible (unless you're counting on catching an error from one of
>> the linalg functions), I don't see a problem with putting this in
>> after 1.0, so there's really no rush. I'm only bringing it up now
>> because I just ported it over to numpy and it's fresh in my mind.
>>
> Say "no" to someone offering to make a Python module more feature-rich?
> Blasphemy! :-)
>
> But I am very curious as to the source/basis of the performance
> improvement - did you figure this out?
>
Yes and no. Here's what the beginning of linalg.det looks like:
    a = asarray(a)
    _assertRank2(a)
    _assertSquareness(a)
    t, result_t = _commonType(a)
    a = _fastCopyAndTranspose(t, a)
    n = a.shape[0]
    if isComplexType(t):
        lapack_routine = lapack_lite.zgetrf
    else:
        lapack_routine = lapack_lite.dgetrf
    # ...

There are quite a few function calls here. Since the actual call to
dgetrf probably takes negligible time for a 2x2 array, the cost of the
computation is dominated by function-call overhead, creating the return
arrays, and the like. Which is the biggest culprit, I'm not sure; by
vectorizing, much of that overhead is incurred only once rather than
hundreds or thousands of times, so it all becomes negligible at that
point. I didn't bother to track down the worst offender.

-tim

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
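For what it's worth, the kind of vectorization Tim describes can be sketched for the 2x2 case with the closed-form adjugate formula, so the whole stack is inverted with a handful of array operations instead of one LAPACK call per matrix. This is a minimal illustration of the idea, not Tim's actual code; the helper name inverse_2x2_stacked is hypothetical:

```python
import numpy as np

def inverse_2x2_stacked(a):
    """Invert a stack of 2x2 matrices, shape (n, 2, 2), in one shot.

    Uses the closed-form adjugate formula, so the per-matrix Python
    overhead (argument checking, dispatch, result allocation) is paid
    once for the whole stack. Hypothetical sketch, not numpy.linalg API.
    """
    a = np.asarray(a, dtype=float)
    # Vectorized determinant of every 2x2 matrix in the stack.
    det = a[:, 0, 0] * a[:, 1, 1] - a[:, 0, 1] * a[:, 1, 0]
    if np.any(det == 0):
        raise np.linalg.LinAlgError("singular matrix in stack")
    # Adjugate: swap the diagonal, negate the off-diagonal.
    inv = np.empty_like(a)
    inv[:, 0, 0] = a[:, 1, 1]
    inv[:, 1, 1] = a[:, 0, 0]
    inv[:, 0, 1] = -a[:, 0, 1]
    inv[:, 1, 0] = -a[:, 1, 0]
    # Divide each matrix by its own determinant via broadcasting.
    return inv / det[:, None, None]

stack = np.array([[[2.0, 0.0], [0.0, 4.0]],
                  [[1.0, 2.0], [3.0, 4.0]]])
result = inverse_2x2_stacked(stack)
```

The same trick generalizes: any fixed, small matrix size lets you trade the generic per-matrix LAPACK path for a few whole-stack array expressions.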