On Thu, Jul 10, 2008 at 2:55 PM, Anne Archibald <[EMAIL PROTECTED]> wrote:
> 2008/7/10 Dan Lussier <[EMAIL PROTECTED]>: > > > I am relatively new to numpy and am having trouble with the speed of > > a specific array based calculation that I'm trying to do. > > > > What I'm trying to do is to calculate the total total potential > > energy and coordination number of each atom within a relatively large > > simulation. Each atom is at a position (x,y,z) given by a row in a > > large array (approximately 1e6 by 3) and presently I have no > > information about its nearest neighbours so each its position must be > > checked against all others before cutting the list down prior to > > calculating the energy. > > The way you've implemented this goes as the square of the number of > atoms. This is going to absolutely kill performance, and you can spend > weeks trying to optimize the code for a factor of two speedup. I would > look hard for algorithms that do this in less than O(n^2). > > This problem of finding the neighbours of an object has seen > substantial research, and there are a variety of well-established > techniques covering many possible situations. My knowledge of it is > far from up-to-date, but since you have only a three-dimensional > problem, you could try a three-dimensional grid (if your atoms aren't > too clumpy) or octrees (if they are somewhat clumpy); kd-trees are > probably overkill (since they're designed for higher-dimensional > problems). > > > My current numpy code is below and in it I have tried to push as much > > of the work for this computation into compiled C (ideally) of numpy. > > However it is still taking a very long time at approx 1 second per > > row. At this speed even some simple speed ups like weave won't > > really help. > > > > Are there any other numpy ways that it would be possible to calculate > > this, or should I start looking at going to C/C++? > > > > I am also left wondering if there is something wrong with my > > installation of numpy (i.e. not making use of lapack/ATLAS). Other > > than that if it is relevant - I am running 32 bit x86 Centos 5.1 > > linux on a dual Xeon 3.0 GHz Dell tower with 4 GB of memory. > > Unfortunately, implementing most of the algorithms in the literature > within numpy is going to be fairly cumbersome. But I think there are > better algorithms you could try: > > * Put all your atoms in a grid. Ideally the cells would be about the > size of your cutoff radius, so that for each atom you need to pull out > all atoms in at most eight cells for checking against the cutoff. I > think this can be done fairly efficiently in numpy. > > * Sort the atoms along a coordinate and use searchsorted to pull out > those for which that coordinate is within the cutoff. This should get > you down to reasonably modest lists fairly quickly. > > There is of course a tradeoff between preprocessing time and lookup time. > > We seem to get quite a few posts from people wanting some kind of > spatial data structure (whether they know it or not). Would it make > sense to come up with some kind of compiled spatial data structure to > include in scipy? > I think there should be a "computer science" module containing such things as r-b trees, k-d trees, equivalence relations, and so forth. For this problem one could probably make a low resolution cube of list objects and index atoms into the appropriate list by severe rounding. I have done similar things for indexing stars in a (fairly) wide field image and it worked well. But I think the sorting approach would be a good first try here. Argsort on the proper column followed by take would be the way to go for that. Chuck
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion