Why not simply use a set? uniquePoints = set(zip(lats, lons))
Ben Root On Wed, Jun 2, 2010 at 2:41 AM, Vincent Schut <sc...@sarvision.nl> wrote: > On 06/02/2010 04:52 AM, josef.p...@gmail.com wrote: > > On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus<zachary.pin...@yale.edu> > wrote: > >>> I guess it's as fast as I'm going to get. I don't really see any > >>> other way. BTW, the lat/lons are integers) > >> > >> You could (in c or cython) try a brain-dead "hashtable" with no > >> collision detection: > >> > >> for lat, long, data in dataset: > >> bin = (lat ^ long) % num_bins > >> hashtable[bin] = update_incremental_mean(hashtable[bin], data) > >> > >> you'll of course want to do some experiments to see if your data are > >> sufficiently sparse and/or you can afford a large enough hashtable > >> array that you won't get spurious hash collisions. Adding error- > >> checking to ensure that there are no collisions would be pretty > >> trivial (just keep a table of the lat/long for each hash value, which > >> you'll need anyway, and check that different lat/long pairs don't get > >> assigned the same bin). > >> > >> Zach > >> > >> > >> > >>> -Mathew > >>> > >>> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus<zachary.pin...@yale.edu > >>>> wrote: > >>>> Hi > >>>> Can anyone think of a clever (non-lopping) solution to the > >>> following? > >>>> > >>>> A have a list of latitudes, a list of longitudes, and list of data > >>>> values. All lists are the same length. > >>>> > >>>> I want to compute an average of data values for each lat/lon pair. > >>>> e.g. if lat[1001] lon[1001] = lat[2001] [lon [2001] then > >>>> data[1001] = (data[1001] + data[2001])/2 > >>>> > >>>> Looping is going to take wayyyy to long. > >>> > >>> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either > >>> not floating-point, or floats that will always compare equal, that is, > >>> the floating-point bit-patterns will be guaranteed to be identical) or > >>> approximately equal to float tolerance? > >>> > >>> If you're in the approx-equal case, then look at the KD-tree in scipy > >>> for doing near-neighbors queries. > >>> > >>> If you're in the exact-equal case, you could consider hashing the lat/ > >>> lon pairs or something. At least then the looping is O(N) and not > >>> O(N^2): > >>> > >>> import collections > >>> grouped = collections.defaultdict(list) > >>> for lt, ln, da in zip(lat, lon, data): > >>> grouped[(lt, ln)].append(da) > >>> > >>> averaged = dict((ltln, numpy.mean(da)) for ltln, da in > >>> grouped.items()) > >>> > >>> Is that fast enough? > > > > If the lat lon can be converted to a 1d label as Wes suggested, then > > in a similar timing exercise ndimage was the fastest. > > http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html > > And as you said your lats and lons are integers, you could simply do > > ll = lat*1000 + lon > > to get unique 'hashes' or '1d labels' for you latlon pairs, as a lat or > lon will never exceed 360 (degrees). > > After that, either use the ndimage approach, or you could use > histogramming with weighting by data values and divide by histogram > withouth weighting, or just loop. > > Vincent > > > > > (this was for python 2.4, also later I found np.bincount which > > requires that the labels are consecutive integers, but is as fast as > > ndimage) > > > > I don't know how it would compare to the new suggestions. > > > > Josef > > > > > > > >>> > >>> Zach > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion@scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion@scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion@scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion