I'm on Windows, using a precompiled binary. I never built numpy/scipy on Windows.
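
One route that avoids ndi.mean altogether, and so should not need a newer scipy build, is the np.bincount / weighted-histogram idea that Josef and Vincent mention in the thread below. A minimal sketch, assuming integer lat/lon with 0 <= lon < 1000; the function name group_mean_bincount is made up for illustration and is not from the thread:

```python
import numpy as np


def group_mean_bincount(lat, lon, data):
    """Mean of `data` for each distinct (lat, lon) pair, via np.bincount."""
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    data = np.asarray(data, dtype=float)

    # Collapse each pair into one integer key (the lat*1000 + lon trick from
    # the thread). Assumption: integer values with 0 <= lon < 1000.
    keys = 1000 * lat + lon

    # np.unique maps arbitrary keys onto consecutive labels 0..K-1,
    # which is the form np.bincount expects.
    unique_keys, labels = np.unique(keys, return_inverse=True)

    sums = np.bincount(labels, weights=data)  # per-group sum of data
    counts = np.bincount(labels)              # per-group number of points
    means = sums / counts

    # Decode the keys back into (lat, lon) tuples.
    group_lat, group_lon = np.divmod(unique_keys, 1000)
    return dict(zip(zip(group_lat.tolist(), group_lon.tolist()),
                    means.tolist()))
```

Dividing the weighted count by the unweighted count is the "histogram with weights over histogram without weights" step; every label produced by np.unique occurs at least once, so the division never hits a zero count.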
On Wed, Jun 2, 2010 at 10:45 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> On Wed, Jun 2, 2010 at 1:23 PM, Mathew Yeates <mat.yea...@gmail.com> wrote:
> > thanks. I am also getting an error in ndi.mean
> > Were you getting the error
> > "RuntimeError: data type not supported"?
> >
> > -Mathew
> >
> > On Wed, Jun 2, 2010 at 9:40 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> On Wed, Jun 2, 2010 at 3:41 AM, Vincent Schut <sc...@sarvision.nl> wrote:
> >> > On 06/02/2010 04:52 AM, josef.p...@gmail.com wrote:
> >> >> On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus <zachary.pin...@yale.edu> wrote:
> >> >>>> I guess it's as fast as I'm going to get. I don't really see any
> >> >>>> other way. (BTW, the lat/lons are integers)
> >> >>>
> >> >>> You could (in C or Cython) try a brain-dead "hashtable" with no
> >> >>> collision detection:
> >> >>>
> >> >>> for lat, long, data in dataset:
> >> >>>     bin = (lat ^ long) % num_bins
> >> >>>     hashtable[bin] = update_incremental_mean(hashtable[bin], data)
> >> >>>
> >> >>> You'll of course want to do some experiments to see if your data are
> >> >>> sufficiently sparse and/or you can afford a large enough hashtable
> >> >>> array that you won't get spurious hash collisions. Adding
> >> >>> error-checking to ensure that there are no collisions would be pretty
> >> >>> trivial (just keep a table of the lat/long for each hash value, which
> >> >>> you'll need anyway, and check that different lat/long pairs don't get
> >> >>> assigned the same bin).
> >> >>>
> >> >>> Zach
> >> >>>
> >> >>>> -Mathew
> >> >>>>
> >> >>>> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus <zachary.pin...@yale.edu> wrote:
> >> >>>>> Hi
> >> >>>>> Can anyone think of a clever (non-looping) solution to the
> >> >>>>> following?
> >> >>>>>
> >> >>>>> I have a list of latitudes, a list of longitudes, and a list of data
> >> >>>>> values. All lists are the same length.
> >> >>>>>
> >> >>>>> I want to compute an average of data values for each lat/lon pair.
> >> >>>>> e.g. if lat[1001], lon[1001] = lat[2001], lon[2001] then
> >> >>>>> data[1001] = (data[1001] + data[2001])/2
> >> >>>>>
> >> >>>>> Looping is going to take wayyyy too long.
> >> >>>>
> >> >>>> As a start, are the "equal" lat/lon pairs exactly equal (i.e. either
> >> >>>> not floating-point, or floats that will always compare equal, that
> >> >>>> is, the floating-point bit-patterns will be guaranteed to be
> >> >>>> identical) or approximately equal to float tolerance?
> >> >>>>
> >> >>>> If you're in the approx-equal case, then look at the KD-tree in scipy
> >> >>>> for doing near-neighbors queries.
> >> >>>>
> >> >>>> If you're in the exact-equal case, you could consider hashing the
> >> >>>> lat/lon pairs or something. At least then the looping is O(N) and not
> >> >>>> O(N^2):
> >> >>>>
> >> >>>> import collections
> >> >>>> import numpy
> >> >>>>
> >> >>>> grouped = collections.defaultdict(list)
> >> >>>> for lt, ln, da in zip(lat, lon, data):
> >> >>>>     grouped[(lt, ln)].append(da)
> >> >>>>
> >> >>>> averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())
> >> >>>>
> >> >>>> Is that fast enough?
> >> >>
> >> >> If the lat lon can be converted to a 1d label as Wes suggested, then
> >> >> in a similar timing exercise ndimage was the fastest.
> >> >> http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html
> >> >
> >> > And as you said your lats and lons are integers, you could simply do
> >> >
> >> > ll = lat*1000 + lon
> >> >
> >> > to get unique 'hashes' or '1d labels' for your latlon pairs, as a lat or
> >> > lon will never exceed 360 (degrees).
> >> >
> >> > After that, either use the ndimage approach, or you could use
> >> > histogramming with weighting by data values and divide by histogram
> >> > without weighting, or just loop.
> >> >
> >> > Vincent
> >> >
> >> >>
> >> >> (this was for python 2.4, also later I found np.bincount which
> >> >> requires that the labels are consecutive integers, but is as fast as
> >> >> ndimage)
> >> >>
> >> >> I don't know how it would compare to the new suggestions.
> >> >>
> >> >> Josef
> >> >>
> >> >>>>
> >> >>>> Zach
> >> >>>> _______________________________________________
> >> >>>> NumPy-Discussion mailing list
> >> >>>> NumPy-Discussion@scipy.org
> >> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >> I was curious about how fast ndimage was for this operation so here's
> >> the complete function.
> >>
> >> import collections
> >>
> >> import numpy as np
> >> import scipy.ndimage as ndi
> >>
> >> N = 10000
> >>
> >> lat = np.random.randint(0, 360, N)
> >> lon = np.random.randint(0, 360, N)
> >> data = np.random.randn(N)
> >>
> >> def group_mean(lat, lon, data):
> >>     indexer = np.lexsort((lon, lat))
> >>     lat = lat.take(indexer)
> >>     lon = lon.take(indexer)
> >>     sorted_data = data.take(indexer)
> >>
> >>     keys = 1000 * lat + lon
> >>     unique_keys = np.unique(keys)
> >>
> >>     result = ndi.mean(sorted_data, labels=keys, index=unique_keys)
> >>     decoder = keys.searchsorted(unique_keys)
> >>
> >>     return dict(zip(zip(lat.take(decoder), lon.take(decoder)), result))
> >>
> >> Appears to be about 13x faster (and could be made faster still) than
> >> the naive version on my machine:
> >>
> >> def group_mean_naive(lat, lon, data):
> >>     grouped = collections.defaultdict(list)
> >>     for lt, ln, da in zip(lat, lon, data):
> >>         grouped[(lt, ln)].append(da)
> >>
> >>     averaged = dict((ltln, np.mean(da)) for ltln, da in grouped.items())
> >>
> >>     return averaged
> >>
> >> I had to get the latest scipy trunk to not get an error from ndimage.mean
>
> That's the error I was getting. Depending on your OS, upgrading to the
> scipy trunk should be the easiest fix.
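
For reference, the brain-dead "hashtable" Zach describes in the quoted thread can be sketched in plain Python, including the collision check he calls trivial. This only illustrates the logic; Zach suggests C or Cython for actual speed. The names hashed_group_mean and the default num_bins are assumptions for illustration, and update_incremental_mean is just one possible reading of the helper named in his pseudocode:

```python
def update_incremental_mean(count, mean, value):
    """Fold one new value into a running (count, mean) pair."""
    count += 1
    mean += (value - mean) / count
    return count, mean


def hashed_group_mean(lats, lons, data, num_bins=1 << 16):
    """Per-(lat, lon) mean using a fixed-size table indexed by a crude hash."""
    counts = [0] * num_bins     # number of values seen in each bin
    means = [0.0] * num_bins    # running mean of each bin
    owners = [None] * num_bins  # (lat, lon) pair that first claimed the bin

    for lat, lon, value in zip(lats, lons, data):
        b = (lat ^ lon) % num_bins
        if owners[b] is None:
            owners[b] = (lat, lon)
        elif owners[b] != (lat, lon):
            # Two different pairs hashed to the same bin: refuse to mix them.
            raise ValueError("hash collision: %r and %r share bin %d"
                             % (owners[b], (lat, lon), b))
        counts[b], means[b] = update_incremental_mean(counts[b], means[b], value)

    return {owner: means[b]
            for b, owner in enumerate(owners) if owner is not None}
```

Note that XOR is symmetric, so (1, 2) and (2, 1) always land in the same bin; with this particular hash the collision check is not optional, which is presumably why a key like lat*1000 + lon is the safer choice for integer coordinates.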