On Wed, Jun 2, 2010 at 2:26 PM, <josef.p...@gmail.com> wrote:
> On Wed, Jun 2, 2010 at 2:09 PM, Mathew Yeates <mat.yea...@gmail.com> wrote:
>> I'm on Windows, using a precompiled binary. I never built numpy/scipy on
>> Windows.
>
> The ndimage measurements code has recently been rewritten. ndimage is
> very fast, but the old version has insufficient type checking and may
> crash on wrong inputs.
>
> I managed to work with it in the past on Windows. Maybe you could try
> checking the dtypes of the arguments for ndi.mean (preferably in an
> interpreter session where you don't mind a crash).
>
> I don't remember the type restrictions, but there are/were several
> tickets for them.
>
> Josef
>
>> On Wed, Jun 2, 2010 at 10:45 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> On Wed, Jun 2, 2010 at 1:23 PM, Mathew Yeates <mat.yea...@gmail.com> wrote:
>>> > Thanks. I am also getting an error in ndi.mean.
>>> > Were you getting the error
>>> > "RuntimeError: data type not supported"?
>>> >
>>> > -Mathew
>>> >
>>> > On Wed, Jun 2, 2010 at 9:40 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> >> On Wed, Jun 2, 2010 at 3:41 AM, Vincent Schut <sc...@sarvision.nl> wrote:
>>> >> > On 06/02/2010 04:52 AM, josef.p...@gmail.com wrote:
>>> >> >> On Tue, Jun 1, 2010 at 9:57 PM, Zachary Pincus
>>> >> >> <zachary.pin...@yale.edu> wrote:
>>> >> >>>> I guess it's as fast as I'm going to get. I don't really see any
>>> >> >>>> other way. (BTW, the lat/lons are integers.)
>>> >> >>>
>>> >> >>> You could (in C or Cython) try a brain-dead "hashtable" with no
>>> >> >>> collision detection:
>>> >> >>>
>>> >> >>> for lat, long, data in dataset:
>>> >> >>>     bin = (lat ^ long) % num_bins
>>> >> >>>     hashtable[bin] = update_incremental_mean(hashtable[bin], data)
>>> >> >>>
>>> >> >>> You'll of course want to do some experiments to see if your data
>>> >> >>> are sufficiently sparse and/or you can afford a large enough
>>> >> >>> hashtable array that you won't get spurious hash collisions.
>>> >> >>> Adding error checking to ensure that there are no collisions
>>> >> >>> would be pretty trivial (just keep a table of the lat/lon for
>>> >> >>> each hash value, which you'll need anyway, and check that
>>> >> >>> different lat/lon pairs don't get assigned the same bin).
>>> >> >>>
>>> >> >>> Zach
>>> >> >>>
>>> >> >>>> -Mathew
>>> >> >>>>
>>> >> >>>> On Tue, Jun 1, 2010 at 1:49 PM, Zachary Pincus
>>> >> >>>> <zachary.pin...@yale.edu> wrote:
>>> >> >>>>> Hi,
>>> >> >>>>> Can anyone think of a clever (non-looping) solution to the
>>> >> >>>>> following?
>>> >> >>>>>
>>> >> >>>>> I have a list of latitudes, a list of longitudes, and a list of
>>> >> >>>>> data values. All lists are the same length.
>>> >> >>>>>
>>> >> >>>>> I want to compute an average of data values for each lat/lon
>>> >> >>>>> pair, e.g. if lat[1001] == lat[2001] and lon[1001] == lon[2001],
>>> >> >>>>> then data[1001] = (data[1001] + data[2001]) / 2.
>>> >> >>>>>
>>> >> >>>>> Looping is going to take wayyyy too long.
>>> >> >>>>
>>> >> >>>> As a start, are the "equal" lat/lon pairs exactly equal (i.e.
>>> >> >>>> either not floating-point, or floats that will always compare
>>> >> >>>> equal, that is, the floating-point bit patterns are guaranteed
>>> >> >>>> to be identical) or only approximately equal to float tolerance?
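A runnable sketch of the "brain-dead hashtable" idea quoted above, for the exact-equal integer case. Two pieces the thread leaves open are filled in here as assumptions: the incremental-mean update and the hash itself. Note that (lat ^ long) % num_bins collides almost immediately for small integer coordinates (e.g. (0, 1) and (1, 0) hash alike), so this sketch substitutes the 1000*lat + lon encoding that comes up later in the thread, while keeping the collision check Zach describes:

import numpy as np

def hashtable_means(lat, lon, data, num_bins=360000):
    # num_bins = 360 * 1000 makes 1000*lat + lon collision-free for
    # 0 <= lat, lon < 360; the check below still guards other choices.
    counts = np.zeros(num_bins, dtype=np.int64)
    means = np.zeros(num_bins, dtype=np.float64)
    owners = {}  # bin -> (lat, lon), the collision table Zach mentions

    for lt, ln, da in zip(lat, lon, data):
        key = (int(lt), int(ln))
        b = (1000 * key[0] + key[1]) % num_bins
        if owners.setdefault(b, key) != key:
            raise ValueError("hash collision: enlarge num_bins or rehash")
        counts[b] += 1
        # incremental mean: new_mean = old_mean + (x - old_mean) / n
        means[b] += (da - means[b]) / counts[b]

    return dict((key, means[b]) for b, key in owners.items())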
>>> >> >>>> If you're in the approx-equal case, then look at the KD-tree in
>>> >> >>>> scipy for doing near-neighbors queries.
>>> >> >>>>
>>> >> >>>> If you're in the exact-equal case, you could consider hashing the
>>> >> >>>> lat/lon pairs or something. At least then the looping is O(N) and
>>> >> >>>> not O(N^2):
>>> >> >>>>
>>> >> >>>> import collections
>>> >> >>>> grouped = collections.defaultdict(list)
>>> >> >>>> for lt, ln, da in zip(lat, lon, data):
>>> >> >>>>     grouped[(lt, ln)].append(da)
>>> >> >>>>
>>> >> >>>> averaged = dict((ltln, numpy.mean(da)) for ltln, da in
>>> >> >>>>                 grouped.items())
>>> >> >>>>
>>> >> >>>> Is that fast enough?
>>> >> >>>>
>>> >> >>>> Zach
>>> >> >>
>>> >> >> If the lat/lon pairs can be converted to a 1d label as Wes
>>> >> >> suggested, then in a similar timing exercise ndimage was the
>>> >> >> fastest:
>>> >> >> http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html
>>> >> >
>>> >> > And as you said your lats and lons are integers, you could simply do
>>> >> >
>>> >> > ll = lat*1000 + lon
>>> >> >
>>> >> > to get unique 'hashes' or '1d labels' for your lat/lon pairs, as a
>>> >> > lat or lon will never exceed 360 (degrees).
>>> >> >
>>> >> > After that, either use the ndimage approach, or you could use
>>> >> > histogramming with weighting by data values and divide by the
>>> >> > histogram without weighting, or just loop.
>>> >> >
>>> >> > Vincent
>>> >> >>
>>> >> >> (This was for Python 2.4; later I also found np.bincount, which
>>> >> >> requires that the labels are consecutive integers but is as fast
>>> >> >> as ndimage.)
>>> >> >>
>>> >> >> I don't know how it would compare to the new suggestions.
>>> >> >>
>>> >> >> Josef
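Vincent's 1d-label trick and Josef's bincount remark combine into a few lines; here is a sketch (the function name is mine, not from the thread). One precision on the quoted caveat: np.bincount does not strictly need consecutive labels, but it does allocate a slot for every integer from 0 to max(keys), which is why a dense encoding such as 1000*lat + lon suits it:

import numpy as np

def bincount_means(lat, lon, data):
    keys = 1000 * lat + lon        # unique label per pair for 0 <= lat, lon < 1000
    sums = np.bincount(keys, weights=data)   # data-weighted "histogram"
    counts = np.bincount(keys)               # unweighted counts
    occupied = counts > 0                    # labels that actually occur
    means = sums[occupied] / counts[occupied]
    labels = np.flatnonzero(occupied)
    # decode the labels back into (lat, lon) pairs
    return dict(zip(zip(labels // 1000, labels % 1000), means))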
>>> >> I was curious about how fast ndimage was for this operation, so
>>> >> here's the complete function:
>>> >>
>>> >> import collections
>>> >>
>>> >> import numpy as np
>>> >> import scipy.ndimage as ndi
>>> >>
>>> >> N = 10000
>>> >>
>>> >> lat = np.random.randint(0, 360, N)
>>> >> lon = np.random.randint(0, 360, N)
>>> >> data = np.random.randn(N)
>>> >>
>>> >> def group_mean(lat, lon, data):
>>> >>     # sort by (lat, lon) so equal pairs are contiguous
>>> >>     indexer = np.lexsort((lon, lat))
>>> >>     lat = lat.take(indexer)
>>> >>     lon = lon.take(indexer)
>>> >>     sorted_data = data.take(indexer)
>>> >>
>>> >>     # encode each pair as a single integer label
>>> >>     keys = 1000 * lat + lon
>>> >>     unique_keys = np.unique(keys)
>>> >>
>>> >>     result = ndi.mean(sorted_data, labels=keys, index=unique_keys)
>>> >>     decoder = keys.searchsorted(unique_keys)
>>> >>
>>> >>     return dict(zip(zip(lat.take(decoder), lon.take(decoder)), result))
>>> >>
>>> >> It appears to be about 13x faster (and could be made faster still)
>>> >> than the naive version on my machine:
>>> >>
>>> >> def group_mean_naive(lat, lon, data):
>>> >>     grouped = collections.defaultdict(list)
>>> >>     for lt, ln, da in zip(lat, lon, data):
>>> >>         grouped[(lt, ln)].append(da)
>>> >>
>>> >>     averaged = dict((ltln, np.mean(da)) for ltln, da in grouped.items())
>>> >>
>>> >>     return averaged
>>> >>
>>> >> I had to get the latest scipy trunk to not get an error from
>>> >> ndimage.mean.
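For anyone wanting to reproduce the comparison, a minimal timing harness; it assumes group_mean and group_mean_naive as defined above, and the ~13x figure was measured on one machine and will vary with NumPy/SciPy versions:

import time

import numpy as np

N = 10000
lat = np.random.randint(0, 360, N)
lon = np.random.randint(0, 360, N)
data = np.random.randn(N)

for func in (group_mean_naive, group_mean):
    start = time.time()
    for _ in range(10):
        func(lat, lon, data)  # average over 10 calls
    print('%s: %.4f s per call' % (func.__name__, (time.time() - start) / 10.0))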
>>> That's the error I was getting. Depending on your OS, upgrading to the
>>> scipy trunk should be the easiest fix.

If you're on Python 2.6, the binary here might work for you:
http://www.lfd.uci.edu/~gohlke/pythonlibs/

It looks recent enough to have the rewritten ndimage.