Hi All, I have a problem involving lat/lon data. Basically, I am evaluating numerical weather model data against satellite data, and trying to produce gridded plots of various statistics. There are various steps involved with this, but basically, I get to the point where I have four arrays of the same length, containing lat, lon, model and observed values respectively along the path of the sattelite.
eg: lat = [ 50.32 50.78 51.24 51.82 52.55 53.15 53.75 54.28 54.79 55.16 ... ] lon = [ 192.83 193.38 193.94 194.67 195.65 196.49 197.35 198.15 198.94 199.53 ... ] obs = [ 1.42 1.35 1.55 1.50 1.59 1.76 2.15 1.90 1.55 0.73 ... ] model = [ 1.67 1.68 1.70 1.79 2.04 2.36 2.53 2.38 2.149 1.57 ... ] I then want to calculate statistics based on bins of say 2 X 2 degree boxes. These arrays are very large, on the order of 10^6. For ease of explanation, I will focus initially on bias. My first approach was to use loops through lat/lon boxes and use np.where statements to extract all the values of the model and observations within the given box, and calculate the bias as the mean of the difference. This was obviously very slow. histogram2d provided a FAR better way to do this. i.e. import numpy as np latedges=np.arange(-90,90,2) lonedges=np.arange(0,360,2) diff = model-obs grid_N, xedges, yedges = np.histogram2d(lat, lon], bins=(latedges,lonedges)) grid_bias_sum, xedges, yedges = np.histogram2d(lat, lon, weights=diff, bins=(latedges,lonedges)) grid_bias = grid_bias_sum/grid_N I now want to determine the the linear regression coefficients for each each box after fitting a least squares linear regression to the data in each bin. I have been looking at using np.digitize to extract the bin indeces, but haven't had much success trying to do this in two dimensions. I am back to looping through the lat and lon box values, using np.where to extract the observations and model data within that box, and using np.polyfit to obtain the regression coefficients. This is, of course, very slow. Can anyone advise a smarter, vectorized way to do this? Any thoughts would be greatly appreciated. Thanks in advance Tom
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion