Re: [Numpy-discussion] effectively computing variograms with numpy

Timothy Hochberg Fri, 22 Jun 2007 11:05:23 -0700

OK, generally in code like this I leave the outer loops alone and try to
vectorize just the inner loop.I have some ideas in this direction, but
first, there seems to be some problems with the code at well. The code looks
like it is written to take non-square 'data' arrays. However,


      for i in range(data.shape[0]):
          datasquared = (data - data[:,i])**2

This is looping over shape[0], but indexing on axis-1, which doesn't work
for non-square arrays. One suggestion is make a function to compute the
variogram along a given axis and then calling it twice instead of computing
them both independently. Can you try the following code and see if this
correctly implements a variogram? I don't have time to check that it really
implements a variogram, but I'm hoping it's close:

   def variogram(data, binsize, axis=-1):
       data = data.swapaxes(-1, axis)
       n = data.shape[-1]
       resultsize = int(N.ceil(n / float(binsize)))
       result = N.zeros([resultsize], data.dtype)
       for i in range(resultsize):
           j0 = max(i*binsize, 1)
           j1 = min(j0+binsize, n)
           denominator = 0
           for j in range(j0, j1):
               d2 = (data[...,j:] - data[...,:-j])**2
               result[i] += d2.sum()
               denominator += N.prod(d2.shape)
           result[i] /= denominator
       return result




On 6/22/07, Hanno Klemm <[EMAIL PROTECTED] > wrote:


Tim,

this is the best I could come up with until now:


import numpy as N

def naive_variogram(data, binsize=100., stepsize=5.):
    """calculates variograms along the rows and columns of the given
    array which is supposed to contain equally spaced data with
    stepsize stepsize"""

    # how many elements do fit in one bin?

    binlength = int(binsize/stepsize)

    #bins in x- and y- direction (+1 for the possible
    #elements larger than int(binsize/stepsize):
    x_bins = (data.shape[1])/binlength+1
    y_bins = (data.shape[0])/binlength+1

    #arrays to store the reuslts in
    x_result = N.zeros(x_bins, dtype = float)
    y_result = N.zeros(y_bins, dtype = float)

    #arrays to get teh denominators right
    x_denominators = N.zeros(x_bins, dtype=float)
    y_denominators = N.zeros(x_bins, dtype=float)

    #what is the last index?
    xlast = data.shape[1]
    ylast = data.shape[0]
    for i in range(data.shape[0]):
        datasquared = (data - data[:,i])**2
        #number of bins to fill until the end of the array:
        numbins = 1 + (xlast - i)/binlength
        for j in range(numbins):
            x_result[j]+=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].sum()
            x_denominators[j] +=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].size
        try:
            #Is there a rest?
            x_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
            x_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
        except IndexError:
            pass

    x_result /= x_denominators


    for i in range(data.shape[1]):
        datasquared = (data - data[i])**2
        #number of bins to fill until the end of the array:
        numbins = 1 + (ylast - i)/binlength
        #Fill the bins
        for j in range(numbins):

y_result[j]+=datasquared[i+1+j*binlength:i+1+(j+1)*binlength].sum()
            y_denominators[j] +=
datasquared[i+1+j*binlength:i+1+(j+1)*binlength].size
        try:
            #Is there a rest?
            y_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
            y_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
        except IndexError:
            pass

    y_result /= y_denominators

    return x_result, y_result

Thanks,
Hanno


Timothy Hochberg < [EMAIL PROTECTED]> said:

> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> On 6/22/07, Hanno Klemm <[EMAIL PROTECTED]> wrote:
> >
> >
> > Hi,
> >
> > I have an array which represents regularly spaced spatial data. I now
> > would like to compute the (semi-)variogram, i.e.
> >
> > gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i - z_j)**2,
> >
> > where h is the (approximate) spatial difference between the
> > measurements z_i, and z_j, and N(h) is the number of measurements with
> > distance h.
> >
> > However, I only want to calculate the thing along the rows and
> > columns. The naive approach involves two for loops and a lot of
> > searching, which becomes painfully slow on large data sets. Are there
> > better implementations around in numpy/scipy or does anyone have a
> > good idea of how to do that more efficient? I looked around a bit but
> > couldn't find anything.
>
>
> Can you send the naive code as well. Its often easier to see what's
going on
> with code in addition to the equations.
>
> Regards.
>
> -tim
>
>
>
> --
> .  __
> .   |-\
> .
> .  [EMAIL PROTECTED]
>
> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/html; charset=ISO-8859-1
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> <br><br><div><span class="gmail_quote">On 6/22/07, <b
class="gmail_sendername">Hanno Klemm</b> &lt;<a
href="mailto:[EMAIL PROTECTED] ">[EMAIL PROTECTED] </a>&gt;
wrote:</span><blockquote class="gmail_quote" style="border-left: 1px
solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> <br>Hi,<br><br>I have an array which represents regularly spaced
spatial data. I now<br>would like to compute the (semi-)variogram,
i.e.<br><br>gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i -
z_j)**2,<br><br>where h is the (approximate) spatial difference
between the
> <br>measurements z_i, and z_j, and N(h) is the number of
measurements with<br>distance h.<br><br>However, I only want to
calculate the thing along the rows and<br>columns. The naive approach
involves two for loops and a lot of
> <br>searching, which becomes painfully slow on large data sets. Are
there<br>better implementations around in numpy/scipy or does anyone
have a<br>good idea of how to do that more efficient? I looked around
a bit but<br>couldn&#39;t find anything.
> </blockquote><div><br>Can you send the naive code as well. Its often
easier to see what&#39;s going on with code in addition to the
equations.<br><br>Regards.<br><br>-tim<br><br></div></div><br
clear="all"><br>-- <br>.&nbsp;&nbsp;__
> <br>.&nbsp;&nbsp; |-\<br>.<br>.&nbsp;&nbsp;<a
href="mailto:[EMAIL PROTECTED]"> [EMAIL PROTECTED]</a>
>
> ------=_Part_157389_1558912.1182523880067--
>



--
Hanno Klemm
[EMAIL PROTECTED]


_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion




--
.  __
.   |-\
.
.  [EMAIL PROTECTED]

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] effectively computing variograms with numpy

Reply via email to