Hello all,

The short version: For a given NxN array, is there an efficient way to use a 
moving window to collect a summary statistic on a chunk of the array, and 
insert it into another array?

The long version: I am trying to resample an image loaded with GDAL into an NxN 
array. Note that this is for statistical purposes, so image quality doesn't 
matter. For the curious, the image is derived from satellite imagery and 
displays a map of hotspots of tropical deforestation at 500m resolution. I need 
to assign a count of these deforestation 'hits' to each pixel in an existing 
array of 1000m pixels.


Let's say the image looks like this: np.random.randint(0,2, 16).reshape(4,4)

array([[0, 0, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 0, 0, 0]])

I want to use a square, non-overlapping moving window for resampling, so that I 
get a count of all the 1's in each 2x2 window.

0, 0,   0, 1
0, 0,   1, 1                 0  3
                    =>       2  0
1, 1,   0, 0                    
0, 0,   0, 0

In another situation with similar data I'll need the average, or the maximum 
value, etc..

My brute-force method is to loop through the rows and columns to get little 
chunks of data to process. But must be a more elegant way to do this - it's 
probably obvious too ... (inelegant way further down).

Another way to do it would be to use np.tile to create an array called "arr" 
filed with blocks of [[0,1],[2,3]]. I could then use something like this to get 
the stats I want:

d0 = img[np.where(arr==0)]
d1 = img[np.where(arr==1)]
d2 = img[np.where(arr==2)]
d3 = img[np.where(arr==3)]

img_out = d0 + d1 + d2 + d3

This would be quite snappy if I could create arr efficiently. Unfortunately 
np.tile does something akin to np.hstack to create this array, so it isn't 
square and can't be reshaped appropriately (np.tile(np.arange(2**2).reshape(2, 
2), 4)):

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3]])

Inefficient sample code below. Advice greatly appreciated!

-Robin


import numpy as np
from math import sqrt
from time import time

def rs(img_size=16, window_size=2):
    w = window_size
    
    # making sure the numbers work

    if img_size % sqrt(img_size) > 0:
        print "Square root of image size must be an integer."
        print "Sqrt =", sqrt(img_size)
        
        return
        
    elif (img_size / sqrt(img_size)) % w > 0:
        print "Square root of image size must be evenly divisible by", w
        print "Sqrt =", sqrt(img_size)
        print sqrt(img_size), "/", w, "=", sqrt(img_size)/w
        
        return
    
    else:
        
        rows = int(sqrt(img_size))
        cols = int(sqrt(img_size))
    
        # create fake data: ones and zeros
        img = np.random.randint(0, 2, img_size).reshape(rows, cols)
        
        # create empty array for resampled data
        img_out = np.zeros((rows/w, cols/w), dtype=np.int)
        
        t = time()

        # retreive blocks of pixels in img
        # insert summary into img_out
        
        for r in xrange(0, rows, w):     # skip rows based on window size
            for c in xrange(0, cols, w): # skip columns based on window size
                
                # calculate the sum of the values in moving window,
                #insert value into corresponding pixel in img_out
                
                img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w]))
    
        t1= time()
        elapsed = t1-t
        print "img shape:", img.shape
        print img, "\n"
        print "img_out shape:", img_out.shape
        print img_out
        
        print "%.7f seconds" % elapsed

rs(imgage_size=16, window=2)
rs(81, 3)
rs(1000000)
#rs(4800*4800) # very slow
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to