On Sat, Sep 11, 2010 at 7:45 AM, Massimo Di Stefano
<[email protected]> wrote:
> Hello All,
>
> I need to extract the data from an array that fall inside a
> rectangular area defined as:
>
> N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625
>
> The data are in a CSV file (comma-delimited text with 3 columns: X, Y, Z)
>
> #X,Y,Z
> 3020081.5500,769999.3100,0.0300
> 3020086.2000,769991.6500,0.4600
> 3020099.6600,769996.2700,0.9000
> ...
> ...
>
> I read it using "numpy.loadtxt".
>
> data :
>
> http://www.geofemengineering.it/data/csv.txt     5.3 MB (158735 rows)
>
> To extract the data inside the bounding-box area (N, S, E, W) I'm using a
> loop inside a function like:
>
> import numpy as np
>
> def getMinMaxBB(data, N, S, E, W):
>        mydata = data * 0.3048006096012
>        for i in range(len(mydata)):
>                if mydata[i,0] < E or mydata[i,0] > W or mydata[i,1] < N or mydata[i,1] > S :
>                        if i == 0:
>                                newdata = np.array((mydata[i,0],mydata[i,1],mydata[i,2]), float)
>                        else :
>                                newdata = np.vstack((newdata,(mydata[i,0], mydata[i,1], mydata[i,2])))
>        results = {}
>        results['Max_Z'] = newdata.max(0)[2]
>        results['Min_Z'] = newdata.min(0)[2]
>        results['Num_P'] = len(newdata)
>        return results
>
>
> N, S, E, W = 234560.94503118, 234482.56929822, 921336.53116178, 921185.3779625
> data = '/Users/sasha/csv.txt'
> mydata = np.loadtxt(data, comments='#', delimiter=',')
> out = getMinMaxBB(mydata, N, S, E, W)
>
> print out

Use boolean arrays to index the parts of your array that you want to look at:

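import numpy as np

def newGetMinMax(data, N, S, E, W):
    mydata = data * 0.3048006096012
    # Start with nothing selected, then OR in each test so the mask
    # matches the condition used in the original loop.
    mask = np.zeros(mydata.shape[0], dtype=bool)
    mask |= mydata[:,0] < E
    mask |= mydata[:,0] > W
    mask |= mydata[:,1] < N
    mask |= mydata[:,1] > S
    results = {}
    # Index the Z column with the mask; no Python-level loop is needed.
    results['Max_Z'] = mydata[mask,2].max()
    results['Min_Z'] = mydata[mask,2].min()
    results['Num_P'] = mask.sum()
    return results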

On my machine this runs about 5000 times faster than the loop version.
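
One caveat about the condition itself: with the values given, E is larger
than W, so "x < E or x > W" is true for every finite x, and the OR-ed mask
ends up selecting every row. If the intent is to keep only the points that
lie inside the rectangle, the four tests would need to be combined with &
and point inward. A minimal sketch under that assumption follows; the name
get_min_max_inside and the handling of an empty selection are my additions
and are not part of the original code.

```python
import numpy as np

# Scale factor copied unchanged from the code above.
SCALE = 0.3048006096012

def get_min_max_inside(data, N, S, E, W):
    """Hypothetical variant: Z statistics for points strictly inside the box."""
    mydata = data * SCALE
    x, y, z = mydata[:, 0], mydata[:, 1], mydata[:, 2]
    # Assumes W < E and S < N; every test must hold, so combine with &.
    inside = (x > W) & (x < E) & (y > S) & (y < N)
    if not inside.any():
        # max()/min() raise on an empty selection, so return early.
        return {'Max_Z': None, 'Min_Z': None, 'Num_P': 0}
    return {
        'Max_Z': z[inside].max(),
        'Min_Z': z[inside].min(),
        'Num_P': int(inside.sum()),
    }
```

It is called the same way as newGetMinMax, for example
get_min_max_inside(mydata, N, S, E, W) on the array returned by np.loadtxt.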

Brett
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
