On Wed, Apr 13, 2011 at 9:50 AM, Jonathan Rocher <[email protected]> wrote:
> Hi,
>
> I assume you have this data in a txt file, correct? You can load all of
> it into a numpy array using
>
> import numpy as np
> data = np.loadtxt("climat_file.txt", skiprows=1)
>
> Then you can compute the mean you want by taking it on a slice of the data
> array. For example, if you want to compute the mean of your January data for
> 1950-1970 (including 1970): row 0 holds 1900, so rows 50 to 70 hold 1950
> to 1970:
>
> mean1950_1970 = data[50:71,1].mean()
>
> Then the standard deviation you want (the spread of the whole January
> column around that mean) could be computed using
>
> my_std = np.sqrt(np.mean((data[:,1]-mean1950_1970)**2))
>
> Hope this helps,
> Jonathan
>
> On Tue, Apr 12, 2011 at 1:48 PM, Climate Research <[email protected]>
> wrote:
>>
>> Hi,
>> I am completely new to Python and NumPy. I am using Python for doing
>> statistical calculations on climate data.
>>
>> I have a data set in the following format:
>>
>> Year  Jan   Feb   Mar   Apr  ...  Dec
>> 1900  1000  1001  ...
>> 1901  1011  1012  ...
>> 1902  1009  1007  ...
>> ...
>> 2010  1008  1002  ...
>>
>> I actually want to standardize each of these values with the corresponding
>> standard deviation for each monthly data column.
>> I have found the standard deviations for each column, but now I need
>> to find the standard deviation only for a prescribed mean value,
>> i.e., when I am finding the standard deviation for the January data
>> column, the mean should be calculated only for the January data, say from
>> 1950-1970. With this mean I want to calculate the SD for the entire column.
>> Any help will be appreciated.
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> --
> Jonathan Rocher, PhD
> Scientific software developer
> Enthought, Inc.
> [email protected]
> 1-512-536-1057
> http://www.enthought.com
>
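The question asks for something slightly different from a plain z-score: the mean comes from a 1950-1970 baseline, but the spread is measured over the whole column. Below is a minimal sketch that does this for every month at once. It assumes the layout from the question (Year in column 0, one column per month, one row per consecutive year starting at 1900). The data is randomly generated, has only three months, and is written to an in-memory buffer so that `np.loadtxt(..., skiprows=1)` can be exercised without the real file. `standardize_to_baseline` is a hypothetical helper name. The baseline is selected by matching the Year column rather than by position, because row 0 is 1900 and a slice like `data[1950:1971, 1]` comes back empty.

```python
import io

import numpy as np


def standardize_to_baseline(data, start_year, end_year):
    """Standardize each month column against a baseline-period mean.

    Assumes column 0 holds the year and columns 1: hold monthly values.
    For each month, the mean is taken over start_year..end_year (inclusive)
    only; the spread is the root-mean-square deviation of the *whole* column
    from that baseline mean, i.e. the my_std formula applied per column.
    """
    years = data[:, 0]
    values = data[:, 1:]
    in_baseline = (years >= start_year) & (years <= end_year)
    base_mean = values[in_baseline].mean(axis=0)                 # one mean per month
    spread = np.sqrt(((values - base_mean) ** 2).mean(axis=0))  # over whole column
    return (values - base_mean) / spread, base_mean, spread


# Made-up stand-in for "climat_file.txt": a header row, then Year plus three
# month columns for 1900-2010 (the real file would have twelve months).
rng = np.random.default_rng(0)
years = np.arange(1900, 2011)
monthly = 1000 + rng.integers(0, 20, size=(years.size, 3))
table = np.column_stack([years, monthly])

buf = io.StringIO()
np.savetxt(buf, table, fmt="%d", header="Year Jan Feb Mar", comments="")
buf.seek(0)
data = np.loadtxt(buf, skiprows=1)  # header skipped, values loaded as floats

# Rows are positions, not years: a positional slice starting at 1950
# runs past the end of this 111-row table.
print(data[1950:1971, 1].size)  # 0

z, base_mean, spread = standardize_to_baseline(data, 1950, 1970)
print(z.shape)  # (111, 3)
```

`np.std` would re-center on the full-column mean, which is why the spread is computed by hand here.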
To standardize the data over each column you'll want to do:

    (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

Note the broadcasting behavior of the (matrix - vector) operation: the per-column means and standard deviations (one value per column) are applied to every row. See the NumPy documentation on broadcasting for more details. The ddof=1 gives you the sample standard deviation (dividing by n - 1, i.e. the square root of the unbiased variance estimate).

<shameless plug>
If you're looking for data structures to carry around your metadata (dates and month labels), look to pandas (my project: http://pandas.sourceforge.net/) or larry (http://larry.sourceforge.net/).
</shameless plug>

- Wes
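Here is Wes's one-liner, wrapped in a small function and run on the three sample rows from the question, to show the broadcasting at work. If `data` comes straight from `np.loadtxt` as in the quoted reply, column 0 holds the year. The sketch slices that column off first (an assumption about the layout), since a standardized Year column is meaningless. `standardize_columns` is a hypothetical name.

```python
import numpy as np


def standardize_columns(x):
    """Z-score each column: subtract its mean, divide by its sample SD (ddof=1)."""
    # x.mean(axis=0) and x.std(axis=0) have shape (n_cols,); broadcasting
    # lines them up with the last axis of x, so each row uses its column's stats.
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


# First three rows from the question (Year, Jan, Feb); other months omitted.
data = np.array([[1900, 1000, 1001],
                 [1901, 1011, 1012],
                 [1902, 1009, 1007]], dtype=float)

months = data[:, 1:]  # drop the Year column before standardizing
print(months.shape, months.mean(axis=0).shape)  # (3, 2) (2,)

z = standardize_columns(months)
print(z)
```

Each column of `z` then has mean 0 and sample standard deviation 1.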
