Revision: 4032
http://matplotlib.svn.sourceforge.net/matplotlib/?rev=4032&view=rev
Author: jdh2358
Date: 2007-10-26 21:00:59 -0700 (Fri, 26 Oct 2007)
Log Message:
-----------
added a few more workbook examples
Modified Paths:
--------------
trunk/py4science/examples/skel/stats_descriptives_skel.py
trunk/py4science/examples/stats_descriptives.py
trunk/py4science/workbook/convolution.tex
trunk/py4science/workbook/intro_stats.tex
trunk/py4science/workbook/stats_descriptives.tex
trunk/py4science/workbook/stats_distributions.tex
Modified: trunk/py4science/examples/skel/stats_descriptives_skel.py
===================================================================
--- trunk/py4science/examples/skel/stats_descriptives_skel.py 2007-10-26
20:21:23 UTC (rev 4031)
+++ trunk/py4science/examples/skel/stats_descriptives_skel.py 2007-10-27
04:00:59 UTC (rev 4032)
@@ -56,14 +56,27 @@
maxlags : max number of lags for the autocorr
- detrend : a function used to detrend the data for the correlation
and spectral functions
+ detrend : a function used to detrend the data for the
+ correlation and spectral functions
fmt : the plot format string
"""
data = self.samples
- class Bunch: pass
- c = Bunch()
+ # Here we use a rather strange idiom: we create an empty,
+ # do-nothing class C and simply attach attributes to it for the
+ # return value (which we carefully describe in the docstring).
+ # The alternatives are to return a tuple a,b,c,d or a dictionary
+ # {'a':someval, 'b':someotherval}, but both of these approaches
+ # have problems.  If you return a tuple and later want to return
+ # something new, you have to change all the code that calls this
+ # function.  Dictionaries work fine, but I find the client code
+ # harder to read: d['a'] versus d.a.  The final alternative,
+ # which is most suitable for production code, is to define a
+ # custom class to store (and pretty print) your return object.
+ class C: pass
+ c = C()
N = 5
fig = c.fig = figfunc()
ax = c.ax1 = fig.add_subplot(N,1,1)
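The trade-off described in the new comment can be made concrete with a short, self-contained sketch; the names here (`Bunch`, `stats_*`, the sample data) are illustrative only and not part of this checkin:

```python
# Three ways to return multiple values, as discussed in the
# comment above.  All names and data here are illustrative.

def stats_tuple(data):
    # tuple return: concise, but adding a new value later breaks
    # every caller that unpacks the tuple
    return min(data), max(data), sum(data) / len(data)

def stats_dict(data):
    # dict return: extensible, but client code reads d['mean']
    return {'min': min(data), 'max': max(data),
            'mean': sum(data) / len(data)}

class Bunch:
    "an empty do-nothing class; we simply attach attributes to instances"
    pass

def stats_bunch(data):
    # attribute-access return: extensible, and client code reads c.mean
    c = Bunch()
    c.min = min(data)
    c.max = max(data)
    c.mean = sum(data) / len(data)
    return c

data = [1.0, 2.0, 3.0, 4.0]
c = stats_bunch(data)
print(c.min, c.max, c.mean)   # 1.0 4.0 2.5
```

Adding, say, `c.median` later requires no change to existing callers, which is the point of the idiom.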
Modified: trunk/py4science/examples/stats_descriptives.py
===================================================================
--- trunk/py4science/examples/stats_descriptives.py 2007-10-26 20:21:23 UTC
(rev 4031)
+++ trunk/py4science/examples/stats_descriptives.py 2007-10-27 04:00:59 UTC
(rev 4032)
@@ -72,6 +72,18 @@
"""
data = self.samples
+ # Here we use a rather strange idiom: we create an empty,
+ # do-nothing class C and simply attach attributes to it for the
+ # return value (which we carefully describe in the docstring).
+ # The alternatives are to return a tuple a,b,c,d or a dictionary
+ # {'a':someval, 'b':someotherval}, but both of these approaches
+ # have problems.  If you return a tuple and later want to return
+ # something new, you have to change all the code that calls this
+ # function.  Dictionaries work fine, but I find the client code
+ # harder to read: d['a'] versus d.a.  The final alternative,
+ # which is most suitable for production code, is to define a
+ # custom class to store (and pretty print) your return object.
class C: pass
c = C()
N = 5
Modified: trunk/py4science/workbook/convolution.tex
===================================================================
--- trunk/py4science/workbook/convolution.tex 2007-10-26 20:21:23 UTC (rev
4031)
+++ trunk/py4science/workbook/convolution.tex 2007-10-27 04:00:59 UTC (rev
4032)
@@ -1,14 +1,61 @@
\section{Convolution}
\label{sec:convolution}
+The output of a linear system is given by the convolution of its
+impulse response function with the input.  Mathematically,
+\[
+ y(t) = \int_0^t x(\tau)\,r(t-\tau)\,d\tau
+\]
+This fundamental relationship lies at the heart of linear systems
+analysis. It is used to model the dynamics of calcium buffers in
+neuronal synapses, where incoming action potentials are represented as
+Dirac $\delta$-functions and the calcium stores are represented with a
+response function with multiple exponential time constants. It is
+used in microscopy, in which the image distortions introduced by the
+lenses are \textit{deconvolved} out using a measured point spread
+function to provide a better picture of the true image input. It is
+essential in structural engineering to determine how materials respond
+to shocks.
+The impulse response function $r$ is the system response to a
+pulsatile input. For example, in Figure~\ref{fig:convolve_explain}
+below, the response function is the sum of two exponentials with
+different time constants and signs. This is a typical function used
+to model synaptic current following a neuronal action potential. The
+figure shows three $\delta$ inputs at different times and with
+different amplitudes.  The corresponding impulse response for each
+input is shown following it, and is color coded with the impulse input
+color. If the system response is linear, by definition, the response
+to a sum of inputs is the sum of the responses to the individual
+inputs, and the lower panel shows the sum of the responses, or
+equivalently, the convolution of the impulse response function with
+the input function.
+
\begin{center}%
\begin{figure}
\begin{centering}\includegraphics[width=4in]{fig/convolve_explain}\par\end{centering}
-\caption{\label{fig:convolve_explain}The output of a linear system to a series
of impulse inputs is equal to the sum of the scaled and time shifted impulse
response functions.}
+\caption{\label{fig:convolve_explain}The output of a linear system to
+a series of impulse inputs is equal to the sum of the scaled and time
+shifted impulse response functions.}
\end{figure}
\par\end{center}
+In Figure~\ref{fig:convolve_explain}, the summing of the impulse
+response function over the three inputs is conceptually and visually
+easy to understand. Some find the concept of a convolution of an
+impulse response function with a continuous time function, such as a
+sinusoid or a noise process, conceptually more difficult. It
+shouldn't be. By the \textit{sampling theorem}, we can represent any
+finite bandwidth continuous time signal as the sum of Dirac-$\delta$
+functions where the height of the $\delta$ function at each time point
+is simply the amplitude of the signal at that time point. The only
+requirement is that the sampling frequency be at least the Nyquist
+rate, defined as twice the highest spectral frequency in the
+signal.  See Figure~\ref{fig:convolve_deltas} for a
+representation of a delta function sampling of a damped, oscillatory,
+exponential function.
+
+
\begin{center}%
\begin{figure}
\begin{centering}\includegraphics[width=4in]{fig/convolve_deltas}\par\end{centering}
@@ -17,6 +64,29 @@
\par\end{center}
+In the exercise below, we will convolve a sample from the normal
+distribution (white noise) with a double exponential impulse response
+function. Such a function acts as a low pass filter, so the resultant
+output will look considerably smoother than the input. You can use
+\texttt{numpy.convolve} to perform the convolution numerically.
+
+We also explore the important relationship that a convolution in the
+temporal (or spatial) domain becomes a multiplication in the spectral
+domain, which is mathematically much easier to work with.
+\[
+Y = R \cdot X
+\]
+
+where $Y$, $X$, and $R$ are the Fourier transforms of the respective
+variables in the temporal convolution equation above. The Fourier
+transform of the impulse response function serves as an amplitude
+weighting and phase shifting operator for each frequency component.
+Thus, we can get deeper insight into the effects of impulse response
+function $r$ by studying the amplitude and phase spectrum of its
+transform $R$. In the example below, however, we simply use the
+multiplication property to perform the same convolution in Fourier
+space to confirm the numerical result from \texttt{numpy.convolve}.
+
\lstinputlisting[label=code:convolution_demo,caption={IGNORED}]{skel/convolution_demo_skel.py}
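As a sketch of the check described above (the impulse-response parameters here are illustrative, not those of the exercise skeleton), the direct `numpy.convolve` result can be confirmed by multiplication in Fourier space:

```python
# Convolve white noise with a double-exponential impulse response,
# then confirm the result via multiplication in the spectral domain.
# All parameters here are made up for illustration.
import numpy as np

np.random.seed(1)
dt = 0.01
t = np.arange(0, 2.0, dt)

# double-exponential impulse response r(t): sum of two exponentials
# with different time constants and signs
r = np.exp(-t / 0.05) - np.exp(-t / 0.01)

# white-noise input x(t): samples from the normal distribution
x = np.random.randn(len(t))

# direct numerical convolution; keep the first len(t) samples
y_direct = np.convolve(x, r)[:len(t)] * dt

# the same convolution via the FFT: Y = R * X in the spectral
# domain; zero-pad to the full convolution length to avoid
# circular wrap-around
n = len(x) + len(r) - 1
Y = np.fft.rfft(r, n) * np.fft.rfft(x, n)
y_fft = np.fft.irfft(Y, n)[:len(t)] * dt

print(np.allclose(y_direct, y_fft))  # the two methods agree
```

Because the double exponential acts as a low-pass filter, `y_direct` is visibly smoother than the white-noise input `x`.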
Modified: trunk/py4science/workbook/intro_stats.tex
===================================================================
--- trunk/py4science/workbook/intro_stats.tex 2007-10-26 20:21:23 UTC (rev
4031)
+++ trunk/py4science/workbook/intro_stats.tex 2007-10-27 04:00:59 UTC (rev
4032)
@@ -1 +1,31 @@
-TODO
\ No newline at end of file
+\texttt{R}, a statistical package based on \texttt{S}, is viewed by
+some as the best statistical software on the planet, and in the open
+source world it is the clear choice for sophisticated statistical
+analysis.  Like python, \texttt{R} is an interpreted language written
+in C with an interactive shell.  Unlike python, which is a general
+purpose programming language, \texttt{R} is a specialized statistical
+language.  Since python is an excellent glue language, with
+facilities for providing a transparent interface to FORTRAN, C, C++
+and other languages, it should come as no surprise that you can
+harness \texttt{R}'s immense statistical power from python, through
+the \texttt{rpy} third-party extension library.
+
+However, \texttt{R} is not without its warts.  As a language, it
+lacks python's elegance and advanced programming constructs and
+idioms.  It is also GPL, which means you cannot distribute code based
+upon it unhindered: the code you distribute must be GPL as well
+(python, and the core scientific extension libraries, carry a more
+permissive license which supports distribution in closed source,
+proprietary applications).
+
+Fortunately, the core scientific libraries for python (primarily
+\texttt{numpy} and \texttt{scipy.stats}) provide a wide array of
+statistical tools, from basic descriptive statistics (mean, variance,
+skew, kurtosis, correlation, \dots) to hypothesis testing (t-tests,
+$\chi^2$, analysis of variance, general linear models, \dots) to
+analytical and numerical tools for working with almost every discrete
+and continuous statistical distribution you can think of (normal,
+gamma, poisson, weibull, lognormal, levy stable, \dots).
+
+
+
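A minimal sketch of the kinds of tools listed above, using only `numpy` and `scipy.stats`; the data here are made up for illustration:

```python
# Descriptive statistics and a simple hypothesis test with
# scipy.stats.  The sample data are illustrative only.
import numpy as np
from scipy import stats

np.random.seed(0)
# 1000 draws from a normal distribution with mean 1 and std 2
samples = np.random.normal(loc=1.0, scale=2.0, size=1000)

# basic descriptive statistics in one call
n, (smin, smax), mean, var, skew, kurt = stats.describe(samples)
print(n, mean, var)

# a one-sample t-test of the null hypothesis that the true mean
# is 1.0 (which here it is, so p should not be small)
tstat, pval = stats.ttest_1samp(samples, 1.0)
print(tstat, pval)
```

The same module exposes the analytic distributions themselves (`stats.norm`, `stats.gamma`, \dots) for comparing empirical results against theory.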
Modified: trunk/py4science/workbook/stats_descriptives.tex
===================================================================
--- trunk/py4science/workbook/stats_descriptives.tex 2007-10-26 20:21:23 UTC
(rev 4031)
+++ trunk/py4science/workbook/stats_descriptives.tex 2007-10-27 04:00:59 UTC
(rev 4032)
@@ -1,7 +1,49 @@
\section{Descriptive statistics}
\label{sec:stats_descriptives}
+The first step in any statistical analysis should be to describe,
+characterize and, importantly, visualize your data.  The normal
+distribution (aka Gaussian or bell curve) lies at the heart of much
+of formal statistical analysis, and normal distributions have the
+tidy property that they are completely characterized by their mean
+and variance.  As you may have observed in your interactions with
+family and friends, most of the world is not normal, and many
+statistical analyses are flawed by summarizing data with just the
+mean and standard deviation (square root of variance) and associated
+significance tests (e.g.\ the t-test) as if it were normally
+distributed data.
+In the exercise below, we write a class to provide descriptive
+statistics of a data set passed into the constructor, with class
+methods to pretty print the results and to create a battery of
+standard plots which may show structure missing in a casual analysis.
+Many new programmers, or even experienced programmers used to a
+procedural environment, are uncomfortable with the idea of classes,
+having heard their geekier programmer friends talk about them but
+not really being sure what to do with them.  There are many
+interesting things one can do with classes (aka object oriented
+programming), but at their heart they are a way of bundling data
+with methods that operate on that data.  The \texttt{self} variable
+is special in python and is how the class refers to its own data and
+methods.  Here is a toy example:
+
+\begin{lstlisting}
+
+In [115]: class MyData:
+   .....:     def __init__(self, x):
+   .....:         self.x = x
+   .....:     def sumsquare(self):
+   .....:         return (self.x**2).sum()
+   .....:
+
+In [116]: nse = npy.random.rand(100)
+
+In [117]: mydata = MyData(nse)
+
+In [118]: mydata.sumsquare()
+Out[118]: 29.6851135284
+
+\end{lstlisting}
+
+
\lstinputlisting[label=code:stats_descriptives_skel,caption={IGNORED}]{skel/stats_descriptives_skel.py}
\begin{figure}
Modified: trunk/py4science/workbook/stats_distributions.tex
===================================================================
--- trunk/py4science/workbook/stats_distributions.tex 2007-10-26 20:21:23 UTC
(rev 4031)
+++ trunk/py4science/workbook/stats_distributions.tex 2007-10-27 04:00:59 UTC
(rev 4032)
@@ -1,9 +1,83 @@
\section{Statistical distributions}
\label{sec:stats_distributions}
+We explore a handful of the statistical distributions in the
+\texttt{scipy.stats} module and the connections between them.  The
+organization of the distribution functions in \texttt{scipy.stats} is
+quite elegant, with each distribution providing random variates
+(\texttt{rvs}), analytical moments (mean, variance, skew, kurtosis),
+analytic density (\texttt{pdf}, \texttt{cdf}) and survival functions
+(\texttt{sf}, \texttt{isf}) (where available), and tools for fitting
+empirical distributions to the analytic distributions (\texttt{fit}).
+In the exercise below, we will simulate a radioactive particle
+emitter and look at the empirical distribution of waiting times
+compared with the expected analytical distributions.  Our radioactive
+particle emitter has an equal likelihood of emitting a particle in
+any equal time interval, and emits particles at a rate of 20~Hz.  We
+will discretely sample time at a high frequency, record a 1 if a
+particle is emitted and a 0 otherwise, and then look at the
+distribution of waiting times between emissions.  The probability of
+a particle emission in one of our sample intervals (assumed to be
+very small compared to the average interval between emissions) is
+proportional to the rate and the sample interval $\Delta t$, i.e.\
+$p(\Delta t) = \alpha \Delta t$, where $\alpha$ is the emission rate
+in particles per second.
+\begin{lstlisting}
+# a uniform distribution [0..1]
+In [62]: uniform = scipy.stats.uniform()
+
+# our sample interval in seconds
+In [63]: deltat = 0.001
+
+# the emission rate, 20Hz
+In [65]: alpha = 20
+
+# 1000 random numbers
+In [66]: rvs = uniform.rvs(1000)
+
+# a look at the 1st 20 random variates
+In [67]: rvs[:20]
+Out[67]:
+array([ 0.71167172, 0.01723161, 0.25849255, 0.00599207, 0.58656146,
+ 0.12765225, 0.17898621, 0.77724693, 0.18042977, 0.91935639,
+ 0.97659579, 0.59045477, 0.94730366, 0.00764026, 0.12153159,
+ 0.82286929, 0.18990484, 0.34608396, 0.63931108, 0.57199175])
+
+# we simulate an emission when the random number is less than
+# p(Delta t) = alpha * deltat
+In [84]: emit = rvs < (alpha * deltat)
+
+
+# there were 3 emissions in the first 20 observations
+In [85]: emit[:20]
+Out[85]:
+array([False, True, False, True, False, False, False, False, False,
+ False, False, False, False, True, False, False, False, False,
+ False, False], dtype=bool)
+\end{lstlisting}
+
+The waiting times between the emissions should follow an exponential
+distribution (see \texttt{scipy.stats.expon}) with a mean of
+$1/\alpha$. In the exercise below, you will generate a long array of
+emissions, compute the waiting times between emissions, between 2
+emissions, and between 10 emissions.  These should approach a 1st
+order gamma (aka exponential) distribution, a 2nd order gamma, and a
+10th order gamma (see \texttt{scipy.stats.gamma}).  Use the probability
+density functions for these distributions in \texttt{scipy.stats} to
+compare your simulated distributions and moments with the analytic
+versions provided by \texttt{scipy.stats}. With 10 waiting times, we
+should be approaching a normal distribution since we are summing 10
+waiting times and under the central limit theorem the sum of
+independent samples from a finite variance process approaches the
+normal distribution (see \texttt{scipy.stats.norm}). In the final
+part of the exercise below, you will be asked to approximate the 10th
+order gamma distribution with a normal distribution. The results
+should look something like those in
+Figure~\ref{fig:stats_distributions}.
+
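The simulation in the listing above extends naturally to the waiting-time comparison described in the text; the following sketch is illustrative only, not the exercise solution:

```python
# Simulate the 20 Hz emitter described above with 1 ms sampling,
# then compare the empirical waiting times with the analytic
# exponential distribution.  The sample size is illustrative.
import numpy as np
from scipy import stats

np.random.seed(2)
alpha = 20.0       # emission rate in particles per second
deltat = 0.001     # sample interval in seconds

uniform = stats.uniform()
rvs = uniform.rvs(size=1000000)

# emit a particle where the uniform variate is below alpha*deltat
emit = rvs < alpha * deltat

# waiting times between successive emissions, in seconds
times = np.nonzero(emit)[0] * deltat
waits = np.diff(times)

# the sample mean should approach the analytic mean 1/alpha
print(waits.mean())            # roughly 0.05

# the matching analytic distribution from scipy.stats
expon = stats.expon(scale=1.0 / alpha)
print(expon.mean())            # exactly 0.05
```

Summing waiting times over every 2nd or 10th emission and comparing against `stats.gamma(2, scale=1/alpha)` and `stats.gamma(10, scale=1/alpha)` follows the same pattern.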
\lstinputlisting[label=code:stats_distributions_skel,caption={IGNORED}]{skel/stats_distributions_skel.py}
\begin{figure}
This was sent by the SourceForge.net collaborative development platform, the
world's largest Open Source development site.
_______________________________________________
Matplotlib-checkins mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/matplotlib-checkins