[Edu-sig] r

michel paul Thu, 09 Oct 2008 08:00:30 -0700

Here's more regarding student notes = concept outlines that RUN.

In dealing with sample vs population variance, in both cases we are talking
about an average, but our traditional notation tends to obscure that fact.
It's easy to see that population variance is the mean of the squared
deviations, but the sample formula tends to blur that concept.  The result
at a high school level has been to appreciate the resulting messy formula
but to then thankfully turn to some kind of magic black box that will
perform those calculations for you.


Well, I think that's dumb.  Instead of using software packages or other
kinds of black boxes to magically generate results, let's use our own
thoughts to generate definitions.  This can be done as a class discussion.

I first needed a term for sum(L)/(n - 1), where n = len(L).  That certainly
is a kind of mean, but there doesn't seem to be an official term for it.
So, I decided to call it an 'adjusted' mean.  If there is a better term for this,
please let me know.  For the time being, I can say that variance is ALWAYS
the mean of the squared deviations.  It's just that if you're dealing with a
sample, you find the 'adjusted' mean of the squared deviations.
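
As a quick numeric illustration of that statement, here is a minimal sketch. The data set is made up for easy arithmetic, the variable names are only illustrative, and it assumes Python 3, where / is true division even on integers.

```python
# Hypothetical data, chosen so the numbers come out cleanly.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

m = sum(data) / n                            # ordinary mean: 40 / 8 = 5.0
squared_devs = [(x - m) ** 2 for x in data]  # 9, 1, 1, 1, 0, 0, 4, 16 (sum 32)

# Population variance: the mean of the squared deviations.
population_variance = sum(squared_devs) / n        # 32 / 8

# Sample variance: the 'adjusted' mean of the same squared deviations.
sample_variance = sum(squared_devs) / (n - 1)      # 32 / 7

print(population_variance)
print(sample_variance)
```

The list of squared deviations is the same in both cases; only the divisor changes.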

So now we have a nice suite of mostly one-liner functions that handle things
we've studied up through the Pearson correlation coefficient.

In the following, 'sample' is a global boolean variable.  The concepts are
most easily expressed in population form, but most frequently applied using
sample form.  This puts the two together.

Caveat:  this is not meant to be definitive code.  Not at all.  It is only
meant to be code that illustrates concepts.  Feedback welcomed.  As I said
to my dept chair, certainly there are many software packages that already
exist that will find these things for you, but could the code behind them
serve as a math student's notes???  No way!

- Michel

=======================================

from math import sqrt   # needed by stdev()

sample = True

def mean(L): return sum(L)/len(L)

def adjusted_mean(L): return sum(L)/(len(L) - 1)

def deviations(L): return [x - mean(L) for x in L]

def squares(L): return [x**2 for x in L]

def variance(L):
    if sample: return adjusted_mean(squares(deviations(L)))
    else: return mean(squares(deviations(L)))

def stdev(L): return sqrt(variance(L))

def zscores(L): return [deviation/stdev(L) for deviation in deviations(L)]

def X(L): return [x for (x, y) in L]
def Y(L): return [y for (x, y) in L]

def r(L):
    products = [zx*zy for (zx, zy) in zip(zscores(X(L)), zscores(Y(L)))]
    if sample: return adjusted_mean(products)
    else: return mean(products)
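
One thing that may be worth trying in class: r comes out the same in sample form and in population form, because the divisor used for the z-scores and the divisor used for the final mean cancel. The sketch below is a self-contained way to check that. It is not the code above: pearson_r, its sample parameter (used in place of the global flag), and the data points are all assumptions made for illustration, and Python 3 division is assumed.

```python
from math import sqrt, isclose

def pearson_r(pairs, sample=True):
    """Pearson r as the (adjusted) mean of the products of z-scores."""
    n = len(pairs)
    divisor = n - 1 if sample else n   # 'adjusted' mean vs. ordinary mean
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    def zscores(values):
        m = sum(values) / n
        devs = [v - m for v in values]
        sd = sqrt(sum(d * d for d in devs) / divisor)
        return [d / sd for d in devs]

    products = [zx * zy for zx, zy in zip(zscores(xs), zscores(ys))]
    return sum(products) / divisor

# Made-up (x, y) points for illustration only.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

r_sample = pearson_r(points, sample=True)
r_population = pearson_r(points, sample=False)
print(r_sample, r_population)

# Both forms agree up to floating-point rounding.
assert isclose(r_sample, r_population)
```

Passing sample as a parameter rather than reading a global makes it easy to compute both forms side by side, at the cost of being a little less like a one-line definition.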

_______________________________________________
Edu-sig mailing list
Edu-sig@python.org
http://mail.python.org/mailman/listinfo/edu-sig
