Hi Kohei,
Kohei Yoshida wrote:
> Embedding of R into Calc is legally impossible as R is released under
> GPL. However, we could dynamically load it at run time without
> violating the license term (but again, IANAL).
Well, by embedding I mean loading the necessary resources at run time (as
discussed on the stat wiki page). ;-)
> Do you have a reference for this code i.e. is it entirely your own
> code, or is it derived from another source (application, book, paper,
> etc.)?
The code is a direct port, written by *ME*, of the mathematical definition
of ANOVA, i.e. the definition given in every book and article I have read
about ANOVA.
HOWEVER: almost all books then give another, "practical" formula, which
computes the sum of squared residuals as a difference of 2 other terms.
*These formulas* (containing a difference) are usually numerically
*unstable* (see my previous discussion, and the current implementation of
correlation and some other Calc functions, for the adverse effects of a
subtraction), in that they can generate impossible values (like a
negative F statistic).
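
To illustrate the point about the subtraction, here is a minimal Python
sketch (NOT the Calc code; the function names ss_definition and
ss_shortcut are made up for this example). It computes a sum of squares
once from the definition and once with the "practical" difference formula,
on data with a large offset, where the difference formula loses all its
significant digits:

```python
def ss_definition(values):
    """Sum of squared deviations from the mean, straight from the definition."""
    n = len(values)
    total = 0.0
    for x in values:
        total += x
    mean = total / n
    ss = 0.0
    for x in values:
        ss += (x - mean) ** 2
    return ss


def ss_shortcut(values):
    """'Practical' textbook formula: sum(x^2) - (sum(x))^2 / n.

    The two terms are huge and nearly equal, so their difference
    suffers from cancellation in floating point.
    """
    n = len(values)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:
        sum_x += x
        sum_x2 += x * x
    return sum_x2 - sum_x * sum_x / n


# Illustrative data: small spread on top of a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("definition:", ss_definition(data))  # exact answer is 90
print("shortcut:  ", ss_shortcut(data))    # not 90 in double precision
```

With other data the shortcut result can even come out negative, which is
how an impossible F statistic arises.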
So, this is essentially my own port of the formula. Of course, I took a look
at the correlation code (because I previously had NO idea how to access
the data in Calc). My other reference (a very good one) was:
http://courses.ncssm.edu/math/Stat_Inst/PDFS/NEWANOVA.pdf (see also the
stat wiki page, ANOVA section,
http://wiki.services.openoffice.org/wiki/Statistical_Data_Analysis_Tool#Multiple-Groups_Inference).
However, that document describes the matrix approach to ANOVA. Calc does
NOT allow matrix calculations, so I translated the ideas back into simple
calculations. [Currently I have implemented only the one-way, non-blocked
type of ANOVA. When the code is complete and functional, I will make the
changes for the various flavours of ANOVA, but it makes NO sense to
maintain multiple code segments right now.]
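
For readers who want to see the "simple calculations" spelled out, here is
a minimal Python sketch of one-way, non-blocked ANOVA computed directly
from the definition. It is NOT the Calc implementation; the function name
one_way_anova_f and the variable names are assumptions made for this
example (df_b and df_e correspond to dfB and dfE):

```python
def one_way_anova_f(groups):
    """Return (F, df_b, df_e) for a one-way ANOVA.

    groups: a list of lists, one list of observations per group.
    Both sums of squares are computed from their definitions, with no
    'difference of two terms' shortcut.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = sum(x for g in groups for x in g) / n

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the observations around their group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        for x in g:
            ss_within += (x - group_mean) ** 2

    df_b = k - 1
    df_e = n - k
    if ss_within == 0.0:
        raise ValueError("no within-group variation; F is undefined")

    f_stat = (ss_between / df_b) / (ss_within / df_e)
    return f_stat, df_b, df_e


# Illustrative data, three groups of three observations each.
f_stat, df_b, df_e = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_e)
```

Because every term is a sum of squares, the result can never be negative.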
One LAST COMMENT:
The code outputs the *F statistic*, NOT the *p-value*.
To obtain the p-value, a call to FDIST('F statistic value', dfB, dfE)
must be made, BUT:
- I have NOT yet figured out where this function is or what it is named
- all statistical software outputs both the F statistic AND the p-value
[and a lot of other data - this is actually meaningful]
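
As a rough illustration of what the FDIST step computes, here is a
self-contained Python sketch of the upper-tail probability of the F
distribution. It is NOT the Calc function: the names f_upper_tail,
_betai and _betacf are made up for this example, and it uses the
well-known continued-fraction evaluation of the regularized incomplete
beta function, which may differ from whatever Calc does internally.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df_b, df_e):
    """P(F >= f_stat) for an F distribution with (df_b, df_e) degrees of freedom."""
    if f_stat <= 0.0:
        return 1.0
    x = df_e / (df_e + df_b * f_stat)
    return _betai(df_e / 2.0, df_b / 2.0, x)


# Illustrative values only: F = 13 with dfB = 2 and dfE = 6.
p_value = f_upper_tail(13.0, 2, 6)
print(p_value)
```

For dfB = 2 the tail has the closed form (dfE / (dfE + 2*F))^(dfE/2), which
is a handy sanity check for any implementation.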
So, in the longer term we must also think about richer output,
because Calc is really limited here. All modern statistical
functions and techniques DO OUTPUT a lot of data, not just a p-value.
Hope this clarifies some issues.
Sincerely,
Leonard Mada