[sc-dev] Re: What should ZTEST calculate?

Leonard Mada Thu, 30 Oct 2008 14:21:53 -0700

Hi Regina,

I believe this discussion fits better on the sc mailing list.

I will describe below how I would handle this issue. I would not dwell on the current z-implementation. The z-test is very limited, so I would actually want to extend the t-test to cover one group of data.

Calc currently fares badly, as it offers a semi-robust test only for 2 groups of data. So, I would redirect all efforts to extending Calc's capabilities both to 1 group of data and to >2 groups of data.


Step 1:
Extend the t-test to accept a single group of data.
=TTEST( <range> , <number> , tails = 2 , variance = NULL )

Compares the mean of the data group <range> to the value <number>, using <x> tails, and assuming the variance is equal to the variance of <range>.

tails = 2: 2-tailed (alternative: "two.tailed")
tails = 1: one tailed less (alt: "less")
tails = 3: one tailed greater (alt: "greater")

<optional parameter>
variance = NULL: use the variance of <range>
variance = <number>, use this variance instead
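
A minimal sketch of Step 1 in Python, for illustration only. The function name one_sample_test is hypothetical and not part of Calc; the tails coding (1 = less, 2 = two-tailed, 3 = greater) follows the proposal above. It is an assumption of this sketch that a user-supplied variance is treated as known, so the statistic is referred to the standard normal distribution (i.e. a z-test). Without a supplied variance only the t statistic and its degrees of freedom are returned, because the Student t CDF is not in the Python standard library.

```python
import math
from statistics import NormalDist, mean, variance as sample_variance


def one_sample_test(data, mu0, tails=2, variance=None):
    """Hypothetical sketch of =TTEST(<range>, <number>, tails, variance).

    Returns (statistic, degrees_of_freedom, p_value).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")

    known = variance is not None
    # variance = NULL: fall back to the sample variance (n - 1 denominator)
    var = variance if known else sample_variance(data)
    stat = (mean(data) - mu0) / math.sqrt(var / n)

    if not known:
        # t statistic with n - 1 degrees of freedom; the p-value would need
        # the Student t CDF, which this sketch deliberately leaves out.
        return stat, n - 1, None

    # Assumption: a supplied variance is "known", so use the normal CDF.
    lower = NormalDist().cdf(stat)
    if tails == 1:        # one-tailed, "less"
        p = lower
    elif tails == 3:      # one-tailed, "greater"
        p = 1.0 - lower
    elif tails == 2:      # two-tailed
        p = 2.0 * min(lower, 1.0 - lower)
    else:
        raise ValueError("tails must be 1, 2 or 3")
    return stat, None, p
```

For example, one_sample_test([2, 4, 6, 8], 3) returns the t statistic with 3 degrees of freedom, while passing variance=4 gives a statistic of 2.0 together with a normal-based p-value.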

Step 2:

Implement ANOVA to cover 2 or more groups of data. I posted some C++ code to issue 4921, see

http://www.openoffice.org/issues/show_bug.cgi?id=4921

[It implements only the simple one-way ANOVA, skipping the block-design.]
=ANOVA( <range> , design = 1)
# EVERY column = one set of data

=ANOVA( <range1> , <range2> , ... , design = 1 )
# EVERY RANGE = one set of data

design = 1: one way ANOVA
design = 2: two way ANOVA (factorial block design)
design = 3: two way ANOVA (randomized block design)
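
To make the design = 1 case concrete, here is a rough Python sketch of the one-way ANOVA F statistic. It is not the C++ code attached to issue 4921; the function name one_way_anova_f is hypothetical, and each argument plays the role of one <range> (one set of data). The p-value is omitted because the F-distribution CDF is not in the Python standard library.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each positional argument is one group of observations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With exactly two groups, F equals the square of the equal-variance two-sample t statistic, which is why ANOVA can be seen as covering "2 or more groups".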

See also http://www.statmethods.net/stats/anova.html

Hi Leonard,

Leonard Mada wrote:
[...]

I have found one discussion in
http://lists.oasis-open.org/archives/office-formula/200702/msg00047.html
and Eike refers back to it in
http://lists.oasis-open.org/archives/office-formula/200806/msg00050.html

But the spec still has a red ToDo in that place.


See below.

> The z-test is a simplified t-test. So, for groups larger than 30 values,
> it should be quite close to the t-test.
>

> The first thing to strike you is the fact that you can't use in Calc the
> z-test or the t-test simultaneously. This is because, in Calc (I don't
> know of Excel), the t-test works ONLY on 2 groups of data, while the
> z-test works on a SINGLE group of data. This is a design flaw in the
> statistics engine.

I do not see any attempt to change that, not even an issue.
>
> BOTH tests should work both on a single group of data, and on 2 groups
> of data (while the ANOVA works on 2 or more groups of data). This is a
> MAJOR shortcoming of Calc. You can't use a somewhat more robust test
> (t-test) to compare a single group of data against a reference value.
>
> For less than 30 values, the t-test is preferred, and actually is the
> only test in R (there is a special package that has the z-test
> implemented for teaching purposes, I forgot the name but Google will
> probably get it).
>
> You can use more than 30 values and compute the t-test in R. It should
> yield the same results as the z-test, e.g.:
> x<-rnorm(30)
> t.test(x, mu = 0.5)

I don't have R. I have only got Excel and Gnumeric.

R is open source. Google for "R", or go to http://cran.R-project.org, and you can get R. It runs on almost every platform (support for Win9x was dropped in the latest R, but I can confirm that it runs on Win2k). Be warned, the learning curve is steep.


Basics:
Creating a vector:
x<- c( <number1>, <number2> , ... )

30 random numbers:
x<- rnorm(30)

t.test:
t.test( <vector1> , <vector2> ) # two.sided
t.test( <vector1> , <vector2> , alternative = "less" ) # one sided, less
t.test( <vector1> , <vector2> , alternative = "greater" ) # one sided, greater
t.test( <vector> , mu = <number> ) # one group of data
# don't forget to write the string 'mu='
# alternative = "less" and alternative = "greater" apply similarly

There is also a z-test available in package 'TeachingDemos' (you need to download this package first), see:

http://rss.acs.unt.edu/Rdoc/library/TeachingDemos/html/z.test.html

Sincerely,

Leonard

>
> In this instance, we compare the mean of the sample x against another
> mean mu = 0.5 (don't forget the 'mu', otherwise you get an error).
>
> If the z-test in Calc gives a different result, then it is wrong.

It would be nice to get a test spreadsheet with dummy data and the
results which R returns.

> As with the t-test, the z-test can be one-sided or 2-sided, but the standard
> should be 2-sided.

In the spec it is now 2-sided.
>
> I hope this helps.

Not really. When we implement ZTEST in the 2-sided way, as it is
now defined in the spec, then it will differ from the current behavior.
Therefore, going to ODF 1.2 there will be a new ZTEST which gives other
results than the old one. How should Calc handle this? Or should we try
to get OASIS to define a 1-sided way? But even then it would be
different from now, because the 1-sided way is not correctly implemented
in Excel and Calc; at least I understand the comments on the mailing
list in that way.
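
To illustrate how the two definitions relate, here is a small Python sketch. It assumes that the 1-sided result is the upper-tail probability of the standard normal distribution; whether the current Calc and Excel implementations actually return that is exactly what is in doubt above. The function names are hypothetical.

```python
from statistics import NormalDist


def upper_tail_p(z):
    """One-sided (upper-tail) p-value for a z statistic."""
    return 1.0 - NormalDist().cdf(z)


def two_sided_from_upper(p_upper):
    """Two-sided p-value derived from an upper-tail p-value."""
    return 2.0 * min(p_upper, 1.0 - p_upper)


# The two-sided value does not depend on the sign of z,
# while the one-sided value does.
for z in (-1.96, 0.0, 1.96):
    p1 = upper_tail_p(z)
    print(z, p1, two_sided_from_upper(p1))
```

Under this assumption either definition can be computed from the other as long as it is known which side of 0.5 the one-sided value lies on, but the two numbers are different for the same data.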

kind regards
Regina


