Hence my recommendation to use cross validation.

> Hi
> Sorry to repeat myself - but the samples are not
> independent. Independence is a fundamental
> assumption of these types of tests - and you cannot
> interpret the tests if this assumption is violated. 
> In the situation where spatial correlation exists,
> the true standard error is nothing like as small as
> the (s/sqrt(n)) that Chaosheng discusses - because
> the sqrt(n) depends on independence.
> Again, as I said before, if the data has any type of
> trend in it, then it is completely meaningless to
> try and use these tests - and with no trend but some
> 'ordinary' correlation, you must find a means of
> taking the data redundancy into account or risk getting
> hopelessly pessimistic results (in the sense of
> rejecting the null hypothesis of equal means far too
> often).
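To make the point about the standard error concrete, here is a minimal Python sketch of the variance of the sample mean when samples are correlated. It assumes equally spaced samples from a stationary process with a known correlation function `corr(k)`. The function names are hypothetical, and the exponential model `0.9**k` in the usage lines is an illustrative assumption, not something estimated from the data discussed in this thread.

```python
def variance_of_mean(n, sigma2, corr):
    """Variance of the mean of n equally spaced samples of a stationary process.

    sigma2 is the point variance and corr(k) the correlation between samples
    k steps apart (corr(0) is taken as 1). Uses
        Var(mean) = sigma2 / n**2 * sum_i sum_j corr(|i - j|).
    """
    total = float(n)  # diagonal terms i == j, each contributing corr(0) = 1
    for k in range(1, n):
        # n - k pairs with j - i = k, and as many with i - j = k
        total += 2 * (n - k) * corr(k)
    return sigma2 * total / n ** 2


def effective_n(n, corr):
    """Number of independent samples giving the same variance of the mean."""
    return 1.0 / variance_of_mean(n, 1.0, corr)


if __name__ == "__main__":
    n = 100
    print("independent:", effective_n(n, lambda k: 0.0))  # equals n
    print("exponential, 0.9**k:", effective_n(n, lambda k: 0.9 ** k))  # assumed model
    print("perfectly correlated:", effective_n(n, lambda k: 1.0))  # collapses to 1
```

With `corr` identically zero this reduces to sigma2/n, the independent case behind s/sqrt(n). With perfect correlation the effective sample size is 1 however many samples are taken, which is the situation in the step-function example that follows.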
> Consider a trivial example. A one dimensional random
> function which takes constant values over intervals
> of length one - so, it takes the value a_0 in the
> interval [0,1[  then the value a_1 in the interval
> [1,2[ and so on (let us suppose that each a_n term
> is drawn at random from a gaussian distribution with
> the same mean and variance for example).  Next
> suppose you are given samples on the interval [0,2].
> You spot that there seems to be a jump between [0,1[
> and [1,2[  - so you test for the difference in the
> means. If you apply an F test you will easily find
> that the means differ (and more convincingly the
> more samples you have drawn!). However, by
> construction of the random function, the mean is
> not different.  We have been lulled into the false
> conclusion of differing means by assuming that all
> our data are independent.
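Colin's thought experiment can be simulated. The sketch below is a minimal Python version under stated assumptions: a_0 and a_1 are drawn from the same standard Gaussian, each interval is sampled n times, and a small measurement error (`noise_sd`, an assumption added so the within-interval variance is not exactly zero) is added to every sample. The Welch t statistic stands in for the two-sample test; all names are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Two-sample t statistic assuming unequal variances (Welch)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def sample_step_function(n, a0, a1, noise_sd, rng):
    """Return n samples from [0,1[ and n samples from [1,2[.

    The random function equals a0 on [0,1[ and a1 on [1,2[, so the sample
    positions do not matter. noise_sd is an ASSUMED measurement error added
    so that the within-interval variance is not exactly zero.
    """
    left = [a0 + rng.gauss(0.0, noise_sd) for _ in range(n)]
    right = [a1 + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return left, right


if __name__ == "__main__":
    rng = random.Random(42)
    # a_0 and a_1 come from the SAME Gaussian, so the random function
    # has the same mean on both intervals.
    a0, a1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    for n in (10, 100, 1000):
        left, right = sample_step_function(n, a0, a1, 0.1, rng)
        print(f"n = {n:5d}  t = {welch_t(left, right):8.1f}")
```

Because every sample within an interval shares the same a_n, adding samples only shrinks the apparent standard error, so |t| keeps growing with n even though both intervals have the same mean by construction.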
> Regards
> Colin Daly
> Dear all,
> I'm wondering if sample size (number of samples, n)
> is playing a role here.
> Since Colin is using Excel to analyse several
> thousand samples, I have checked the functions of
> t-tests in Excel. In the Data Analysis Tools help, a
> function is provided for the "t-Test: Two-Sample
> Assuming Unequal Variances" analysis. This function
> is the same as those in many textbooks (there are
> other forms of the function). Unfortunately, I
> cannot find the function for "assuming equal
> variances" in Excel, but I assume it is similar
> and should match the textbook versions.
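For reference, here is a minimal Python sketch of the two textbook forms: the unequal-variance (Welch) statistic with the Welch-Satterthwaite degrees of freedom, and the pooled equal-variance statistic. The function names are hypothetical, and both assume independent samples.

```python
import math
import statistics


def welch_t(x, y):
    """t statistic and Welch-Satterthwaite df, assuming unequal variances."""
    nx, ny = len(x), len(y)
    qx = statistics.variance(x) / nx  # squared standard error of mean(x)
    qy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(qx + qy)
    df = (qx + qy) ** 2 / (qx ** 2 / (nx - 1) + qy ** 2 / (ny - 1))
    return t, df


def pooled_t(x, y):
    """t statistic and df, assuming equal variances (pooled variance estimate)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [2, 4, 6, 8]
    print("Welch :", welch_t(x, y))
    print("pooled:", pooled_t(x, y))
```

With equal group sizes, as in the usage lines, the two t values coincide (here -sqrt(3)); only the degrees of freedom differ.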
> From the formula, you can see that for a given
> difference in means, the t value grows with the
> sample size. When the sample size is large enough,
> even slight differences between the mean values of
> the two data sets (x bar and y bar) can be detected,
> and this will result in rejection of the null
> hypothesis. This is in fact quite reasonable. When
> the sample size is large, you are confident in the
> mean values (Central Limit Theorem), which have a
> very small standard error (s/sqrt(n)). Therefore,
> you can confidently detect differences between the
> two data sets. Even though there is only a slight
> difference, you can still say, yes, they are
> "significantly" different.
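The sample-size effect can be quantified. Assuming independent samples, n per group and a common standard deviation s, the expected t for a true mean difference d is roughly d / (s * sqrt(2/n)), which grows like sqrt(n). The minimal Python sketch below (hypothetical function names; 1.96 is the large-sample two-sided 5% critical value) finds the group size at which a fixed difference becomes "significant".

```python
import math


def expected_t(diff, sd, n):
    """Approximate two-sample t for a true mean difference `diff`.

    Assumes independent samples, n per group and a common standard deviation
    sd, so the standard error of the difference is sd * sqrt(2 / n).
    """
    return diff / (sd * math.sqrt(2.0 / n))


def n_to_reject(diff, sd, critical=1.96):
    """Smallest n per group at which expected_t reaches the critical value."""
    return math.ceil(2.0 * (critical * sd / diff) ** 2)


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        print(n, round(expected_t(0.05, 1.0, n), 2))
    print("n per group needed:", n_to_reject(0.05, 1.0))
```

For d = 0.05 s the expected t is only about 0.35 at n = 100 per group but about 11.2 at n = 100000, and `n_to_reject(0.05, 1.0)` returns 3074. The calculation relies on the same independence assumption that Colin questions.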
> You may remember that some time ago we had a
> discussion on the large sample size problem in tests
> for normality. When the sample size is large enough,
> the outcome for real data sets is predictable:
> rejection of the null hypothesis.
> Cheers,
> Chaosheng
> > Don
> >
> > Thank you for the extended clarification of F and t
> > hypothesis tests. For those unfamiliar with the
> > concept, it is worth noting that the F test for
> > multiple means may be more familiar under the title
> > "Analysis of variance".
> >
> > My own brief answer was in the context of Colin's
> > question, where it was quite clear that he was talking
> > about the simplest F variance-ratio and t comparison of
> > means tests.
> >
> > Isobel
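Isobel's two tests can be written side by side. This minimal Python sketch (hypothetical function names, independent samples assumed) computes the simple variance ratio used in the F test of equal variances and the one-way analysis of variance F statistic for several means.

```python
import statistics


def variance_ratio(x, y):
    """Simple F variance-ratio statistic.

    Uses the common convention of putting the larger sample variance on top.
    """
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for equality of several group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [2, 4, 6, 8]
    print("variance ratio:", variance_ratio(x, y))
    print("ANOVA F:", one_way_anova_f([x, y]))
```

With two groups the ANOVA F equals the square of the pooled two-sample t statistic: for the data in the usage lines the pooled t is -sqrt(3) and the ANOVA F is 3, while the variance ratio is 4.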
> >
> >
> > * By using the ai-geostats mailing list you agree to
