# Re: Disadvantage of Non-parametric vs. Parametric Test

```
Glenn Barnett wrote:
>
> But since WMW is completely insensitive to a change in spread without
> a change in location, if either were possible, a rejection would
> imply that there was indeed a location difference of some kind. This
> objection strikes me as strange indeed. Does Johnson not understand
> what WMW is doing? Why on earth does he think that a t-test suffers
> any less from these problems than WMW?
>
> Similarly, a change in shape sufficient to get a rejection of a WMW
> test would imply a change in location (in the sense that the "middle"
> had moved, though the term 'location' becomes somewhat harder to pin
> down precisely in this case).  e.g. (use a monospaced font to see this):
>
> :.                 .:
> ::.       =>      .::
> ::::...       ...::::
> a     b       a     b
>
> would imply a different 'location' in some sense, which WMW will
> pick up. I don't understand the problem - a t-test will also reject
> in this case; it suffers from this drawback as well (i.e. they are
> *both* tests that are sensitive to location differences, insensitive
> to spread differences without a corresponding location change, and
> both pick up a shape change that moves the "middle" of the data).

In fact, it can be shown (I can send details - and a preprint - to
anybody interested) that a weakness - at least in principle - of the WMW
test is that it *fails* to be a test of location, in that it may exhibit
cyclicity between three sets of data, or even consistently cyclic behaviour
between three populations as sample size -> infinity.

(A test is "cyclic" if it can imply A > B > C > A, rejecting the null
hypothesis in each case. This is stronger than "intransitivity" in which the
test implies A>B>C but fails to reject A=C.  Student's t test (with pooled
variance)  can exhibit the latter behaviour  (suppose n1 = n3 = 2, n2 = 100;
xbar1 = -2, xbar2 = 0, xbar3 = 2; and s1 = s2 = s3 = 1), but not the former,
as it can never imply mu1 > mu2 if xbar1 <= xbar2.)

The simplest example of cyclic behaviour for the WMW test uses made-up
(or large) data sets based on Efron's intransitive dice, labelled
{1,1,5,5,5,5},{3,3,3,4,4,4} and {2,2,2,2,6,6}. Details are left to the
reader.  This is Barnett's "change of shape".

Potthoff (1963) showed that WMW is a test for the median/mean between any
two symmetric distributions; and it is clear that it is a test for the
median/mean within any shifted family.

However (Dawson, 1997, unpublished), for a Behrens-Fisher family of
asymmetric distributions, cyclic behaviour is typically exhibited; so that a
change of shape is *not* necessary.  In particular, if f_X(x) is analytic
with all moments existing, the WMW test is a test of location for the
Behrens-Fisher family generated by f_X(x) if and only if f_X(x) is
symmetric.  For more general distributions, a necessary and sufficient
condition is that if we let f_X(x) = f1(x) + f2(x) where f1 is nonzero only
below the median (WLOG 0) and f2 only above, and gi(x) = e^x fi(e^x), then
g1 and g2 have the same autocorrelation. (Don't ask me why, I just did the
calculus & that's what it said...)

Notwithstanding all of the above, the cyclicity phenomenon is never very
strong. Using a result of Steinhaus and Trybula (1959), we can show that
even three made-up data sets cannot exhibit cyclicity for two-tailed WMW
tests at the 5% significance level unless each sample size is at least 50.
E.g.:

Value    Sample 1   Sample 2   Sample 3

X=1         19          0          0
X=2          0          0         31
X=3          0         50          0
X=4         31          0          0
X=5          0          0         19

but no smaller sample size will work.  Using random samples from populations
divided in these proportions we would of course need sample sizes larger than
50 to have this happen with any great frequency.

As a final example, consider the shifted exponential distributions as a
fairly realistic model of a Behrens-Fisher family. It can be shown that, for
random samples from three member distributions f_a, f_b, f_c chosen so that
the expected values of the pairwise WMW test statistics imply A>B>C>A for
hypothetical "locations" A,B,C,  at least one test will have a power of less
than 50% (for two-sided 5% significance level tests) unless the sample sizes
are greater than about 800. (As n -> infinity, the power of all three tests
goes to 1, of course; but it takes its time doing so!)

Thus, while the phenomenon is in one sense very widespread, it would
seem that there are few naturally occurring triples of independent data sets
for which the WMW is cyclic; and examples for which the Behrens-Fisher model
is plausible may be very few and far between.

-Robert Dawson

```
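The dice example and the n = 50 table in the post can be checked numerically. What follows is a minimal sketch and not part of the original post. The helper names `p_greater` and `wmw_z` are illustrative. The z-score uses the usual normal approximation to the Mann-Whitney U statistic with a tie correction and no continuity correction, which is an assumption, since the post does not say which form of the WMW test was used.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt


def p_greater(a, b):
    """Exact P(X > Y) for X drawn uniformly from a and Y from b."""
    wins = sum(x > y for x, y in product(a, b))
    return Fraction(wins, len(a) * len(b))


def wmw_z(x, y):
    """z-score of the Mann-Whitney U statistic of x over y.

    Assumed form: normal approximation with the standard tie
    correction and no continuity correction. Positive z means
    values in x tend to exceed values in y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U counts pairs with x > y; a tied pair counts one half.
    u = sum((a > b) + 0.5 * (a == b) for a, b in product(x, y))
    # Tie correction: sum of t^3 - t over groups of equal values.
    tie_term = sum(t**3 - t for t in Counter(x + y).values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    return (u - n1 * n2 / 2) / sqrt(variance)


# Dice as labelled in the post.
die_a = [1, 1, 5, 5, 5, 5]
die_b = [3, 3, 3, 4, 4, 4]
die_c = [2, 2, 2, 2, 6, 6]
dice_probs = [
    p_greater(die_a, die_b),
    p_greater(die_b, die_c),
    p_greater(die_c, die_a),
]

# The three samples of size 50 from the table in the post.
s1 = [1] * 19 + [4] * 31
s2 = [3] * 50
s3 = [2] * 31 + [5] * 19
z_scores = [wmw_z(s1, s2), wmw_z(s2, s3), wmw_z(s3, s1)]

print(dice_probs)
print([round(z, 2) for z in z_scores])
```

For the dice, the three probabilities are 2/3, 2/3 and 5/9, so each die beats the next more often than not. For the n = 50 samples, all three z-scores exceed 1.96 under this approximation, so each pairwise two-sided test rejects at the 5% level in the cyclic direction 1 > 2 > 3 > 1. The sketch does not check the claim that no smaller sample size works.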