Don Taylor wrote:
> 
> Gottfried Helms <[EMAIL PROTECTED]> writes:
> ...
> >BC's basic examples were always such with two uniform factors. One was assumed
> >to be measured exactly (in X1), and the other by a composite of the both (Y1).
> ...
> >But - that is only useful, if you have reason to assume, that your population
> >factors are both uniform, uncorrelated; that your measured items are relatively
> >error free and not too many factors are involved. If I recall it right from my
> >fiddling with this last year, with composites of only 4 or more factors the
> >exploitable differences between the distributional properties are too small
> >to get good results. (but may be I don't recall it right, currently).
> 
> I have snipped a great deal out of your posting, to focus on one point.
> 
> Based on some primitive work I did early in the year, for sufficient
> sample size and with uncorrelated variables, the effect was present for
> some distributions other than uniform.
> 
> Someone else later made one posting on this subject, where he said he
> had made some progress on this and had a partial result determining what
> distributions could be used in linear combinations of RV and still be
> able to determine which was the independent RV and which was the linear
> combination.  I remember he credited BC at the end of his posting.
> But, I don't remember who made that posting.

Well, I remember that too, vaguely. I tried to reduce my explanation to the basic
case, to keep the focus on the principle. The closer the involved distributions
are to normal, the less CR can distinguish.

The basic idea of CR was to compare the distribution of the independent variable
(uniform distribution) with that of the dependent/mixture variable (which shares
common variance with the independent variable).

A scatterplot shows what is being done here:

two uniform variables X1, X2  

           !       
     * * * ! * * * 
     * * * ! * * * 
     * * * ! * * * 
-----*-*-*-!-*-*-*---- 
     * * * ! * * * 
     * * * ! * * * 
     * * * ! * * * 
           !

two composites (Y1 = 0.707(X1+X2), Y2 = 0.707(X2-X1))

           *       
         * ! *     
       * * ! * *   
     * * * ! * * * 
---*-*-*-*-!-*-*-*-*--- 
     * * * ! * * * 
       * * ! * *  
         * ! *   
           !

Both scatterplots show zero correlation.

Now CR uses the absolute values (I recommend using the squares, to
keep things accessible to common variance analysis).
The square-shaped scatterplot of X1/X2 then maps to a square again (in the
first quadrant), and the scatterplot of Y1/Y2 maps to a lower triangle
(in the first quadrant, too).


           !       
           ! * * * 
           ! * * * 
           ! * * * 
-----------!-*-*-*---- 
           ! 

           *       
           ! *   
           ! * *  
           ! * * * 
-----------!-*-*-*-*-- 
           ! 


Computing the correlations of the new scatterplots gives R1 = 0 again for
(|X1|;|X2|) but a negative value R2 for (|Y1|;|Y2|).
The CR coefficient is just the difference of these correlations, R1-R2.
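
To make the computation above concrete, here is a minimal simulation sketch.
The sample size, the seed, the range -1..1 for the uniform sources and the
helper name pearson() are my own assumptions for illustration; only the
construction of Y1, Y2 and the difference R1-R2 come from the description
above. The squared variant is included for comparison.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)      # fixed seed, an arbitrary choice
n = 100_000                 # sample size, an assumption
c = math.sqrt(0.5)          # the 0.707 used for the composites

# Two independent uniform sources and their two composites.
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
y1 = [c * (a + b) for a, b in zip(x1, x2)]
y2 = [c * (b - a) for a, b in zip(x1, x2)]

# CR with absolute values: R1 for the sources, R2 for the composites.
r1 = pearson([abs(v) for v in x1], [abs(v) for v in x2])
r2 = pearson([abs(v) for v in y1], [abs(v) for v in y2])
cr = r1 - r2

# The same comparison with squares instead of absolute values.
r1_sq = pearson([v * v for v in x1], [v * v for v in x2])
r2_sq = pearson([v * v for v in y1], [v * v for v in y2])

print(f"abs:     R1 = {r1:.3f}  R2 = {r2:.3f}  CR = {cr:.3f}")
print(f"squares: R1 = {r1_sq:.3f}  R2 = {r2_sq:.3f}")
```

For points spread evenly over a right triangle the theoretical correlation
is -1/2, so R2 should come out near -0.5 and CR near +0.5 in this setup.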

-----------------

This effect is clearly not limited to exactly uniform distributions as
a source. But if both variables are normal, neither the scatterplots
of the original variables nor those of the absolute-value variables differ
between the two pairs, so CR cannot find any difference with normally
distributed sources.
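
A small sketch of this contrast, under my own assumptions (standard normal
versus uniform on -1..1, 100,000 points, fixed seed; the function name
cr_for_sources and its draw callback are hypothetical helpers, not part of
CR itself):

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def cr_for_sources(draw, n=100_000, seed=1):
    """R1 - R2 on absolute values for two independent sources.

    draw is a hypothetical callback: it receives a random.Random
    instance and returns one sample of the source distribution.
    """
    rng = random.Random(seed)
    c = math.sqrt(0.5)
    x1 = [draw(rng) for _ in range(n)]
    x2 = [draw(rng) for _ in range(n)]
    y1 = [c * (a + b) for a, b in zip(x1, x2)]
    y2 = [c * (b - a) for a, b in zip(x1, x2)]
    r1 = pearson([abs(v) for v in x1], [abs(v) for v in x2])
    r2 = pearson([abs(v) for v in y1], [abs(v) for v in y2])
    return r1 - r2

cr_normal = cr_for_sources(lambda rng: rng.gauss(0.0, 1.0))
cr_uniform = cr_for_sources(lambda rng: rng.uniform(-1.0, 1.0))
print(f"CR, normal sources:  {cr_normal:.3f}")
print(f"CR, uniform sources: {cr_uniform:.3f}")
```

With normal sources the composites are again independent normals, so the
coefficient stays near zero up to sampling noise, while the uniform case
gives a clearly positive value.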


In my opinion the best way to see what is going on with different distributions
comes from the calculus, using squared values instead of absolute values.
It can be shown that the behaviour of the scatterplots is close to what
happens when I use the quartimax rotation criterion in factor analysis:
you can imagine that the angle between each point and the x-axis is
multiplied by 4.

This transformation turns the two scatterplots into two figures like
a water drop lying on its side: for X1/X2 with the tip at negative x, and for
Y1/Y2 with the tip at positive x.

(for X1/X2)
            !      
            !      
        * * ! * *   
    * * * * ! * ** 
*-*-*-*-*-*-!-*-**------- 
    * * * * ! * ** 
        * * ! * *  
            !     
            !

The form for Y1/Y2 is the opposite. This reflects that the four 45-deg axes are
mapped to the negative x-axis and the four 90-deg axes are mapped to the positive
x-axis. If the x-values and y-values of the coordinates are summed, this
gives a certain coordinate (x-sum, y-sum), where for both pairs the
y-sum is zero, and the x-sum is -1 for the pair X1/X2 and positive for the
pair Y1/Y2. If you know the mathematics of factor analysis, you will see
that this is exactly what happens in the quartimax rotation.
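
The angle-times-4 mapping and the summed coordinates can be sketched with
complex numbers. Raising x + iy to the 4th power multiplies the angle by 4;
it also raises the radius to the 4th power, and that radial weighting is my
assumption, since the description above only fixes the angle. Scaling the
summed vector to unit length is likewise my choice, made so that the x-sum
can be compared with the -1 mentioned above.

```python
import math
import random

rng = random.Random(2)      # fixed seed, an arbitrary choice
n = 200_000                 # sample size, an assumption
c = math.sqrt(0.5)          # the 0.707 used for the composites

sum_x = 0j                  # summed mapped points of the X1/X2 plot
sum_y = 0j                  # summed mapped points of the Y1/Y2 plot
for _ in range(n):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    zx = complex(a, b)                        # point in the square
    zy = complex(c * (a + b), c * (b - a))    # same point in the diamond
    sum_x += zx ** 4                          # angle * 4, radius ** 4
    sum_y += zy ** 4

# Scale both summed vectors to unit length to read off (x-sum, y-sum).
dir_x = sum_x / abs(sum_x)
dir_y = sum_y / abs(sum_y)
print(f"X1/X2: ({dir_x.real:+.3f}, {dir_x.imag:+.3f})")
print(f"Y1/Y2: ({dir_y.real:+.3f}, {dir_y.imag:+.3f})")
```

The corners of the square lie on the 45-deg axes and are the points farthest
from the origin, so they dominate the sum and pull it to the negative x-axis;
for the diamond the corners lie on the 90-deg axes and the sign flips.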

Take these values (with negative sign) as a rotation angle, divide it by four,
and rotate the original scatterplot. If the original scatterplot was a
square, then the (cos,sin) is (1,0), thus no rotation; if the original
scatterplot was a diamond, then the (cos,sin) is (0.707,0.707) (45 deg)
and the diamond will be rotated to the square position.
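
A sketch of this rotation step. I read "negatively" as negating the summed
4th-power vector before taking its angle; that reading, the 4th-power
weighting of the radius, and the helper names quarter_angle and rotate are
assumptions of mine, not a definitive statement of the method.

```python
import cmath
import math
import random

def quarter_angle(points):
    """Angle of the negated sum of (x + iy)**4, divided by four (radians)."""
    total = sum(complex(x, y) ** 4 for x, y in points)
    return cmath.phase(-total) / 4.0

def rotate(points, phi):
    """Rotate every (x, y) point counterclockwise by phi radians."""
    cs, sn = math.cos(phi), math.sin(phi)
    return [(cs * x - sn * y, sn * x + cs * y) for x, y in points]

rng = random.Random(3)      # fixed seed, an arbitrary choice
c = math.sqrt(0.5)
square = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50_000)]
diamond = [(c * (a + b), c * (b - a)) for a, b in square]

phi_square = quarter_angle(square)      # close to 0: no rotation
phi_diamond = quarter_angle(diamond)    # close to +45 or -45 degrees
turned = rotate(diamond, phi_diamond)

# Largest |coordinate|: about 1.41 for the diamond, about 1 for a square.
extent_before = max(max(abs(x), abs(y)) for x, y in diamond)
extent_after = max(max(abs(x), abs(y)) for x, y in turned)
print(f"square:  {math.degrees(phi_square):+.2f} deg")
print(f"diamond: {math.degrees(phi_diamond):+.2f} deg")
print(f"extent before {extent_before:.3f}, after {extent_after:.3f}")
```

The sign of the diamond's angle depends on sampling noise, because the
negated sum sits next to the negative x-axis; a turn of 45 degrees in either
direction brings the diamond to the square position.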

--------

What may this be instructive for? It indicates that this method evaluates
the joint distribution in such a way that the points on the 45-deg axes
are counted most positively and the points on the 90-deg axes are counted
most negatively.
So the highest CR value should result from a configuration where the scatterplot
of the pair (X1,X2) looks like an "x", and that of the pair (Y1,Y2) consequently
like a "+".
Then, for this example, the absolute-value mapping of CR generates a straight
45-deg line for the pair (|X1|,|X2|), with R1 = 1, and the upper-right part of
a plus sign (its upper and right arms) for the pair (|Y1|,|Y2|), which gives
some strongly negative correlation R2; thus the coefficient R1-R2 should be at
its maximum for such a synthetic distribution.
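
This synthetic case can be tried out directly. In the sketch below the
points of (X1,X2) are placed on the two diagonals with a uniformly
distributed distance from the origin; that distance distribution, the sample
size and the seed are assumptions of mine, and the value of R2 depends on
them.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(4)      # fixed seed, an arbitrary choice
n = 100_000                 # sample size, an assumption
c = math.sqrt(0.5)          # the 0.707 used for the composites

x1, x2 = [], []
for _ in range(n):
    t = rng.random()                 # distance along a diagonal (assumed uniform)
    x1.append(rng.choice((-1, 1)) * t)
    x2.append(rng.choice((-1, 1)) * t)   # |X1| = |X2|: an "x" shape

# The composites of an "x" lie on the two axes: a "+" shape.
y1 = [c * (a + b) for a, b in zip(x1, x2)]
y2 = [c * (b - a) for a, b in zip(x1, x2)]

r1 = pearson([abs(v) for v in x1], [abs(v) for v in x2])
r2 = pearson([abs(v) for v in y1], [abs(v) for v in y2])
cr = r1 - r2
print(f"R1 = {r1:.3f}, R2 = {r2:.3f}, CR = {cr:.3f}")
```

With this choice of distances R1 is 1 and R2 comes out near -0.6, so the
coefficient is about 1.6, well above the value for the uniform square.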

Now if you have any empirical scatterplot of the X1/X2 variables and the
Y1/Y2 variables, you can guess the result of CR by inspecting the
points along the 45- and 90-deg axes, or just mirror
quadrants 2, 3 and 4 onto the first one and compare the resulting
correlations.

--------

I admit that my version is only a rough approximation, since the CR mapping of
the original scatterplot to that of the absolute values in the first
quadrant is not a true multiplication of the angles by 4. That should
be the reason for the occasional differences between the results of the
original version of CR (using absolute values) and my modification
using squares, or even the "rotated" version. But this version may possibly
be another, at least analytically more accessible, way to account for
bivariate uniformity (square form) in comparison to its mixtures, which
was the basic idea of CR.

Gottfried Helms
