James Diamond wrote:
> 
> David:  i was also under the impression that X and Y had to be in the same
> units for orthogonal regression to make sense.  is this true?  it's not
> intuitive to me why this would be the case.

        It certainly helps if the units are the same, but it is neither
necessary nor sufficient for them to be the same.

        Orthogonal regression makes the assumption that minimizing the
orthogonal offsets is a Good Thing. This can be for many reasons
(including sheer aesthetic satisfaction) but the *usual* reason is that
there is a model 

                X_measured = X + E_x 
                Y_measured = mX+b + E_y 
                E_x, E_y normal with mean 0
                     V(E_x)=V(E_y)

underlying the data. Under this model, the negative log-likelihood of a
datum given a candidate line is proportional to the square of the
orthogonal offset from the datum to the line; so orthogonal regression
gives a maximum-likelihood estimate of the parameters of the line
y = mx + b.
        
        (There are many methods of computing the orthogonal regression line; I
believe that *some* of them make (and use) the additional assumption
that X is normally distributed too, making the joint distribution
bivariate normal. However, the basic orthogonal regression task is
well-defined without this assumption, so methods that do not use it
should probably be preferred.)
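A minimal sketch of one such method, not taken from any particular package: the orthogonal-offset-minimizing line can be computed from the first principal axis of the centered data (here via SVD), with no normality assumption on X.

```python
# Orthogonal (total least squares) regression via SVD -- a sketch,
# assuming only numpy; function name is illustrative.
import numpy as np

def orthogonal_regression(x, y):
    """Fit y = m*x + b minimizing the sum of squared orthogonal offsets."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # The best-fit direction is the first principal axis of the centered data.
    A = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    dx, dy = vt[0]              # direction of largest variance
    if dx == 0.0:
        raise ValueError("orthogonal fit is a vertical line; slope undefined")
    m = dy / dx
    b = ym - m * xm             # the fitted line passes through the centroid
    return m, b

# Exactly collinear data recovers the line exactly:
m, b = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])  # y = 2x + 1
```

Note that the fit depends only on the centered second moments of the data, which is why the axis-scaling issues discussed below matter.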

        This is NOT the line we want if we intend to predict measured Y from
measured X. In that case, no matter *which* variable has the error, the
OLS regression of Y on X is correct. Similarly, the OLS regression of X
on Y predicts measured X from measured Y, for *all* assumptions about
whose fault the error is. The OR line estimates true Y from true X, or
vice versa, provided the error variance is split equally between them.
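A quick numerical illustration of the distinction (not from the post; it assumes only numpy): on data with equal noise in X and Y, the OLS slope of Y on X, the orthogonal-regression slope, and the OLS slope of X on Y all differ, with the OR slope lying between the two OLS slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 200)
y_true = 2.0 * x_true + 1.0                      # true line y = 2x + 1
x = x_true + rng.normal(0.0, 1.0, x_true.size)   # equal error variance in X...
y = y_true + rng.normal(0.0, 1.0, y_true.size)   # ...and in Y

c = np.cov(x, y, bias=True)     # [[var x, cov], [cov, var y]]
m_yx = c[0, 1] / c[0, 0]        # OLS slope of Y on X (attenuated toward 0)
m_xy = c[1, 1] / c[0, 1]        # OLS of X on Y, re-expressed as a y-on-x slope

# OR slope: first principal axis of the centered data.
A = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(A, full_matrices=False)
m_or = vt[0, 1] / vt[0, 0]

print(m_yx, m_or, m_xy)         # m_yx < m_or < m_xy
```

With errors of equal variance in both coordinates, the OR slope is the one that estimates the true slope; the Y-on-X slope is pulled toward zero by the error in X, and the X-on-Y slope is pushed away from zero by the error in Y.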

        If we want to use the model


                X_measured = X + E_x 
                Y_measured = mX+b + E_y 
                E_x, E_y normal with mean 0
                V(E_x)= k^2 V(E_y), k given

we can do so by scaling all the Y data by the factor k, performing OR,
and then unscaling. However, the more general model 

                X_measured = X + E_x 
                Y_measured = mX+b + E_y 
                E_x, E_y normal with mean 0
                V(E_x)= k^2 V(E_y), k unknown 

cannot always be fitted unambiguously. For instance, if the joint
distribution is bivariate normal, the lack of fit is described by the
single parameter rho^2, and that error can be partitioned between X
and Y in any ratio, yielding a different fitted line in each case.
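The scale-then-unscale recipe for the known-k model above can be sketched as follows (assuming only numpy; the function names are illustrative). Scaling Y by k equalizes the two error variances, an ordinary orthogonal fit is done in the scaled coordinates, and the slope is then divided by k to return to the original units.

```python
import numpy as np

def orthogonal_slope(x, y):
    """Slope of the orthogonal-regression line, via the first principal axis."""
    A = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[0, 1] / vt[0, 0]

def weighted_orthogonal_fit(x, y, k):
    """Fit y = m*x + b under the model V(E_x) = k^2 * V(E_y), k known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ys = k * y                        # scale Y so both errors have equal SD
    m = orthogonal_slope(x, ys) / k   # fit in scaled coordinates, then unscale
    b = y.mean() - m * x.mean()       # line passes through the centroid
    return m, b

# Exactly collinear data recovers the line for any choice of k:
m, b = weighted_orthogonal_fit([0, 1, 2, 3], [1, 3, 5, 7], k=2.0)  # y = 2x + 1
```

With k = 1 this reduces to plain orthogonal regression; other values of k trace out the family of fits mentioned above.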

        In some circumstances - for instance, if X is discrete with
well-separated values, or if it is nonrandom - it may be possible to
infer k from the data alone.  If not, it may be possible to infer k
from  some a priori argument; this is most likely to happen when the
measured X and Y have the same units and the same source of error - say,
both are weights, of similar magnitude, and measured on the same scales.

        If the units of X and Y are the same, there is a canonical choice of
axis scalings, namely one unit on the x axis = one unit on the y axis.
If the magnitudes of X and Y are very different, this may be silly (even
though canonical). If the natures of the error sources in X and Y are
very different, this may be silly (even though canonical).  If the units
are different, though, there is *no* canonical choice of scalings:
should 1 ohm correspond to 1 meter or 1 centimeter?
 
        -Robert Dawson
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================