Rich Ulrich wrote:
> > Plus, I've run the multiple
> > regression without the transform and seen only about a 5% difference
> > (not much).
>
> - damned if I know what that sentence means. You mean, like,
> accounting for 99% of the variance, instead of 94%? -- that means "5%"
> by two different criteria.
>
Yes, exactly. With a square root transform, the variable accounts for 99%
of the variance; without it (i.e., using the raw spike rates), it accounts
for 94% of the variance.
>
> Are you concerned with tests, or with models where you can compare
> "variance"? Personally, if I see counts, I often assume that the
> linearity is going to be measured with the square root of the counts.
> Also, the variability, or the interesting "variance."
>
>
The basic problem is this:
(1) the data are from a non-normal distribution ("events" per bin, or
spikes/bin as we call it)
(2) we are using a multiple linear regression to model the data (this has
been used extensively in the past, so that's not an issue).
(3) we want to see if one or two independent variables in the regression
consistently account for most of the variance in the dependent variable
(i.e., do we need the "full" model to predict the spikes/bin, or can we find a
smaller model that predicts just as well most of the time?)
(4) in a recent paper, someone argued that he had done such an analysis with
computer-generated data and found radically different results with and without
the square root transform. With the transform, a variable accounted for 43% of
the variance; without it, the same variable accounted for only 21%. He argues
that the square root transform has artificially and incorrectly increased the
relative importance of the variable.
I don't believe that (4) was analyzed correctly. I've done my analysis both
with and without the transform (using real data, not computer-generated) and
found essentially no difference (certainly not 43% down to 21%). So I can
certainly argue that, empirically, his concerns over bias don't hold true;
I'd like to say a little more from a theoretical standpoint. My take is that
this critic, along with many others in my field, simply assumes that
transformed data must be some kind of smoke and mirrors, a "trick" that
somehow makes the analysis flawed. Sokal and Rohlf's Biometry (3rd ed.,
1995) gives a nice description of this attitude on p. 411.
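
The with/without comparison described above can be sketched in a few lines.
This is only a minimal illustration under stated assumptions, not the
analysis from either study: the counts are simulated as Poisson, the two
predictors (x1, x2), the rate coefficients, and the helper names are all
hypothetical, and the fit is plain least squares via NumPy. It reports R^2
for the full model and for a one-variable model, on raw and on
square-root-transformed counts.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # With an intercept the residuals have zero mean, so this is 1 - SSE/SST.
    return 1.0 - resid.var() / y.var()


def compare_transforms(n_bins=2000, seed=0):
    """Fit full and reduced models to raw and sqrt-transformed counts.

    All numbers below are illustrative assumptions, not values from the post.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(-1.0, 1.0, n_bins)   # hypothetical "important" predictor
    x2 = rng.uniform(-1.0, 1.0, n_bins)   # hypothetical secondary predictor
    rate = 20.0 + 8.0 * x1 + 3.0 * x2     # expected spikes/bin, always >= 9
    counts = rng.poisson(rate)            # simulated spikes per bin

    X_full = np.column_stack([x1, x2])
    X_reduced = x1[:, None]

    results = {}
    for name, y in (("raw", counts.astype(float)), ("sqrt", np.sqrt(counts))):
        results[name] = {
            "full": r_squared(X_full, y),
            "x1_only": r_squared(X_reduced, y),
        }
    return results


if __name__ == "__main__":
    for name, r2 in compare_transforms().items():
        print(f"{name:>4}: full R^2 = {r2['full']:.3f}, "
              f"x1 only R^2 = {r2['x1_only']:.3f}")
```

Changing the rate coefficients, or lowering the mean rate so that many bins
are near zero, is one way to probe when the raw and transformed fits start to
disagree about a variable's share of the variance.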
-Tony
--
///////////////////////////////////////////////////
// G. Anthony Reina, MD //
// The Neurosciences Institute //
// 10640 John Jay Hopkins Drive //
// San Diego, CA 92121 //
// Phone: (858) 626-2132 //
// FAX: (858) 626-2199 //
////////////////////////////////////////////