Hello ecolog,

Thanks to all of you who responded to my question about data transformations
suitable for my power analysis of percent cover data (original post is
repeated at the bottom of this summary).  

A number of people suggested the “standard” transformation for percent cover
data, the arcsine square root transform. While back-transforming with the
squared sine would have kept the simulated proportions between zero and one,
that back-transformation is non-monotonic: transformed values pushed beyond
pi/2 (or below zero) wrap back toward the middle of the range instead of
continuing the trend. That would have been an issue with my simulated data.
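The arcsine square root transform itself is monotonic on [0, 1]. The trouble
comes from the back-transformation, sin(x)^2, once simulated values leave the
range [0, pi/2]. A minimal Python sketch of that wraparound (Python and the
step sizes are assumptions for illustration only):

```python
import math

def asin_sqrt(p):
    """Arcsine square root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def back_transform(x):
    """Back-transform sin(x)**2: always in [0, 1], but not monotonic in x."""
    return math.sin(x) ** 2

# The top of the transformed scale is asin(sqrt(1)) = pi/2 (100% cover).
top = asin_sqrt(1.0)

# Pretend a simulated increase pushes values past pi/2 in equal steps:
# the back-transformed cover goes DOWN instead of staying at or near 100%.
for step in (0.0, 0.2, 0.4, 0.6):
    x = top + step
    print(f"x = {x:.3f} -> back-transformed cover = {back_transform(x):.3f}")
```

The bound is respected, but a simulated increase beyond full cover shows up
as a decline, which is the problem described above.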

Several people pointed me toward a recent paper, Warton and Hui 2011
(Ecology 92:3-10). These authors propose a modification to the logit
transformation: adding a small value, eps, to both the numerator and the
denominator of the logit function, i.e. log[(y + eps) / (1 - y + eps)]. This
is the approach I am now pursuing in my analysis.
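A minimal Python sketch of the modified logit described above,
log((y + eps) / (1 - y + eps)), together with its inverse. The example
proportions are made up, and setting eps to the smallest non-zero proportion
is an assumption of this sketch, not a claim about the paper:

```python
import math

def modified_logit(y, eps):
    """Logit with eps added to numerator and denominator.
    Finite for every y in [0, 1], including 0% and 100% cover."""
    return math.log((y + eps) / (1.0 - y + eps))

def _logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def inverse_modified_logit(z, eps):
    """Exact inverse of modified_logit: y = (1 + 2*eps) * logistic(z) - eps.
    Its range is (-eps, 1 + eps), so the result is clipped to [0, 1]."""
    y = (1.0 + 2.0 * eps) * _logistic(z) - eps
    return min(max(y, 0.0), 1.0)

# Hypothetical proportions; eps set (as an assumption) to the smallest
# non-zero proportion observed.
cover = [0.0, 0.02, 0.15, 0.40, 1.0]
eps = min(p for p in cover if p > 0)

z = [modified_logit(p, eps) for p in cover]
back = [inverse_modified_logit(v, eps) for v in z]
print([round(v, 3) for v in z])
print([round(v, 3) for v in back])
```

One detail worth checking in any implementation: the exact inverse maps the
whole real line onto (-eps, 1 + eps), not (0, 1), so simulated values far in
the tails can back-transform slightly outside the bounds. The sketch clips
them.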

There is clearly a lot of debate about the merits of transforming data and
about the difficulty of interpreting output when transformations are used,
and I appreciate the recommendations I received to use data transformations
sparingly. I tend to agree with these comments, but in this case I feel that
having a simulation with realistic data and meaningful predictions outweighs
the difficulty of back-transforming and interpreting the output.

Thanks again for the helpful feedback to my query!

Original post:
I am working on a power analysis simulation for long-term forest monitoring
data, with the goal of documenting our power to detect trends over time. The
simulation is based on a repeated measures hierarchical model, where future
data are simulated from the initial data set and a bootstrap of between-period
differences in the pilot data, multiplied by a range of effect sizes (from a
50% decline to a 50% increase).
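A rough Python sketch of one simulation step of this kind, working on a
transformed scale so results stay within [0, 1]. The post does not spell out
exactly how the effect size enters the model, so this sketch assumes (as a
hypothetical reading) that the effect is a multiplicative change in cover and
that the bootstrapped pilot differences act as plot-level noise on the
modified-logit scale. The function names, eps, and all data are illustrative
only:

```python
import math
import random

def mlogit(y, eps):
    # Modified logit: log((y + eps) / (1 - y + eps)), finite on [0, 1].
    return math.log((y + eps) / (1.0 - y + eps))

def inv_mlogit(z, eps):
    # Inverse of mlogit using a numerically stable logistic, clipped to [0, 1].
    s = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    return min(max((1.0 + 2.0 * eps) * s - eps, 0.0), 1.0)

def simulate_period(initial, pilot_diffs, effect, eps, rng):
    """Simulate one future observation period for every plot (hypothetical scheme).

    initial: current proportions, one per plot
    pilot_diffs: between-period pilot differences on the mlogit scale
    effect: trend, e.g. -0.5 for a 50% decline or 0.5 for a 50% increase
    """
    future = []
    for p in initial:
        # Assumed trend: multiplicative change in cover, capped at full cover.
        target = min(p * (1.0 + effect), 1.0)
        # Bootstrap one pilot difference as noise on the transformed scale.
        z = mlogit(target, eps) + rng.choice(pilot_diffs)
        future.append(inv_mlogit(z, eps))
    return future

rng = random.Random(1)
initial = [0.0, 0.05, 0.30, 0.85]          # hypothetical plot covers
pilot_diffs = [-0.8, -0.2, 0.0, 0.3, 0.9]  # hypothetical pilot differences
for effect in (-0.5, 0.0, 0.5):
    sims = simulate_period(initial, pilot_diffs, effect, 0.01, rng)
    print(effect, [round(v, 3) for v in sims])
```

Repeating this many times per effect size and fitting the trend model to each
simulated data set would give the power estimate; that fitting step is not
shown.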

My question is about the appropriate transformation to use for percent cover
data in this simulation. I don’t want to use raw percentages because the
simulation can easily produce proportions less than zero or greater than one.
Similarly, with a log transform the back-transformed proportions can easily
exceed one. Most other transforms I’ve looked at would not prevent
back-transformed data from falling outside one boundary or the other. The
exception is the logit transform, whose inverse (the logistic function) would
indeed force all simulated data to lie between zero and one when
back-transformed. However, the logit is negative infinity for a percent cover
of zero and positive infinity for a percent cover of 100% (a proportion of
one). I was thinking that adding a tiny number to zeros and subtracting a tiny
number from ones (e.g., 0.00001) would solve the problem (roughly analogous to
a log(x+1) transform), but I have been unable to find a reference to anyone
using this approach for percent cover data. Does anyone have any thoughts
about the validity of my proposed approach, or about another approach that
would help solve my problem?
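A minimal Python sketch of the proposal above (nudging zeros up and ones down
by a tiny number before taking the logit). The function names and the
alternative nudge values are illustrative assumptions:

```python
import math

def logit(p):
    """Standard logit log(p / (1 - p)). Mathematically infinite at 0 and 1;
    in Python, p = 0 raises ValueError and p = 1 raises ZeroDivisionError."""
    return math.log(p / (1.0 - p))

def clamped_logit(p, tiny=0.00001):
    """Proposed fix: move 0 up to `tiny` and 1 down to 1 - `tiny`, then take the logit."""
    p = min(max(p, tiny), 1.0 - tiny)
    return logit(p)

# The nudge decides where every zero lands on the transformed scale:
# the smaller it is, the further zeros sit from the smallest non-zero cover.
for tiny in (0.01, 0.001, 0.00001):
    print(tiny, round(clamped_logit(0.0, tiny), 2), round(clamped_logit(0.02, tiny), 2))
```

Because zeros can be common in percent cover data, comparing a few values of
the nudge is a cheap way to check how sensitive the simulated results are to
this choice.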

Brian Mitchell
NPS Northeast Temperate Network Program Manager
Adjunct Assistant Professor, University of Vermont