Re: [ECOLOG-L] Transformations for Normalized Data

Jane Shevtsov Mon, 08 Nov 2010 12:37:41 -0800

Thanks, Gavin. I've already been told that normalizing the data is
unnecessary, so I have dropped that step.


The further analysis is a rather unusual one; the only ecologist I
know of to have used it is Bill Shipley. My goal is to see how
different species affect each other's abundances. Multiple regression
isn't an option as I have more species than plots -- and besides,
regression isn't really causal, especially when you can't single out
independent variables. Instead, I'm going to use the causal discovery
algorithms of Judea Pearl and Peter Spirtes. They don't require
anything beyond correlation in terms of statistics but can find causal
relationships from observational data if you assume that the
underlying causal structure is acyclic. I found an R package, pcalg,
that implements these algorithms, so hopefully I won't have to program
them myself.
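
To make the "nothing beyond correlation" point concrete, here is a
minimal sketch of the kind of conditional-independence test that
constraint-based causal discovery is typically built on: a partial
correlation checked with Fisher's z. It is written in plain Python
rather than R, it is not pcalg's interface, and the function names and
the simulated chain x -> z -> y are assumptions made for illustration
only (not the Smoky Mountains data). It also assumes roughly Gaussian
variables.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given a single z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, n_cond=0):
    """Two-sided p-value for H0: (partial) correlation is zero.

    n is the sample size, n_cond the number of conditioning variables.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_chain(n, seed=1):
    """Hypothetical data with causal structure x -> z -> y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [a + rng.gauss(0, 1) for a in x]
    y = [b + rng.gauss(0, 1) for b in z]
    return x, z, y


n = 500
x, z, y = simulate_chain(n)
r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(z, y)
r_xy_given_z = partial_corr(r_xy, r_xz, r_yz)

# x and y are correlated marginally, but the association should largely
# vanish once z is conditioned on, because z lies between them.
print("r(x, y)     =", round(r_xy, 3), " p =", fisher_z_pvalue(r_xy, n))
print("r(x, y | z) =", round(r_xy_given_z, 3),
      " p =", fisher_z_pvalue(r_xy_given_z, n, n_cond=1))
```

A discovery algorithm repeats tests like this over many pairs and
conditioning sets, dropping an edge whenever a conditional independence
is found, and then orients the remaining edges. With more species than
plots, only small conditioning sets are testable, since the test needs
n - n_cond - 3 to stay positive.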

I strongly encourage people to check out these methods. Shipley's
website is a good place to start.

Best,
Jane

On Mon, Nov 8, 2010 at 1:22 AM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> On Sat, 2010-10-30 at 13:04 -0700, Jane Shevtsov wrote:
>> It's an intermediate step. I need to control for elevation before
>> going on to further analysis. (My data comes from plots at varying
>> elevations in the Smoky Mountains.) The ultimate goal is to find
>> species' influences on each other's abundances, for which I'll
>> probably use Pearl's Inferred Causation algorithm. (I was originally
>> planning to just use multiple regression but have too many species and
>> not enough data points for that.)
>
> Jane,
>
> Sorry to come to this late.
>
>> Hope that makes it a little clearer.
>
> Not really. Surely this depends on what subsequent analysis you want to
> do. If it involves regression or ordination, you could just as well
> include altitude in your models and work from there, assessing
> improvements in fit over a null model that includes altitude.
>
> Abundance data are unlikely to be Gaussian - why force them to be so?
> The canonical transformation for such data is the log. A recent paper by
> Bob O'Hara and Johan Kotze [1] shows us that doing this is not a good
> idea. Instead use a statistical model that seems plausible; Poisson GLM
> or extensions to this if overdispersion and/or zero-augmentation is an
> issue, such as negative binomial models, zero-inflated or zero-altered
> models, etc.
>
> If you are fitting individual regressions to each species, why normalize
> them at all? Why don't you want "residuals" in the same units as the
> original data? If you are interested in the community/assemblage level
> then wouldn't an ordination-based approach be more useful?
> CCA/RDA/db-RDA are all just regression models after all, and Thomas
> Yee's Canonical Gaussian Ordination (see his papers in Ecology and
> Ecological Monographs) is a formal representation of this. These methods
> also allow you to work at the community level, include altitude as a
> "nuisance" variable, etc.
>
> Perhaps if you explain what your further analyses are, you'd get more
> relevant replies.
>
> HTH
>
> G
>
> [1]
> http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00021.x/abstract
>
>> Best,
>> Jane
>>
>> On Sat, Oct 30, 2010 at 5:32 AM, Bill Silvert <cien...@silvert.org> wrote:
>> > I'm not clear on whether this is a thread about ecology or statistics. Jane
>> > writes "My goal is simply to do a regression" which seems a strange kind of
>> > goal. If she wants to predict abundances or identify causative factors, that
>> > I understand, but what kind of goal is doing a regression?
>> >
>> > How do we even know that the regression she is looking for exists? Even
>> > obvious regressions can be misleading. I was once approached by a colleague
>> > who asked for my help finding the relationship between parental biomass and
>> > surviving offspring, but a quick look at the data showed that no
>> > relationship existed. So instead we set about looking for factors that
>> > determined the number of offspring and found a good correlation with
>> > environmental factors (Koslow, J. Anthony, Keith R. Thompson, and William
>> > Silvert. 1987. Recruitment to Northwest Atlantic Cod (Gadus morhua) and
>> > Haddock (Melanogrammus aeglefinus) Stocks: Influence of Stock Size and
>> > Environment. Can. J. Fish. Aquat. Sci. 44:26-39). We not only identified a
>> > predictive pattern, but we could conclude that even though the fish were
>> > extremely fecund, the number of survivors depended on an environmental
>> > bottleneck so that the number of eggs was not very important.
>> >
>> > William Silvert
>> >
>> > -----Original Message----- From: Jane Shevtsov
>> > Sent: Friday, October 29, 2010 12:31 AM
>> > To: ECOLOG-L@LISTSERV.UMD.EDU
>> > Subject: Re: [ECOLOG-L] Transformations for Normalized Data
>> >
>> > Hi Mike,
>> >
>> > Dividing by the mean helps. Still, there are definitely too many zeros
>> > in my data, so what should I do with the distributions you mentioned?
>> > My goal is simply to do a regression.
>> >
>> > Thanks,
>> > Jane
>> >
>> > On Thu, Oct 28, 2010 at 8:11 PM, <mdie...@life.illinois.edu> wrote:
>> >>
>> >> 1) divide by the mean instead of the maximum
>> >>
>> >> 2) abundance data is rarely normal even before normalization because A)
>> >> abundance can never be negative and B) it usually has too many zeros
>> >> because one is convolving two processes (probability of presence times
>> >> abundance given presence). If your data shows (B) I recommend using a
>> >> zero-inflated distribution while if it shows (A) I would recommend a
>> >> distribution that is positive (e.g. lognormal or gamma). Because I
>> >> usually work with count data I prefer the zero-inflated Poisson or
>> >> zero-inflated Negative Binomial, but once you've normalized that's no
>> >> longer an option. I'd probably try a zero-inflated lognormal or
>> >> zero-inflated gamma, with the former being conceptually simpler because it
>> >> doesn't require a link function. If neither (B) nor (A) is present in
>> >> your data you're very lucky and can stick to the normal (possibly with
>> >> transformation).
>> >>
>> >> -- Mike
>> >>
>> >>> I have abundance data for a number of different species that I need to
>> >>> use in a regression. Since the data encompasses a variety of taxa
>> >>> (from trees to soil mites) whose abundances are measured differently,
>> >>> I normalized it, dividing abundances of each species by the maximum
>> >>> abundance of that species. This, of course, produces numbers ranging
>> >>> from 0 to 1, with a 1 for every species.
>> >>>
>> >>> Now I'm trying to transform the data into something approaching
>> >>> normality. I've tried various combinations of arcsin, square root,
>> >>> fourth root, and log (after adding 1, as there are plenty of zeros in
>> >>> the data), but nothing seems to help much. The problem appears to be
>> >>> the presence of a 1 in every column. Any ideas for what might work?
>> >>>
>> >>> Thanks,
>> >>> Jane Shevtsov
>> >>>
>> >>> --
>> >>> -------------
>> >>> Jane Shevtsov
>> >>> Ecology Ph.D. candidate, University of Georgia
>> >>> co-founder, <www.worldbeyondborders.org>
>> >>> Check out my blog, <http://perceivingwholes.blogspot.com>Perceiving
>> >>> Wholes
>> >>>
>> >>> "The whole person must have both the humility to nurture the
>> >>> Earth and the pride to go to Mars." --Wyn Wachhorst, The Dream
>> >>> of Spaceflight
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>>
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>



-- 
-------------
Jane Shevtsov
Ecology Ph.D. candidate, University of Georgia
co-founder, <www.worldbeyondborders.org>
Check out my blog, <http://perceivingwholes.blogspot.com>Perceiving Wholes

"The whole person must have both the humility to nurture the
Earth and the pride to go to Mars." --Wyn Wachhorst, The Dream
of Spaceflight
