----- Forwarded message from morphmet_modera...@morphometrics.org -----

Date: Thu, 12 Dec 2013 17:02:59 -0800
From: morphmet_modera...@morphometrics.org
Reply-To: morphmet_modera...@morphometrics.org
Subject: Re: missing structures
To: morphmet@morphometrics.org


----- Forwarded message from Philipp Mitteröcker <mitte...@univie.ac.at> -----

Date: Wed, 4 Dec 2013 15:53:46 -0500
From: Philipp Mitteröcker <mitte...@univie.ac.at>
Reply-To: Philipp Mitteröcker <mitte...@univie.ac.at>
Subject: Re: missing structures
To: morphmet@morphometrics.org

Dear Douglas,

Thanks for your posting. You are right, it is of course technically possible to apply the EM algorithm, as well as other methods such as TPS interpolation, to estimate landmarks that do not exist in the specimens. But my point was more a conceptual and biological one than a technical one. How can statistical results based on estimated landmarks that never existed be interpreted in a biologically meaningful way? In a study on lizards, would you estimate the limb length of slow worms?

In developmental or evolutionary studies, the absence of variables (landmarks) usually is a signal, not missing data. Yet this would be a qualitative (binary) trait, not a quantitative one. Estimating the values would mask this signal and replace it with a more or less arbitrary quantitative value, which in turn would affect many subsequent statistics, including Procrustes distances, PCA, covariance patterns, etc. Whether such missing data estimates lead to problematic conclusions may depend on the actual biological question and the statistical analysis.
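
The arbitrariness can be seen in a toy calculation. The following is only a minimal sketch with made-up numbers (the limb lengths, the group labels and the helper `group_difference` are hypothetical, not data or methods from this thread): the same absent structure yields a group difference of zero or a very large one, depending solely on the fill-in rule.

```python
from statistics import mean

# Hypothetical limb lengths (mm) for a group in which the structure exists ...
limbed = [21.0, 23.5, 22.0, 24.5]
# ... and a group in which it does not exist (None marks absence).
limbless = [None, None, None]


def group_difference(fill_value):
    """Difference in group means after replacing absent values by fill_value."""
    filled = [fill_value if v is None else v for v in limbless]
    return mean(limbed) - mean(filled)


# Fill-in rule 1: impute the mean of the group that has the structure.
diff_mean_fill = group_difference(mean(limbed))
# Fill-in rule 2: code absence as a length of zero.
diff_zero_fill = group_difference(0.0)

print(diff_mean_fill, diff_zero_fill)
```

Neither number measures the qualitative difference itself; each only reflects the chosen fill-in rule, and any distance or covariance computed afterwards inherits that choice.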

It is good that you pointed out how the technical meaning of missing at random (MAR) deviates from everyday usage. In my previous posting I was referring to the technical meaning, though. But I think the situation is not as clear as either of us made it seem.
Indeed, the definition of MAR is that the probability of being missing depends only on observed data, AND NOT ON THE MISSING VARIABLE. Taken strictly, however, this definition cannot be applied to the current situation: the missing variable does not exist in some of the specimens, so it cannot even be said that the probability is unrelated to it.
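
For reference, the standard formal statement of MAR can be written as below. The symbols are notation introduced here, not taken from the thread: R is the indicator of which values are missing, Y_obs the observed part of the data and Y_mis the unobserved part.

```latex
P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})
```

The left-hand side conditions on Y_mis, so the statement presupposes that the unobserved values exist, which is exactly what is in question for structures that never develop.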

The situation in biology apparently differs from the scientific domains (psychology, sociology) in which missing data estimation is most often used. Variables such as the questions of a questionnaire can hardly be non-existent; usually they are just unobserved. But again, this is different in biology, where the absence of data points can be a signal.

It is a big challenge in contemporary biology to analyze data like these. If you have any ideas, they are very welcome!

Best,

Philipp







On 03.12.2013 at 22:34, morphmet_modera...@morphometrics.org wrote:


----- Forwarded message from Douglas Theobald <dtheob...@brandeis.edu> -----

Date: Sun, 1 Dec 2013 10:06:20 -0500
From: Douglas Theobald <dtheob...@brandeis.edu>
Reply-To: Douglas Theobald <dtheob...@brandeis.edu>
Subject: Re: missing structures
To: morphmet@morphometrics.org

The "missing data" problem is much misunderstood, and the technical
senses of "missing data" and "missing at random" do not correspond to
everyday, intuitive usage.

In fact, Patrick's problem can be validly discussed as a "missing
data" problem, in the statistical sense of the term.  For example, the
Expectation-Maximization algorithm is often characterized as dealing
with "missing data", and the EM algorithm can deal elegantly with data
that exist but were not measured, which is the more intuitive,
non-technical sense of "missing data".  However, the EM algorithm can
also deal with data that are unobserved because they do not exist
(this is the "qualitative" difference Philipp mentions).  In this case
you can view the EM algorithm as a mathematical trick that gets the
right answer by pretending that the data are missing in the common
sense way.  

For the EM algorithm (and many other statistical data imputation
methods) to be valid, the data must be "missing at random" (MAR), as
Philipp says.  But the technical definition of MAR does not correspond
to the intuitive sense of "random".  Missing morphological data often
is MAR -- for instance, Patrick's "missing data", in which certain
landmarks are unobserved because they never develop in the first
place, are MAR and hence can be validly treated by the EM algorithm. 
MAR only requires that the probability of a data point being absent
depends only on observed data.  Clearly, in Patrick's case we can
determine whether a developmental landmark will be absent based only
on observed data (e.g., if we know which organism the data are from).
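
As an illustration of the technical point only, here is a minimal sketch of EM under MAR. It assumes a bivariate normal model in which x is always observed and y is unobserved whenever x is large, so that missingness depends only on observed data. The function name `em_bivariate` and all numbers are hypothetical, and this is not presented as the procedure used in any of the cited papers.

```python
# Sketch: EM for a bivariate normal where x is complete and y is partly
# unobserved. All names and numbers are illustrative assumptions.

def em_bivariate(xs, ys, n_iter=200):
    """Estimate the mean of y by EM; ys uses None for unobserved values."""
    n = len(xs)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]

    # x is complete, so its mean and variance stay at their ML values.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Starting values for the y parameters come from the complete cases.
    mu_y = sum(y for _, y in obs) / len(obs)
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in obs) / len(obs)
    s_yy = sum((y - mu_y) ** 2 for _, y in obs) / len(obs)

    for _ in range(n_iter):
        # E-step: expected y and y^2 for every case, given x and the
        # current parameters (observed cases contribute their own values).
        beta = s_xy / s_xx
        res_var = max(s_yy - beta * s_xy, 0.0)  # residual variance of y given x
        ey, ey2 = [], []
        for x, y in zip(xs, ys):
            if y is None:
                m = mu_y + beta * (x - mu_x)
                ey.append(m)
                ey2.append(m * m + res_var)
            else:
                ey.append(y)
                ey2.append(y * y)

        # M-step: re-estimate the parameters from the expected statistics.
        mu_y = sum(ey) / n
        s_xy = sum(x * e for x, e in zip(xs, ey)) / n - mu_x * mu_y
        s_yy = sum(ey2) / n - mu_y ** 2
    return mu_y


# Deterministic toy data: y is roughly 1 + 2x, unobserved whenever x >= 12.
xs = [float(i) for i in range(20)]
noise = [((2 * i) % 5 - 2) * 0.1 for i in range(20)]
ys_full = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]
ys = [y if x < 12 else None for x, y in zip(xs, ys_full)]

observed = [y for y in ys if y is not None]
complete_case_mean = sum(observed) / len(observed)
em_mean = em_bivariate(xs, ys)
print(complete_case_mean, em_mean)
```

In this toy setting the complete-case mean is far below the full-data mean of y, while the EM estimate lands close to it. The sketch says nothing about whether such an estimate is biologically interpretable when the structure does not exist at all.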




> On Sun, Dec 1, 2013 at 3:33 AM, <morphmet_modera...@morphometrics.org> wrote:
> ----- Forwarded message from Philipp Mitteröcker <mitte...@univie.ac.at> -----
>      Date: Thu, 28 Nov 2013 11:26:34 -0500
>       From: Philipp Mitteröcker <mitte...@univie.ac.at>
>       Reply-To: Philipp Mitteröcker <mitte...@univie.ac.at>
>       Subject: Re: missing structures
> The problem raised by Patrick is not really a missing data problem.
> Missing data, in the technical sense, are structures or properties
> that do exist in the specimens but could not have been measured.
> Hence it can make some sense to estimate them. But when structures
> simply do not exist in some specimens, what does it mean to estimate
> them?
> In other words, if a structure is present in one group and absent in
> another group, these groups differ not only quantitatively but also
> qualitatively. Estimating the values, or letting landmarks overlap,
> means that a qualitative difference is -- arbitrarily -- substituted
> by a quantitative one. Many statistical results will be affected by
> this arbitrariness.
> Note also that missing data approaches usually require the data to
> be missing at random, which is presumably not the case in the
> problem at hand.
> Best,
> Philipp
> On 28.11.2013 at 10:55, morphmet_modera...@morphometrics.org wrote:
> >
> > ----- Forwarded message from sebastien couette <sebastien.coue...@u-bourgogne.fr> -----
> >
> >     Date: Mon, 25 Nov 2013 05:06:20 -0500
> >      From: sebastien couette <sebastien.coue...@u-bourgogne.fr>
> >      Reply-To: sebastien couette <sebastien.coue...@u-bourgogne.fr>
> >      Subject: Re: missing structures
> >
> > Dear Patrick,
> >
> > I published a paper on missing data in 2010:
> >
> > Sébastien Couette, Jess White (2010). 3D geometric morphometrics and
> > missing-data. Can extant taxa give clues for the analysis of fossil
> > primates? Comptes Rendus Palevol 9(6):423-433.
> > DOI:10.1016/j.crpv.2010.07.002
> >
> > I can send you a copy.
> >
> > There is also a good paper on this topic in Systematic Biology:
> >
> > Brown, C.M., Arbour, J.H., Jackson, D.A. (2012). Testing the effect
> > of missing data estimation and distribution in morphometric
> > multivariate data analyses. Systematic Biology 61(6):941-954.
> >
> > Feel free to contact me if you have any questions.
> >
> > Sébastien
> >
> > --
> > -----------------------------------------
> > Dr. Sébastien Couette
> >
> > EPHE&UMR CNRS 6282 Biogéosciences
> > Université de Bourgogne
> > 6 Bld Gabriel
> > 21000 Dijon
> >
> > Tel.: 33. (0)3.80.39.64.48
> > Fax : 33. (0)3.80.39.63.87
> >
> >
> > Head of the "Biodiversité et Gestion de l'Environnement" specialty of the EPHE Master's program "Biologie Santé Ecologie"
> >
> > EPHE Master's program, specialty "Biodiversité et Gestion de l'Environnement"
> >
> > ----- End forwarded message -----
> >
> >
> ___________________________________
> Dr. Philipp Mitteroecker
> Department of Theoretical Biology
> University of Vienna
> Althanstrasse 14
> A-1090 Vienna, Austria
> Tel: +43 1 4277 56705
> Fax: +43 1 4277 9544
> ----- End forwarded message -----



----- End forwarded message -----




___________________________________

Dr. Philipp Mitteroecker

Department of Theoretical Biology
University of Vienna
Althanstrasse 14
A-1090 Vienna, Austria

Tel: +43 1 4277 56705
Fax: +43 1 4277 9544
email: philipp.mitteroec...@univie.ac.at
homepage: http://theoretical.univie.ac.at/people/mitteroecker



----- End forwarded message -----





----- End forwarded message -----


