On 02/11/22 13:32, Han Xiao wrote:
Dear Carmelo,
Thank you sooo much for your super detailed comments and suggestions!
Also, again, I am very glad that I participated in the Physalia course
with you; I did not imagine you would still remember me from it!
I do remember most, if not all, of the participants in my courses. About
your notes, I will try to answer in a semi-general fashion due to the
general/public nature of the mailing list. You're welcome to contact me
privately for more detailed discussion.
Some thoughts on your comments:
1. Yes, my plan is to only focus on the hybrid morph and the parental
morphs. Maybe a bit more on the shape side: yes, I am interested in the
individual head shape, which is quite distinct among morphs. However, I
found a significant size-shape interaction, and the size distributions of
two morphs (one parental and one hybrid) and the other parental morph
don't overlap. Will this be a problem? Or should I just include size as
a covariate?
Hard to tell without literally working with the data. I also seem to
recall that the kind of fish you study differentiates into morphs with
distinct sizes. Let us say that, if I were you, I would ask myself: 1.
"how bad" the interaction is, which morphs it involves and what is
causing it, 2. whether the allometric variation you are observing within
morphs is mostly static or ontogenetic (and whether this is the case
within all morphs), and 3. to what extent removing between-morph
variation due to size is going to remove interesting biological
variation. I suspect that addressing these questions may help you
figure out a sensible course of action with your data.
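One possible way to get a first feel for "how bad" the interaction is would be to compare the direction of the allometric vectors (slopes of shape on log size) between morphs. The following is only a minimal sketch in Python with NumPy, on made-up, noise-free data; the function names and the toy morphs are assumptions for illustration, and a real analysis would also need a proper test (e.g., by permutation).

```python
import numpy as np

def allometric_vector(shape, log_size):
    """Slopes of each shape variable regressed on log size, within one group."""
    shape = np.asarray(shape, dtype=float)
    log_size = np.asarray(log_size, dtype=float)
    yc = shape - shape.mean(axis=0)      # centre shape variables
    xc = log_size - log_size.mean()      # centre log size
    return xc @ yc / (xc @ xc)           # one slope per shape variable

def angle_degrees(v1, v2):
    """Angle between two slope vectors, in degrees."""
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy data: three hypothetical morphs, three shape variables each
log_size = np.linspace(0.0, 1.0, 10)
morph_a = np.outer(log_size, [1.0, 0.0, 0.0])
morph_b = np.outer(log_size, [0.0, 1.0, 0.0])   # different allometric direction
morph_c = np.outer(log_size, [2.0, 0.0, 0.0])   # same direction as A, steeper slope

vec_a = allometric_vector(morph_a, log_size)
vec_b = allometric_vector(morph_b, log_size)
vec_c = allometric_vector(morph_c, log_size)

angle_ab = angle_degrees(vec_a, vec_b)   # large angle: allometries differ in direction
angle_ac = angle_degrees(vec_a, vec_c)   # near-zero angle: same direction
```

A large angle between two morphs would suggest that pooling them under a single common allometric slope is questionable.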
For the input file of shape data, should I just try the
coordinates, and then the distances? I guess distances are more about
differentiation, so they may not apply to my case.
Again, it may depend. But, as a general answer, you may want to consider
that by transforming everything into distances you lose information about
(essentially) "how" shape varies. So, while transforming into distances
may be a good idea in certain cases, in most cases you may want to ask
yourself whether you really want to lose the information contained in the
Procrustes coordinates.
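A toy illustration of this point, as a minimal sketch in Python with NumPy: the coordinates below are invented (they are not real Procrustes-aligned data), and they only show that two specimens can be at the same distance from a reference while differing from it in entirely different ways.

```python
import numpy as np

# Hypothetical reference configuration: 3 landmarks in 2D, flattened as x1, y1, x2, y2, x3, y3
reference = np.array([0.0, 0.0, 1.0, 0.0, 0.5, 1.0])

# Two made-up specimens that deviate from the reference by the same amount,
# but at different landmarks and in different directions
specimen_a = reference + np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])  # landmark 1 shifted along x
specimen_b = reference + np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.1])  # landmark 3 shifted along y

# Distances from the reference: this is all a distance-based analysis would see
dist_a = np.linalg.norm(specimen_a - reference)
dist_b = np.linalg.norm(specimen_b - reference)
same_distance = bool(np.isclose(dist_a, dist_b))

# Difference vectors: this is the "how" that is retained only by the coordinates
same_direction = bool(np.allclose(specimen_a - reference, specimen_b - reference))
```

Here the two distances are equal, while the coordinate differences are not.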
2. How to select the SNPs if I take the SNPs as input data. As you
said, it is a great idea to choose the fixed markers from the parental
morphs and encode them as numbers. I will try this to see how many
markers I get. As I work with ddRAD and missingness up to 30% is
common, I was thinking of filtering the SNPs on: 1) low missingness
(present in 90% of individuals, etc.) and 2) high Fst value (I guess the
extreme case will be the ones fixed in the parental morphs?). In a
genetic PCA, we normally replace the missing data with the mean, so
missingness may not be a huge problem, but whether the
filtering/sub-sampling of SNPs makes sense is important.
Again, another one where it's hard to give a general answer and one
decides on a case-by-case basis. Based on what you say, I would be
inclined to: 1. on the full dataset (i.e., not just the SNPs fixed
between parental morphs), select a "robust" subset of SNPs (not
just low missingness but also stringent thresholding and other filters
which reduce the number of your SNPs but increase quality), to have
good quality data with few missing SNPs; 2. run some form of
genotype imputation on this global, genome-wide dataset (note that I'm
not referring to mere mean replacement of missing data, but to proper
genotype imputation); 3. select those SNPs which are fixed
between parental morphs and biallelic in the hybrid morph (as per
previous message).
To reiterate, this approach may or may not be the right one for you and
it's just an idea based on what you wrote.
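The missingness part of step 1 above could be sketched as follows. This is a minimal sketch in Python with NumPy under stated assumptions: genotypes are stored as an individuals-by-SNPs array with NaN for missing calls, the function name and the toy matrix are invented, and the other quality filters and the genotype imputation step are deliberately not shown because they depend on the dataset and software used.

```python
import numpy as np

def filter_by_call_rate(genotypes, min_call_rate=0.9):
    """Keep SNPs (columns) genotyped in at least `min_call_rate` of individuals.

    `genotypes` is an individuals x SNPs array with np.nan marking missing calls.
    Returns the filtered array and the boolean mask of retained SNPs.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    call_rate = 1.0 - np.isnan(genotypes).mean(axis=0)   # per-SNP proportion of non-missing calls
    keep = call_rate >= min_call_rate
    return genotypes[:, keep], keep

# Toy matrix: 4 individuals x 3 SNPs (values are allele counts, NaN = missing)
toy = np.array([
    [0.0, np.nan, 2.0],
    [1.0, np.nan, 2.0],
    [0.0, 1.0,    np.nan],
    [2.0, 1.0,    0.0],
])

# A looser threshold than 0.9 is used here only because the toy matrix is tiny
filtered, kept = filter_by_call_rate(toy, min_call_rate=0.75)
```

With real data the threshold would be chosen together with the other quality filters, not in isolation.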
3. Take the ancestry proportion calculated by Admixture as the input. I
also get this idea since it is the most straightforward way to target
introgression. I guess any of: 1) 2b-PLS, 2) general linear models, or
3) a Mantel test may work.
Not sure whether/why you would need a Mantel test (see also above about
distances).
To summarize, as you said, the input file matters for the exact question
I want to ask. Ancestry proportion will reduce the high-dimensional
genetic variation to a single univariate variable. However, this might be a
more straightforward way to do it (in different ways) than taking SNPs
of interest, which would retain more variation but make the findings
more difficult to interpret, I guess.
One of the many ways in which the two approaches differ depends on
whether the same or different regions of the genome are involved in
introgression across individuals. For instance, whether two individuals
with the same "level of admixture" ("ancestry proportion") will have
admixture in the same regions, and whether the genomic regions involved
in less admixed individuals are a subset of the regions involved in more
admixed individuals.
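A toy example of this distinction, as a minimal sketch in Python with NumPy: the matrix is invented, and it assumes diagnostic loci scored as the number of alleles (0, 1 or 2) coming from one of the two parental morphs, here called "parent B" purely for illustration.

```python
import numpy as np

# Rows: hypothetical hybrid individuals; columns: diagnostic loci,
# scored as the number of parent-B alleles (0, 1 or 2)
scored = np.array([
    [2, 2, 0, 0, 0, 0],   # individual 0: introgression at loci 0-1
    [0, 0, 0, 0, 2, 2],   # individual 1: introgression at loci 4-5
    [2, 2, 2, 2, 0, 0],   # individual 2: introgression at loci 0-3
])

# Univariate summary: proportion of parent-B alleles per individual
ancestry_proportion = scored.sum(axis=1) / (2 * scored.shape[1])

# Which loci carry at least one parent-B allele in each individual
introgressed = scored > 0

# Individuals 0 and 1 have the same proportion but share no introgressed locus
shared_0_1 = int((introgressed[0] & introgressed[1]).sum())

# The loci introgressed in individual 0 are a subset of those in individual 2
nested_0_in_2 = bool(np.all(introgressed[2][introgressed[0]]))
```

The univariate summary cannot tell individuals 0 and 1 apart, whereas the locus-by-locus scores can.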
I hope the above helps.
Best,
Carmelo
On Wednesday, 2 November 2022 at 05:54:40 UTC, Carmelo Fruciano wrote:
On 01/11/22 18:02, Han Xiao wrote:
> Dear morpho people,
>
> I am writing to ask a rarely discussed question, which is to test
> associations between genomic data and shape variation.
Dear Han,
yes, this is a topic that is perhaps less frequently discussed than
others. As a participant in one of the editions of my geometric
morphometrics course, you may recall we covered this general topic to
some extent.
> To describe my system a bit first, I am working with four sympatric
> fish morphs in a lake. I have both genetic data (SNPs generated by
> ddRADSeq, around 12000 SNPs) and shape data (landmark- and
> semilandmark-based GM for the head shape) for the same fish samples. The
> genetics indicate that three morphs are genetically distinct, while one
> is of hybrid origin between two morphs, with different degrees of
> introgression. I would like to ask the question: for the three related
> morphs (two parental morphs and one hybrid morph), is there any
> correlation between the degree of genetic introgression and the shape
> variation?
Presumably, if this is your main question and by "shape variation" you
mean "individual shape", you would be performing the analysis mainly on
the only introgressed morph (whose individuals, I have to assume based on
your text, have varying levels of introgression), using information from
the two "parental" morphs.
> I was advised to apply a 2b-PLS to test it. I then searched the
> literature and found a few cases. However, the studies vary in the input
> data for both genetics and morphometrics. For genetics, people have used
> genetic distances (calculated as Fst/(1-Fst), where Fst is a measure of
> genetic differentiation), Prevosti distance, allele frequencies (a few
> microsatellites), and expression results (numeric and continuous). For
> the shape data, people have used Euclidean distances, GPA coordinates,
> centroid sizes, etc.
About the genetic measures, if the question is about the degree of
introgression (of one morph into another when producing the third
morph), it is doubtful that any of the measures you mention would
adequately capture that. For instance, within your introgressed morph
genetic variation among individuals may not be produced exclusively by
varying levels of introgression. So FST would be a poor choice because
it would capture genetic variation produced by other causes (e.g.,
neutral variation). There are other reasons why FST may be a poor
choice, but let's keep it at that.
Perhaps a semi-decent solution to quantify the degree of introgression
would be to subset your SNP panel to only those SNPs (if any) which are
fixed between "parental" morphs (and which are biallelic in the
introgressed morph), code them to reflect their "polarity" (e.g., 0 the
allele in one parental morph, 1 the allele in the other parental morph)
rather than using the actual nucleotides, and then use the data scored
this way for your individuals from the introgressed morph to do tests
with morphology.
The above is just a very rough solution, with ample room for
improvement depending on the details of your system (e.g., ongoing gene
flow between the two "parental" morphs, with most alleles not being
fixed between them). But, as you may imagine, this goes well beyond this
brief reply and would require more in-depth knowledge of your specific
situation (notice how I had to make several assumptions about how
genetic variation is distributed among your morphs).
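The rough SNP-scoring idea described above (keep SNPs fixed between the "parental" morphs and recode them by polarity) could be sketched as follows. This is a minimal sketch in Python with NumPy under several assumptions that are not in the original message: genotypes are coded as counts of the alternate allele (0, 1, 2), there are no missing calls, and the polarised score is the number of alleles from the second parental morph (0, 1 or 2 per diploid genotype) instead of a per-allele 0/1 code. The function name and toy data are invented.

```python
import numpy as np

def polarize_fixed_snps(parent_a, parent_b, hybrids):
    """Select SNPs fixed for opposite alleles in the two parental morphs and
    rescore hybrid genotypes as the number of parent-B alleles.

    All inputs are individuals x SNPs arrays of alternate-allele counts (0, 1, 2).
    Returns the rescored hybrid matrix and the boolean mask of diagnostic SNPs.
    """
    parent_a, parent_b, hybrids = (np.asarray(m) for m in (parent_a, parent_b, hybrids))
    a_all_ref = (parent_a == 0).all(axis=0)
    a_all_alt = (parent_a == 2).all(axis=0)
    b_all_ref = (parent_b == 0).all(axis=0)
    b_all_alt = (parent_b == 2).all(axis=0)

    # Diagnostic SNPs: fixed for opposite alleles in the two parental samples
    fixed = (a_all_ref & b_all_alt) | (a_all_alt & b_all_ref)

    scored = hybrids[:, fixed].astype(float)
    # Where parent A carries the alternate allele, the alternate-allele count is a
    # count of parent-A alleles, so flip it to obtain the count of parent-B alleles
    flip = a_all_alt[fixed]
    scored[:, flip] = 2 - scored[:, flip]
    return scored, fixed

# Toy data: 2 individuals per morph, 3 SNPs
parent_a = np.array([[0, 2, 0], [0, 2, 1]])
parent_b = np.array([[2, 0, 0], [2, 0, 0]])
hybrids = np.array([[1, 2, 0], [2, 0, 1]])

scored, fixed = polarize_fixed_snps(parent_a, parent_b, hybrids)
```

"Fixed" here only means fixed in the sampled parental individuals, which is one of the reasons this remains a rough approach.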
Notice also that if the level of introgression is all you care about
(regardless of which loci it comes from) you may obtain a much better
and "faster" (i.e., less work for you) solution by using individual
estimates of levels of admixture between morphs from one of the programs
used for the analysis of genetic admixture (which you have probably used
anyway).
> So my questions are:
> 1. Do you all agree that 2b-PLS should also make sense for such a
> comparison?
PLS may be a good solution to identify how the shape and levels of
introgression co-vary. Tests based on a measure of association (e.g.,
Escoufier RV) may be used to test the null hypothesis that they are
independent.
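For readers who want to see the mechanics, here is a minimal sketch in Python with NumPy of a two-block PLS (singular value decomposition of the cross-covariance matrix between blocks) and of a permutation test on the Escoufier RV coefficient. The function names and the simulated data are assumptions made for illustration only; dedicated morphometric software would normally be used for real analyses.

```python
import numpy as np

def rv_coefficient(x, y):
    """Escoufier RV coefficient between two blocks (rows = same individuals)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    sxy = xc.T @ yc
    sxx = xc.T @ xc
    syy = yc.T @ yc
    return float(np.trace(sxy @ sxy.T) / np.sqrt(np.trace(sxx @ sxx) * np.trace(syy @ syy)))

def two_block_pls(x, y):
    """Two-block PLS: SVD of the cross-covariance matrix between the blocks."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(xc.T @ yc / (x.shape[0] - 1), full_matrices=False)
    # u and vt.T hold the paired axes; xc @ u and yc @ vt.T are the PLS scores
    return u, s, vt.T

def rv_permutation_test(x, y, n_perm=999, seed=0):
    """Null hypothesis of independence, tested by permuting the rows of one block."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(x, y)
    count = 1  # the observed value counts as one permutation
    for _ in range(n_perm):
        perm = rng.permutation(x.shape[0])
        if rv_coefficient(x[perm], y) >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Simulated example: 30 individuals, 3 "introgression" variables, 6 shape variables
rng = np.random.default_rng(1)
intro = rng.normal(size=(30, 3))
shape = intro @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(30, 6))

u, singular_values, v = two_block_pls(intro, shape)
rv_obs, p_value = rv_permutation_test(intro, shape, n_perm=199)
```

The singular values indicate how much covariation each pair of axes accounts for, while the permutation p-value addresses the null hypothesis of independence between blocks.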
If your estimate of level of introgression is univariate (notice that in
the rough solution I suggested above this may not be the case), you may
also consider general linear models (and associated tests of
significance) using the level of introgression as a predictor.
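As an illustration of the univariate case, the following minimal sketch in Python with NumPy regresses a multivariate shape matrix on a single predictor (e.g., an individual admixture proportion) and tests the fit by permuting the predictor. The function name, the test statistic (explained sum of squares) and the simulated data are assumptions for illustration, not a prescription.

```python
import numpy as np

def shape_regression(shape, predictor, n_perm=999, seed=0):
    """Multivariate regression of shape on one predictor, with a permutation test.

    Returns the slope for each shape variable, the proportion of total shape
    variance explained, and a permutation p-value.
    """
    shape = np.asarray(shape, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    yc = shape - shape.mean(axis=0)
    total_ss = np.sum(yc ** 2)
    rng = np.random.default_rng(seed)

    def fit(x):
        xc = x - x.mean()
        slopes = xc @ yc / (xc @ xc)            # one slope per shape variable
        explained = np.sum(np.outer(xc, slopes) ** 2)
        return slopes, explained

    slopes, ss_obs = fit(predictor)
    count = 1  # the observed value counts as one permutation
    for _ in range(n_perm):
        _, ss_perm = fit(rng.permutation(predictor))
        if ss_perm >= ss_obs:
            count += 1
    return slopes, ss_obs / total_ss, count / (n_perm + 1)

# Simulated example: 30 individuals, 4 shape variables, one admixture proportion
rng = np.random.default_rng(2)
admixture = rng.uniform(0.0, 1.0, size=30)
shape = np.outer(admixture, [1.0, -0.5, 0.2, 0.0]) + 0.05 * rng.normal(size=(30, 4))

slopes, r_squared, p_value = shape_regression(shape, admixture, n_perm=199)
```

The vector of slopes describes the shape change associated with increasing levels of admixture, which can then be visualised as a shape deformation.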
> 2. What would you suggest for the input files? (I do have some
> considerations to discuss)
See above. I suppose the main issue is a bit beyond input files per se
and more about how you quantify/represent introgression.
> 3. Is there any other analysis you would recommend?
>
> I know normally people will use GWAS to search for associations,
> however, I am looking for something that can tolerate a smaller sample
> size (30 fish per morph).
This is absolutely correct. But, more fundamentally, the goal of GWAS is
quite distinct from the hypothesis you want to test.
> Also, the potential transgressive shape of
> hybrids may be a confounding factor, especially since different
> allometries are observed.
Yes. But transgressive segregation may not be a concern if you are just
interested in whether and how levels of introgression scored within a
single "introgressed" morph are associated with shape variation.
Best,
Carmelo
--
==================
Carmelo Fruciano
Italian National Research Council (CNR)
IRBIM Messina
http://www.fruciano.org/
==================
--
You received this message because you are subscribed to the Google
Groups "Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/morphmet2/263cb8f5-b0c8-413f-bda2-d35082109581n%40googlegroups.com.