Hi John. I don't have a definitive answer for you, but here are some additional
thoughts:
1. If your genomes are closely related enough, it shouldn't matter much. For
closely related genomes, deletions and rearrangements will cause more issues
than mismatches.
2. I would go with the distance matrix over a tree constructed from orthologs
present in all genomes. However, as your follow-up email seemed to be saying,
the number of orthologous proteins between a draft and reference is probably a
better measure than either (again, if they are closely related). If a sequence
is not present in the reference, there is no information to use to order the
contigs. The distance matrix does take presence/absence into account when
scoring, but is largely based on substitution distances.
3. While the number of contigs with conflicting order information might be
helpful, I'd go with number of contigs ordered or length of aligned region
first. If this is similar, number with conflicting order information might be
a good second measure. However, if the genomes are more distantly related,
there may be fewer with conflicting order info b/c there were fewer aligned,
and this might decrease power to detect true rearrangements.
4. To summarize, the more closely related the genomes are, the more I'd go with
presence/absence/structural measures. If they are more distantly related (i.e.,
the aligned region shows low sequence similarity and you suspect more is
orthologous than is aligning), I'd go with the distance matrix, which should
indicate the reference that the most sequence will align to.
5. I think using multiple references and comparing is a good strategy overall.
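
As a rough sketch of the ranking described in point 3, assuming you have already run the contig mover against each candidate reference and collected the numbers yourself: the record type, the field names, the example values, and the 5% cutoff for "similar" below are all made up for illustration and are not something Mauve produces or defines.

```python
from dataclasses import dataclass


@dataclass
class ReorderResult:
    """Hypothetical summary of one draft reordered against one reference."""
    reference: str
    aligned_length: int        # bases of the draft aligned to this reference
    conflicting_contigs: int   # contigs with conflicting order information


def choose_reference(results, tolerance=0.05):
    """Prefer the longest aligned region; break near-ties on fewest conflicts.

    `tolerance` is an assumed cutoff for what counts as "similar" aligned
    length (here: within 5% of the best candidate).
    """
    best_len = max(r.aligned_length for r in results)
    # Keep only references whose aligned length is close to the best one.
    close = [r for r in results if r.aligned_length >= (1 - tolerance) * best_len]
    # Among those, take the fewest conflicts, then the longest alignment.
    best = min(close, key=lambda r: (r.conflicting_contigs, -r.aligned_length))
    return best.reference


# Illustrative numbers only, not real results.
candidates = [
    ReorderResult("ref1", 4_800_000, 12),
    ReorderResult("ref2", 4_750_000, 5),
    ReorderResult("ref3", 3_000_000, 1),
]
chosen = choose_reference(candidates)
print(chosen)
```

Here ref3 has the fewest conflicts but is dropped first because much less of the draft aligned to it, which is the loss-of-power case mentioned in point 3. The same function would work with number of contigs ordered in place of aligned length.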
Hope this helps
Anna Rissman
Hi Anna,
Thank you for your reply. Looking forward to hearing from you.
In the meantime I've started a run of reordering the draft genomes and will
check the results of the contig mover. Hopefully my estimation of which
reference to choose will be valid. I've checked the number of proteins in each
draft that are orthologous to all finished genomes (from OrthoMCL output) and
also which reference/draft genome combination has the smallest distance in
the Mauve distance matrix (from running an alignment of all genomes). In 50% of
the cases, the draft/reference pair that shared the most orthologous proteins
also had the smallest distance in Mauve. In the rest, I will check which pair
gives the fewest contigs with conflicting order information.
Don't know if this is a good validation for choices, but I wanted to get
started while I wait for your response.
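
The comparison described above could be sketched roughly as follows. This assumes the pairwise distances and shared-ortholog counts have already been copied by hand into nested dictionaries; the genome names and all values are invented for illustration, and nothing here parses the Mauve log or OrthoMCL output.

```python
def pick_by_distance(distances, drafts, references):
    """For each draft, pick the reference with the smallest distance."""
    return {d: min(references, key=lambda r: distances[d][r]) for d in drafts}


def pick_by_orthologs(shared, drafts, references):
    """For each draft, pick the reference sharing the most orthologs."""
    return {d: max(references, key=lambda r: shared[d][r]) for d in drafts}


def agreement(choice_a, choice_b):
    """Return the fraction of drafts where both criteria agree, plus the rest."""
    disagree = [d for d in choice_a if choice_a[d] != choice_b[d]]
    return 1 - len(disagree) / len(choice_a), disagree


# Illustrative values only (hypothetical genome names).
drafts = ["draftA", "draftB"]
references = ["ref1", "ref2"]
distances = {
    "draftA": {"ref1": 0.10, "ref2": 0.25},
    "draftB": {"ref1": 0.30, "ref2": 0.20},
}
shared = {
    "draftA": {"ref1": 3100, "ref2": 2900},
    "draftB": {"ref1": 2800, "ref2": 2700},
}

by_distance = pick_by_distance(distances, drafts, references)
by_orthologs = pick_by_orthologs(shared, drafts, references)
fraction, unresolved = agreement(by_distance, by_orthologs)
print(fraction, unresolved)  # drafts in `unresolved` need the conflict check
```

The drafts left in `unresolved` are the ones where the two criteria disagree and the number of contigs with conflicting order information would be used to decide.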
Sincerely,
John
//John Larsson
//Stockholm University, Department of Botany, S-106 91 Stockholm, Sweden
//E-mail: john.lars...@botan.su.se
//Phone: +468163407
Hi John. I got this message, and will respond, but am submitting a paper in
the next couple days, and it will most likely be after that.
Anna Rissman
----- Original Message -----
From: John Larsson <john.lars...@botan.su.se>
Date: Monday, December 13, 2010 11:43 am
Subject: [Mauve-users] Choosing reference genomes for contig mover
To: mauve-users@lists.sourceforge.net
> Hi,
>
> I'm planning to run an alignment of a mix of bacterial genomes, some
> finished and some in draft status. I've tried the Mauve Contig Mover
> and it works well from command-line (running on linux). The question I
> have is related to choosing the "best" reference genome for each draft
> genome when ordering contigs. I have a phylogenetic tree based on a
> rather large set of single-copy orthologous genes present in all the
> genomes (obtained from the OrthoMCL program). My first idea was to
> choose the nearest neighbour in that tree with a finished genome. Then
> I started thinking maybe that doesn't tell me which reference genome
> is the best choice.
>
> What I'm wondering is if the distance matrix that Mauve prints to the
> log is a good way of finding the best reference, i.e. for each draft
> genome choose reference genome with the smallest distance to the draft
> in the matrix.
>
> Any help or guidance is appreciated.
>
> Thanks,
> John
>
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
>
> _______________________________________________
> Mauve-users mailing list
> Mauve-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mauve-users
>