Hi John.  I don't have a definitive answer for you, but here are some
additional thoughts:

1.  If your genomes are closely related enough, the choice shouldn't matter
much.  For closely related genomes, deletions and rearrangements will cause
more issues than mismatches.

2.  I would go with the distance matrix over a tree constructed from orthologs
present in all genomes.  However, as your follow-up email seemed to be saying,
the number of orthologous proteins shared between a draft and a reference is
probably a better measure than either (again, if they are closely related).
Without the sequence being present, there is no information to use for
ordering.  The distance matrix does take presence/absence into account when
scoring, but is largely based on substitution distances.

3.  While the number of contigs with conflicting order information might be
helpful, I'd go with the number of contigs ordered or the length of the
aligned region first.  If these are similar across references, the number of
contigs with conflicting order information might be a good second measure.
However, if the genomes are more distantly related, there may be fewer contigs
with conflicting order information simply because fewer were aligned, and this
might decrease the power to detect true rearrangements.

4.  To summarize, the more closely related the genomes are, the more I'd go
with presence/absence/structural measures.  If they are more distantly related
(i.e., the aligned region shows low sequence similarity and you suspect more is
orthologous than is aligning), I'd go with the distance matrix, which should
indicate the reference to which the most sequence will align.

5.  I think using multiple references and comparing is a good strategy overall.
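
The ranking in point 3 could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the names `RefStats` and `pick_reference`, the 5% "similar" threshold, and the example counts are all hypothetical, and the per-reference counts are assumed to have been collected by hand from each reorder run rather than parsed from any Mauve output format.

```python
from typing import Dict, NamedTuple


class RefStats(NamedTuple):
    # Hypothetical per-reference summary, filled in by hand from each reorder run.
    contigs_ordered: int      # contigs that were placed against this reference
    contigs_conflicting: int  # contigs with conflicting order information


def pick_reference(stats: Dict[str, RefStats], similar_within: float = 0.05) -> str:
    """Prefer the most contigs ordered; among near-ties, the fewest conflicts."""
    if not stats:
        raise ValueError("no candidate references given")
    best_ordered = max(s.contigs_ordered for s in stats.values())
    # Assumption: counts within `similar_within` of the best are treated as "similar".
    cutoff = best_ordered * (1.0 - similar_within)
    near_ties = {ref: s for ref, s in stats.items() if s.contigs_ordered >= cutoff}
    # Second measure: fewest conflicting contigs, then most ordered, then name.
    return min(
        near_ties,
        key=lambda ref: (
            near_ties[ref].contigs_conflicting,
            -near_ties[ref].contigs_ordered,
            ref,
        ),
    )


# Made-up counts for one draft genome against three candidate references.
example = {
    "refA": RefStats(contigs_ordered=95, contigs_conflicting=12),
    "refB": RefStats(contigs_ordered=93, contigs_conflicting=4),
    "refC": RefStats(contigs_ordered=60, contigs_conflicting=1),
}
print(pick_reference(example))
```

In the made-up example, refC has the fewest conflicts but is never considered, because far fewer contigs were ordered against it. That is the caveat from point 3: a distant reference can show few conflicts simply because little aligned.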




Hope this helps
Anna Rissman




Hi Anna,
Thank you for your reply. Looking forward to hearing from you.

In the meantime I've started a run of reordering the draft genomes and will
check the results of the contig mover. Hopefully my estimation of which
reference to choose will be valid. I've checked the number of proteins in each
draft that are orthologous to all finished genomes (from OrthoMCL output) and
also which reference/draft genome combination has the smallest distance in
the Mauve distance matrix (from running an alignment of all genomes). In 50% of
the cases the draft/reference pair which shared the most orthologous proteins
also had the smallest distance in Mauve. In the rest, I will check which pair
gives the fewest contigs with conflicting order information.
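
The comparison described above could be sketched along these lines in Python. It assumes the ortholog counts and the Mauve distances have already been copied by hand into plain dictionaries keyed by draft and then by reference. The function name `compare_choices` and all numbers are hypothetical, and nothing here reads OrthoMCL or Mauve files directly.

```python
from typing import Dict, List, Tuple


def compare_choices(
    shared_orthologs: Dict[str, Dict[str, int]],
    mauve_distance: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str]]:
    """Split drafts into those where both measures pick the same reference
    and those where they disagree."""
    agree: List[str] = []
    disagree: List[str] = []
    for draft in sorted(shared_orthologs):
        counts = shared_orthologs[draft]
        dists = mauve_distance[draft]
        by_orthologs = max(counts, key=counts.get)  # most shared orthologs
        by_distance = min(dists, key=dists.get)     # smallest distance
        (agree if by_orthologs == by_distance else disagree).append(draft)
    return agree, disagree


# Made-up values for two drafts and two finished references.
orthologs = {
    "draft1": {"refA": 2100, "refB": 1900},
    "draft2": {"refA": 1800, "refB": 1850},
}
distances = {
    "draft1": {"refA": 0.12, "refB": 0.20},
    "draft2": {"refA": 0.15, "refB": 0.18},
}
agree, disagree = compare_choices(orthologs, distances)
print("same reference by both measures:", agree)
print("check conflicting contigs for:", disagree)
```

The drafts in the second list are the ones where the number of contigs with conflicting order information would then be compared.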

I don't know if this is a good way to validate the choices, but I wanted to
get started while I wait for your response.

Sincerely,
John


//John Larsson
//Stockholm University, Department of Botany, S-106 91 Stockholm, Sweden
//E-mail: john.lars...@botan.su.se
//Phone: +468163407


Hi John.  I got this message and will respond, but I am submitting a paper in
the next couple of days, so my reply will most likely come after that.



Anna Rissman

----- Original Message -----
From: John Larsson <john.lars...@botan.su.se>
Date: Monday, December 13, 2010 11:43 am
Subject: [Mauve-users] Choosing reference genomes for contig mover
To: mauve-users@lists.sourceforge.net


> Hi,
>  
>  I'm planning to run an alignment of a mix of bacterial genomes, some 
> finished and some in draft status. I've tried the Mauve Contig Mover 
> and it works well from command-line (running on linux). The question I 
> have is related to choosing the "best" reference genome for each draft 
> genome when ordering contigs. I have a phylogenetic tree based on a 
> rather large set of single-copy orthologous genes present in all the 
> genomes (obtained from the OrthoMCL program). My first idea was to 
> choose the nearest neighbour in that tree with a finished genome. Then 
> I started thinking maybe that doesn't tell me which reference genome 
> is the best choice.
>  
>  What I'm wondering is if the distance matrix that Mauve prints to the 
> log is a good way of finding the best reference, i.e. for each draft 
> genome choose reference genome with the smallest distance to the draft 
> in the matrix.
>  
>  Any help or guidance is appreciated.
>  
>  Thanks,
>  John
>  
> ------------------------------------------------------------------------------
>  Lotusphere 2011
>  Register now for Lotusphere 2011 and learn how
>  to connect the dots, take your collaborative environment
>  to the next level, and enter the era of Social Business.
>  http://p.sf.net/sfu/lotusphere-d2d
>  
> _______________________________________________
>  Mauve-users mailing list
>  Mauve-users@lists.sourceforge.net
>  https://lists.sourceforge.net/lists/listinfo/mauve-users
>  