Dear All,
I would like to share a script which extracts core alignment in fasta
format from Mauve output.
Let us use the example in the following link
https://code.google.com/p/clonalorigin/wiki/FromGenomeAssemblyToRecombination
Our goal is to extract the core alignment for the input 4 genomes.
Step 1: get the alignment using progressiveMauve, run
progressiveMauve --output=full_alignment.xmfa genome1.gbk genome2.gbk
genome3.gbk genome4.gbk
Step 2: extract LCBs shared by all genomes
stripSubsetLCBs full_alignment.xmfa full_alignment.xmfa.bbcols
core_alignment.xmfa 500 4
the first number "500" is the minimum length of the LCB; the second
number "4" indicates the minimum number of genomes that share a LCB
Step 3: concatenate all the LCBs
perl xmfa2fasta.pl core_alignment.xmfa
You will get "core_alignment.fasta" in the current directory.
Cheers,
Lu Cheng
#!/usr/bin/perl -w
# This script extracts core alignment in fasta format from xmaf file
# Lu Cheng
# 26.07.2013
# Input: .xmaf file, note that each block in the .xmaf file must contain the
same number of sequences (see usage of stripSubsetLCBs)
#Output: .fasta file of the core alignment
use strict;
use warnings;
my $inFile=$ARGV[0];
my $infh = ();
open($infh, $inFile) or die "Could not open file '$inFile' $!";
my $line = <$infh>; # skip the first line
$line= <$infh>;
my $nSeq =0;
my @seqNames = ();
my $i = 0;
my @seqs = ();
my $tmp = ();
my $flag1=1;
while($line){
if($line =~ /^#/){
$nSeq=$nSeq+1;
# handle the headers
$line =~ s/^#.*\t//;
$line =~ s/^.*\///;
$line =~ s/\..*$//;
chomp($line);
push(@seqNames,$line);
$line=<$infh>; # skip the second line
$line=<$infh>; # read the next line
}elsif($line =~ /^>/){
# read one sequence
$tmp=();
while(<$infh>){
chomp($_);
if(/^[=>]/){
$line = $_;
last;
}else{
chomp($_);
$tmp = $tmp . $_;
}
}
if(scalar(@seqs)<$nSeq){
$seqs[$i] = $tmp;
}else{
$seqs[$i] = $seqs[$i] . $tmp;
}
$i = $i+1;
}elsif($line =~ /^=/){
$i=0;
$line=<$infh>;
}else{
die "can not parse this file format!";
}
}
close($infh);
my $outFile=$inFile;
$outFile=~s/\.xmfa//;
$outFile = $outFile . ".fasta";
my $outfh =();
open($outfh,">$outFile") or die "can not creat file $outFile $!";
for($i=0; $i<$nSeq;$i++){
print $outfh ">$seqNames[$i]\n";
while (my $chunk = substr($seqs[$i], 0,80, "")) {
print $outfh "$chunk\n";
}
print $outfh "\n";
}
close($outfh);
------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users