Dear All,

I would like to share a script which extracts core alignment in fasta format from Mauve output.

Let us use the example in the following link
https://code.google.com/p/clonalorigin/wiki/FromGenomeAssemblyToRecombination

Our goal is to extract the core alignment for the input 4 genomes.

Step 1: get the alignment using progressiveMauve, run
progressiveMauve --output=full_alignment.xmfa genome1.gbk genome2.gbk genome3.gbk genome4.gbk

Step 2: extract LCBs shared by all genomes
stripSubsetLCBs full_alignment.xmfa full_alignment.xmfa.bbcols core_alignment.xmfa 500 4 the first number "500" is the minimum length of the LCB; the second number "4" indicates the minimum number of genomes that share a LCB

Step 3: concatenate all the LCBs
perl xmfa2fasta.pl core_alignment.xmfa
You will get "core_alignment.fasta" in the current directory.

Cheers,
Lu Cheng
#!/usr/bin/perl -w
# This script extracts core alignment in fasta format from xmaf file
# Lu Cheng
# 26.07.2013

# Input: .xmaf file, note that each block in the .xmaf file must contain the 
same number of sequences (see usage of stripSubsetLCBs)
#Output: .fasta file of the  core alignment

use strict;
use warnings;

my $inFile=$ARGV[0];

my $infh = ();
open($infh, $inFile) or die "Could not open file '$inFile' $!";
my $line = <$infh>;  # skip the first line
$line= <$infh>;
my $nSeq =0;
my @seqNames = ();
my $i = 0;
my @seqs = ();
my $tmp = ();

my $flag1=1;
while($line){
        if($line =~ /^#/){
                $nSeq=$nSeq+1;

                # handle the headers 
                $line =~ s/^#.*\t//;
                $line =~ s/^.*\///;
                $line =~ s/\..*$//;
                chomp($line);
                push(@seqNames,$line);
                
                $line=<$infh>; # skip the second line
                $line=<$infh>; # read the next line

        }elsif($line =~ /^>/){
        
                # read one sequence
                $tmp=();
                while(<$infh>){
                        chomp($_);
                        
                        if(/^[=>]/){
                                $line = $_;
                                last;
                        }else{
                                chomp($_);
                                $tmp = $tmp . $_;
                        }
                }
                
                if(scalar(@seqs)<$nSeq){
                        $seqs[$i] = $tmp;
                }else{
                        $seqs[$i] = $seqs[$i] . $tmp;
                }
                
                $i = $i+1;
        }elsif($line =~ /^=/){
                $i=0;
                $line=<$infh>;
        }else{
                die "can not parse this file format!";
        }
}
close($infh);

my $outFile=$inFile;
$outFile=~s/\.xmfa//;
$outFile = $outFile . ".fasta";
my $outfh =();
open($outfh,">$outFile") or die "can not creat file $outFile $!";
for($i=0; $i<$nSeq;$i++){
        print $outfh ">$seqNames[$i]\n";
        while (my $chunk = substr($seqs[$i], 0,80, "")) {
               print $outfh "$chunk\n";
       }
       print $outfh "\n";
}
close($outfh);

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users

Reply via email to