Hi Guy,

On Wed, 2010-08-18 at 16:57 -0500, Guy Plunkett III wrote: 
> I've got some contigs from an assembly using ABySS that I want to align with 
> a related genome. If I try running mauve I get the following:
> 
> OS name is: Mac OS X arch: x86_64
> Executing: 
> /Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=ABySS test 
> --output-guide-tree=ABySS test.guide_tree --backbone-output=ABySS 
> test.backbone /Users/guy/CP001918-20.gbk /Users/guy/Ecl13047-contigs.fa 
> Storing raw sequence at 
> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.000
> Sequence loaded successfully.
> /Users/guy/CP001918-20.gbk 5598796 base pairs.
> Storing raw sequence at 
> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.001
> Sequence loaded successfully.
> /Users/guy/Ecl13047-contigs.fa 6221099 base pairs.
> Using weight 15 mers for initial seedsERROR! gap character encountered at 
> genome sequence position 2903159
> 
> Creating sorted mer list
> Create time was: 2 seconds.
> Creating sorted mer list
> Input sequences must be unaligned and ungapped!
> Caught signal 11
> Cleaning up and exiting!
> Temporary files deleted.
> Exited with error code: 11
> 
> 
> The file "Ecl13047-contigs.fa" seems to be the culprit, but I can find no 
> internal gap characters in any of the 4887 contigs. However, the sequences in 
> the fasta are unwrapped (ABySS doesn't support wrapped fasta sequences in 
> either input or output), and the longest such entry is 62225 bp. If I wrap 
> the sequences at 80 characters/line -- and then clean up some inadvertently 
> wrapped definition lines that are also absurdly long -- mauve has no 
> problems. Is there a maximum line length being assumed?
> 

I've never used ABySS before, but I do know that some assemblers have a
habit of placing unusual characters in the assembled contigs.
mauveAligner and progressiveMauve can handle FASTA data that is not
line-wrapped, and there is no 1980s-style upper limit on line length.

That said, I suspect the culprit in your case is one of three things:
inconsistent use of end-of-line characters in the assembly file, the
presence of non-printing ASCII or other non-IUPAC nucleotide/ambiguity
characters, or Unicode encoding.  The sequence parser in the aligner is
definitely sensitive to all three.  The EOL issue can usually be solved
by running a program like dos2unix or unix2dos on the sequence file.
All but the most basic text editors can convert the encoding from
Unicode to ASCII.  The non-printing and non-IUPAC character issue is a
bit trickier, and I don't have a good general fix for it, apart from
asking the author of the software that generates such files to provide
an option for more standards-conforming output.
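
To narrow down which of the three issues applies, a small script can at
least report where the suspicious bytes are.  The following is only a
rough sketch in Python and is not part of Mauve: the function name
scan_fasta and the report layout are made up for illustration, and the
set of accepted characters is an assumption (IUPAC nucleotide and
ambiguity codes in either case, with gap characters rejected), not a
statement of what the aligner's parser actually accepts.

```python
import re
from collections import Counter

# Assumed alphabet: IUPAC nucleotide/ambiguity codes, upper and lower case.
# Gap characters ('-' and '.') are left out on purpose so they get reported.
IUPAC = set(b"ACGTURYSWKMBDHVNacgturyswkmbdhvn")

EOL_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def scan_fasta(data: bytes) -> dict:
    """Report EOL styles, non-ASCII bytes, and unexpected sequence characters.

    Hypothetical helper for illustration; positions are 1-based.
    """
    report = {
        "eol": Counter(),   # how often each line-ending style occurs
        "non_ascii": [],    # (line, column) of bytes above 127, headers included
        "bad_chars": [],    # (line, column, char) in sequence lines only
    }
    # Splitting with a capturing group keeps the separators, so the result
    # alternates: line, separator, line, separator, ..., last line.
    parts = re.split(rb"(\r\n|\r|\n)", data)
    for i in range(0, len(parts), 2):
        line = parts[i]
        lineno = i // 2 + 1
        if i + 1 < len(parts):
            report["eol"][EOL_NAMES[parts[i + 1]]] += 1
        is_header = line.startswith(b">")
        for col, byte in enumerate(line, start=1):
            if byte > 127:
                report["non_ascii"].append((lineno, col))
            elif not is_header and byte not in IUPAC:
                report["bad_chars"].append((lineno, col, chr(byte)))
    return report


# Example use on a file read in binary mode, so no decoding is attempted:
#     with open("Ecl13047-contigs.fa", "rb") as fh:
#         print(scan_fasta(fh.read()))
```

More than one key in the "eol" counter would point to mixed line
endings, any entry in "non_ascii" to an encoding problem (a byte-order
mark at the start of the file shows up here too), and entries in
"bad_chars" to gaps, control characters, or other non-IUPAC symbols.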

Hope that helps,
-Aaron


_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
