Hi Guy,

On Wed, 2010-08-18 at 16:57 -0500, Guy Plunkett III wrote:
> I've got some contigs from an assembly using ABySS that I want to align with
> a related genome. If I try running mauve I get the following:
>
> OS name is: Mac OS X arch: x86_64
> Executing:
> /Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=ABySS test
> --output-guide-tree=ABySS test.guide_tree --backbone-output=ABySS
> test.backbone /Users/guy/CP001918-20.gbk /Users/guy/Ecl13047-contigs.fa
> Storing raw sequence at
> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.000
> Sequence loaded successfully.
> /Users/guy/CP001918-20.gbk 5598796 base pairs.
> Storing raw sequence at
> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.001
> Sequence loaded successfully.
> /Users/guy/Ecl13047-contigs.fa 6221099 base pairs.
> Using weight 15 mers for initial seedsERROR! gap character encountered at
> genome sequence position 2903159
>
> Creating sorted mer list
> Create time was: 2 seconds.
> Creating sorted mer list
> Input sequences must be unaligned and ungapped!
> Caught signal 11
> Cleaning up and exiting!
> Temporary files deleted.
> Exited with error code: 11
>
>
> The file "Ecl13047-contigs.fa" seems to be the culprit, but I can find no
> internal gap characters in any of the 4887 contigs. However, the sequences in
> the fasta are unwrapped (ABySS doesn't support wrapped fasta sequences in
> either input or output), and the longest such entry is 62225 bp. If I wrap
> the sequences at 80 characters/line -- and then clean up some inadvertently
> wrapped definition lines that are also absurdly long -- mauve has no
> problems. Is there a maximum line length being assumed?
>
I've never used ABySS before, but I do know that some assemblers have a habit of placing unusual characters in the assembled contigs. mauveAligner and progressiveMauve can handle FastA data that is not line-wrapped, and there is no 1980s-style upper limit on line length.

That said, I suspect the culprit in your case is one of three things: inconsistent use of end-of-line (EOL) characters in the assembly file; a non-printing ASCII character or some other character that is not an IUPAC nucleotide/ambiguity code; or Unicode encoding. The sequence parser in the aligner is definitely sensitive to all three of those issues.

The EOL issue can usually be solved by running a program like dos2unix or unix2dos on the sequence file. All but the most basic text editors can convert the encoding from Unicode to ASCII. The non-printing and non-IUPAC character issue is trickier, and I don't have a good general solution for it, apart from asking the author of the software that generates such files to provide an option for writing them in a more standards-conforming way.

Hope that helps,
-Aaron

_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
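For checking a contigs file for the three problems described above before reformatting it, here is a minimal Python sketch. It is not part of Mauve and makes no claim about exactly which characters progressiveMauve's parser rejects. It assumes the acceptable sequence alphabet is the IUPAC nucleotide/ambiguity codes in either case, so anything else (gaps, whitespace, control characters, bytes from a byte-order mark) is flagged for inspection. The script name `check_fasta.py` and the function `scan_fasta` are hypothetical. Note that it reports line and column numbers in the file, not the "genome sequence position" that Mauve prints.

```python
"""Flag characters in a FastA file that commonly upset sequence parsers.

A minimal sketch, not part of Mauve. It assumes the acceptable sequence
alphabet is the IUPAC nucleotide/ambiguity codes (either case); anything
else, including gaps, whitespace and stray bytes, is reported.
"""
import sys
from collections import Counter

# Assumed alphabet: IUPAC nucleotide and ambiguity codes, upper and lower case.
IUPAC = set(b"ACGTURYKMSWBDHVNacgturykmswbdhvn")


def scan_fasta(data, max_reports=20):
    """Return (eol_counts, problems) for the raw bytes of a FastA file.

    problems holds (line number, column, record name, byte value) tuples,
    capped at max_reports entries.
    """
    eol = Counter()
    eol["CRLF"] = data.count(b"\r\n")
    eol["CR"] = data.count(b"\r") - eol["CRLF"]  # lone carriage returns
    eol["LF"] = data.count(b"\n") - eol["CRLF"]  # bare line feeds
    problems = []
    record = "(before first header)"
    # bytes.splitlines() breaks on \r\n, \r and \n, so no EOL bytes remain.
    for lineno, line in enumerate(data.splitlines(), start=1):
        if line.startswith(b">"):
            fields = line[1:].split()
            record = fields[0].decode("ascii", "replace") if fields else "(unnamed)"
            continue
        for col, byte in enumerate(line, start=1):
            if byte not in IUPAC:
                problems.append((lineno, col, record, byte))
                if len(problems) >= max_reports:
                    return eol, problems
    return eol, problems


def main(path):
    with open(path, "rb") as handle:  # binary mode keeps EOL bytes intact
        data = handle.read()
    eol, problems = scan_fasta(data)
    print("line endings:", dict(eol))
    if sum(1 for kind in ("CRLF", "CR", "LF") if eol[kind]) > 1:
        print("warning: mixed end-of-line conventions")
    if not data.isascii():
        print("warning: non-ASCII bytes present (Unicode text or a byte-order mark?)")
    for lineno, col, record, byte in problems:
        print(f"line {lineno}, column {col}, record {record}: "
              f"unexpected byte 0x{byte:02x} {chr(byte)!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python check_fasta.py Ecl13047-contigs.fa`; a file with consistent line endings, pure ASCII, and only IUPAC codes prints just the line-ending counts. The file is read in binary mode on purpose, so that carriage returns and encoding bytes are seen exactly as the aligner would see them.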