On 8/19/10, Aaron Darling wrote:
>Hi Guy,
>
>On Wed, 2010-08-18 at 16:57 -0500, Guy Plunkett III wrote:
>> I've got some contigs from an assembly using ABySS that I want to align with
>> a related genome. If I try running mauve I get the following:
>>
>> OS name is: Mac OS X arch: x86_64
>> Executing:
>> /Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=ABySS test
>> --output-guide-tree=ABySS test.guide_tree --backbone-output=ABySS
>> test.backbone /Users/guy/CP001918-20.gbk /Users/guy/Ecl13047-contigs.fa
>> Storing raw sequence at
>> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.000
>> Sequence loaded successfully.
>> /Users/guy/CP001918-20.gbk 5598796 base pairs.
>> Storing raw sequence at
>> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.001
>> Sequence loaded successfully.
>> /Users/guy/Ecl13047-contigs.fa 6221099 base pairs.
>> Using weight 15 mers for initial seedsERROR! gap character encountered at
>> genome sequence position 2903159
>>
>> Creating sorted mer list
>> Create time was: 2 seconds.
>> Creating sorted mer list
>> Input sequences must be unaligned and ungapped!
>> Caught signal 11
>> Cleaning up and exiting!
>> Temporary files deleted.
>> Exited with error code: 11
>>
>>
>> The file "Ecl13047-contigs.fa" seems to be the culprit, but I can find no
>> internal gap characters in any of the 4887 contigs. However, the sequences
>> in the fasta are unwrapped (ABySS doesn't support wrapped fasta sequences in
>> either input or output), and the longest such entry is 62225 bp. If I wrap
>> the sequences at 80 characters/line -- and then clean up some inadvertently
>> wrapped definition lines that are also absurdly long -- mauve has no
>> problems. Is there a maximum line length being assumed?
>>
>
>I've never used ABySS before, but I do know that some assemblers have a
>habit of placing unusual characters in the assembled contigs.
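Guy's workaround of wrapping the sequences at 80 characters per line can be scripted so that definition lines are never wrapped by accident. The sketch below is a minimal illustration, assuming every header line starts with ">" and that a header should stay on one line however long it is; the function name `rewrap_fasta` is hypothetical and not part of Mauve or ABySS. It only reproduces the workaround; it does not address whatever actually triggered the error.

```python
import sys


def rewrap_fasta(text: str, width: int = 80) -> str:
    """Rewrap FASTA sequence lines to at most `width` characters.

    Header lines (starting with ">") are copied unchanged and never wrapped.
    splitlines() also accepts CRLF and bare-CR line endings; output uses Unix newlines.
    """
    out: list[str] = []
    seq_parts: list[str] = []

    def flush() -> None:
        # Emit the sequence collected for the current record in fixed-width chunks.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
        seq_parts.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            flush()            # finish the previous record first
            out.append(line)   # definition line is kept on a single line
        else:
            seq_parts.append(line.strip())
    flush()
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python rewrap_fasta.py contigs.fa > contigs.wrapped.fa
    with open(sys.argv[1]) as fh:
        sys.stdout.write(rewrap_fasta(fh.read()))
```

Collecting each record's sequence before slicing means already-wrapped input is normalized too, so the function can be run on any FASTA file, not only unwrapped ABySS output.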
>mauveAligner and progressiveMauve can handle FastA data which is not
>line-wrapped, and there is no 1980s-style upper limit on line length.
>
>That said, I suspect the culprit in your case is either inconsistent use
>of End-of-Line characters in the assembly file, the presence of some
>non-printing ascii or other non-IUPAC nucleotide/ambiguity character, or
>unicode encoding. The sequence parser in the aligner is definitely
>sensitive to all three of those issues. The EOL issue can usually be
>solved by running a program like dos2unix or unix2dos on the sequence
>file. All but the most basic text editors will be able to change
>encoding from unicode to ascii. The non-printing and non-IUPAC sequence
>character issue is a bit more tricky and I don't have a good general
>solution for fixing those issues, apart from requesting that the author
>of software generating such files provide an option to generate the
>files in a more standards-conforming way.
>
>Hope that helps,
>-Aaron
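Aaron's three suspects (inconsistent end-of-line characters, non-printing or non-IUPAC characters, and Unicode encoding) can be checked mechanically before running the aligner. The following is a minimal diagnostic sketch, not Mauve's own parser: it reads the file as raw bytes and reports CRLF or bare-CR line endings, a byte-order mark, non-ASCII bytes, control characters, and any sequence-line character outside the IUPAC nucleotide codes. It assumes headers start with ">" and treats "-" in a sequence line as a problem, since progressiveMauve requires ungapped input; the name `scan_fasta` and its messages are illustrative.

```python
import sys

# IUPAC nucleotide and ambiguity codes, upper and lower case (assumed accepted set).
IUPAC = frozenset(b"ACGTURYSWKMBDHVNacgturyswkmbdhvn")
# Byte-order marks for UTF-8, UTF-16 LE and UTF-16 BE.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")


def scan_fasta(data: bytes) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for suspicious content in raw FASTA bytes."""
    problems: list[tuple[int, str]] = []
    for bom in BOMS:
        if data.startswith(bom):
            problems.append((1, "byte-order mark (Unicode-encoded file)"))
            data = data[len(bom):]
            break
    for n, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            problems.append((n, "CRLF line ending"))
            line = line[:-1]
        if b"\r" in line:
            problems.append((n, "bare CR inside line (classic Mac line endings?)"))
        if any(b > 0x7F for b in line):
            problems.append((n, "non-ASCII byte"))
        if any((b < 0x20 and b not in (0x09, 0x0D)) or b == 0x7F for b in line):
            problems.append((n, "non-printing control character"))
        if not line or line.startswith(b">"):
            continue  # blank or header line: skip the alphabet check
        bad = sorted({chr(b) for b in line if b < 0x80 and b not in IUPAC})
        if bad:
            problems.append((n, "non-IUPAC sequence characters: " + "".join(bad)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_fasta.py contigs.fa
    with open(sys.argv[1], "rb") as fh:
        for line_no, problem in scan_fasta(fh.read()):
            print(f"line {line_no}: {problem}")
```

Reading bytes rather than decoded text keeps encoding problems visible instead of letting Python's universal-newline and Unicode handling silently normalize them. A clean report on the sequence lines, as in Guy's case, points the search toward the header lines instead.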
Hi Aaron,

I already played with end-of-line characters (I've been bitten before on that), as well as encoding. The sequence lines contain only G, A, T, C, and N. The def lines contain only numbers, spaces, "+" signs, "-" signs, and "," (commas).

Since the error refers to gaps, I looked at "-". If I delete all the "-" or change them to "c" or to "_" (underscore), Mauve is happy. That is 628 instances in 148 lines, so it definitely sounds like a rogue def line. If I change all line-ending instances of "-" to "c" or delete those "-", the error persists, but is reported at a new position. It isn't the first line ending with a "-" and it isn't the longest one, but at that point I gave up ... all praise BBEdit, but enough is enough.

The fix for now is to change all "-" to something else so they don't mistakenly get interpreted as gaps, although the fact that "_" is OK makes me wonder what made the parser think it was in the seq line instead of the def line ... *sigh*

The lesson of my past week is that there is no such thing as a "standard" file format in the sequence world, just infinite variations on a theme.

- Guy

Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
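Guy's stopgap, replacing "-" in the definition lines only, can be applied without touching the sequence lines, so a genuine gap character inside a sequence would still be reported by the aligner. This is a minimal sketch assuming headers start with ">"; the default replacement "_" is chosen only because Guy reports that Mauve accepts it, and the function name `sanitize_headers` is hypothetical. It works around the symptom and does not explain why the parser treated a definition line as sequence.

```python
import sys


def sanitize_headers(text: str, bad: str = "-", replacement: str = "_") -> tuple[str, int]:
    """Replace `bad` with `replacement` in FASTA header lines only.

    Sequence lines and original line endings are left untouched.
    Returns the rewritten text and the number of header lines that changed.
    """
    out_lines: list[str] = []
    changed = 0
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and bad in line:
            line = line.replace(bad, replacement)
            changed += 1
        out_lines.append(line)
    return "".join(out_lines), changed


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python sanitize_headers.py contigs.fa contigs.clean.fa
    with open(sys.argv[1], newline="") as fh:
        cleaned, n = sanitize_headers(fh.read())
    with open(sys.argv[2], "w", newline="") as fh:
        fh.write(cleaned)
    print(f"{n} header lines changed", file=sys.stderr)
```

The returned count gives a quick cross-check against a manual search such as Guy's 148 affected lines.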