On 8/19/10, Aaron Darling wrote:
>Hi Guy,
>
>On Wed, 2010-08-18 at 16:57 -0500, Guy Plunkett III wrote:
>> I've got some contigs from an assembly using ABySS that I want to align with 
>> a related genome. If I try running mauve I get the following:
>>
>> OS name is: Mac OS X arch: x86_64
>> Executing:
>> /Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=ABySS test 
>> --output-guide-tree=ABySS test.guide_tree --backbone-output=ABySS 
>> test.backbone /Users/guy/CP001918-20.gbk /Users/guy/Ecl13047-contigs.fa
>> Storing raw sequence at 
>> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.000
>> Sequence loaded successfully.
>> /Users/guy/CP001918-20.gbk 5598796 base pairs.
>> Storing raw sequence at 
>> /var/folders/22/22Fli5j6G6OGbn0txlgpGk+++TI/-Tmp-/rawseq3797.001
>> Sequence loaded successfully.
>> /Users/guy/Ecl13047-contigs.fa 6221099 base pairs.
>> Using weight 15 mers for initial seedsERROR! gap character encountered at 
>> genome sequence position 2903159
>>
>> Creating sorted mer list
>> Create time was: 2 seconds.
>> Creating sorted mer list
>> Input sequences must be unaligned and ungapped!
>> Caught signal 11
>> Cleaning up and exiting!
>> Temporary files deleted.
>> Exited with error code: 11
>>
>>
>> The file "Ecl13047-contigs.fa" seems to be the culprit, but I can find no 
>> internal gap characters in any of the 4887 contigs. However, the sequences 
>> in the fasta are unwrapped (ABySS doesn't support wrapped fasta sequences in 
>> either input or output), and the longest such entry is 62225 bp. If I wrap 
>> the sequences at 80 characters/line -- and then clean up some inadvertently 
>> wrapped definition lines that are also absurdly long -- mauve has no 
>> problems. Is there a maximum line length being assumed?
>>
>
>I've never used ABySS before, but I do know that some assemblers have a
>habit of placing unusual characters in the assembled contigs.
>mauveAligner and progressiveMauve can handle FastA data which is not
>line-wrapped, and there is no 1980s-style upper limit on line length.
>
>That said, I suspect the culprit in your case is either inconsistent use
>of End-of-Line characters in the assembly file, the presence of some
>non-printing ASCII or other non-IUPAC nucleotide/ambiguity character, or
>Unicode encoding.  The sequence parser in the aligner is definitely
>sensitive to all three of those issues.  The EOL issue can usually be
>solved by running a program like dos2unix or unix2dos on the sequence
>file.  All but the most basic text editors will be able to change
>encoding from Unicode to ASCII.  The non-printing and non-IUPAC sequence
>character issue is a bit more tricky and I don't have a good general
>solution for fixing those issues, apart from requesting that the author
>of software generating such files provide an option to generate the
>files in a more standards-conforming way.
>
>Hope that helps,
>-Aaron

Hi Aaron,

I already played with end-of-line characters (I've been bitten before on that),
as well as encoding. The sequence lines contain only G,A,T,C, and N. The def
lines contain only numbers, spaces, "+" signs, "-" signs, and "," (commas).
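
For anyone who wants to repeat that kind of check without a text editor, here is a minimal Python sketch. It tallies the characters on definition lines and sequence lines separately and lists any sequence character that is not an IUPAC nucleotide code. The function name `inventory` and the sample records are made up for illustration; this is not how Mauve's own parser works.

```python
from collections import Counter

# IUPAC nucleotide and ambiguity codes (upper case); lower case is accepted too.
IUPAC = set("ACGTURYSWKMBDHVN")

def inventory(lines):
    """Tally characters on definition lines and sequence lines separately.

    Returns (defline_chars, seq_chars, suspects), where suspects is a list of
    (line_number, column, character) for every non-IUPAC sequence character.
    """
    defline_chars = Counter()
    seq_chars = Counter()
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        # Strip only "\n" so that a stray carriage return is still reported.
        text = line.rstrip("\n")
        if text.startswith(">"):
            defline_chars.update(text[1:])
            continue
        seq_chars.update(text)
        for col, ch in enumerate(text, start=1):
            if ch.upper() not in IUPAC:
                suspects.append((lineno, col, ch))
    return defline_chars, seq_chars, suspects

if __name__ == "__main__":
    # Hypothetical records: one def line, one clean line, one line with a
    # gap character and a DOS-style line ending.
    sample = [">1 100 5,-3\n", "ACGTN\n", "AC-T\r\n"]
    deflines, seqs, suspects = inventory(sample)
    print("def-line characters:", dict(deflines))
    print("sequence characters:", dict(seqs))
    for lineno, col, ch in suspects:
        print(f"line {lineno}, column {col}: {ch!r}")
```

To run it on a real file, open the file with `newline=""` so that Python does not silently translate the line endings, then pass the file handle to `inventory`.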

Since the error refers to gaps, I looked at "-". If I delete all the "-" or change
them to "c" or to "_" (underscore) Mauve is happy. That is 628 instances in
148 lines. So it definitely sounds like a rogue def line.

If I change all line-ending instances of "-" to "c" or delete the "-" the error
persists, but is reported at a new position. It isn't the first line ending with a
"-" and it isn't the longest one, but at that point I gave up ... all praise
BBEdit, but enough is enough. The fix for now is to change all "-" to something
else so they don't mistakenly get interpreted as gaps, although the fact that "_"
is OK makes me wonder what made the parser think it was in the seq line
instead of the def line ...
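
The same workaround can be scripted instead of done by hand in an editor. The sketch below replaces "-" with "_" on definition lines only and leaves sequence lines untouched. The function name `sanitize_deflines` and the example def line are hypothetical. It only works around the symptom, since it is still unclear why the parser treats part of a def line as sequence.

```python
def sanitize_deflines(lines, bad="-", replacement="_"):
    """Yield FASTA lines with `bad` replaced on definition lines only."""
    for line in lines:
        if line.startswith(">"):
            # Keep the leading ">" and rewrite the rest of the def line.
            yield ">" + line[1:].replace(bad, replacement)
        else:
            # Sequence lines pass through unchanged.
            yield line

if __name__ == "__main__":
    # Hypothetical record in the style described above: numbers, spaces,
    # "+" signs, "-" signs and commas on the def line.
    sample = [">7 62225 1+,2-\n", "ACGT\n"]
    for line in sanitize_deflines(sample):
        print(line, end="")
```

Because it is a generator, it can be fed an open file handle directly and the output written line by line, so even very long unwrapped sequence lines are never modified or re-wrapped.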

*sigh* The lesson of my past week is that there is no such thing as a "standard"
file format in the sequence world, just infinite variations on a theme.

- Guy


_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
