Hi all,

I've just updated EMBOSS on my Mac to 6.1.0, and have found an issue with seqret conversion of IntelliGenetics files. After some digging, I think this problem relates to having DOS newlines in a file on Unix (in my case, Mac OS X).
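In case it helps anyone check their own files, here is a minimal Python sketch for counting the kinds of line ending in a file. The function names are my own invention for illustration and are not part of EMBOSS.

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF (DOS), lone LF (Unix) and lone CR line endings in raw bytes."""
    counts = Counter()
    crlf = data.count(b"\r\n")
    counts["CRLF"] = crlf
    # Every CRLF contains one LF and one CR, so subtract them to get the lone ones.
    counts["LF"] = data.count(b"\n") - crlf
    counts["CR"] = data.count(b"\r") - crlf
    return counts


def describe_file(path: str) -> Counter:
    """Read a file in binary mode (no newline translation) and count its endings."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

Calling describe_file("emboss_ig.txt") should report only LF endings for the Unix copy, and only CRLF endings once the file has been converted to DOS newlines.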
For illustration, I'm using the example file from the EMBOSS website, saved to disk (using Unix new lines on a Mac): http://emboss.sourceforge.net/docs/themes/seqformats/ig Using EMBOSS 6.0.1, there was a problem: $ embossversion Writes the current EMBOSS version number to a file 6.0.1 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaaH-sapiensfaugenebasesH SFAUctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcga aaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgatta acactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacag ccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacat ggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgc cccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggag ctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgt gacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccat cttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaaggg cttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgc tccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgt gagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatc tcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcg ccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccagg aaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtg ctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcat gtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgccc 
ctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagta gcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgt ctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagta cttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacaca gacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatccta gtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctata aattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaac tttgttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagagg ggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacag gtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtga gtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtc cctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatg ctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaac aggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgct ttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtct tttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttca tctttacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The two sequences have been munged into one, with the name of the second sequence as part of the sequence. 
Using EMBOSS 6.1.0, the following now works: $ embossversion Reports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter >HSFAU H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa >HSFAU1 H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag 
tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. There was a problem with this example file in EMBOSS 6.0.1, but things look fine in EMBOSS 6.1.0. Great :) However, if we now convert this input file to use DOS/Windows newlines, and repeat the test (on Mac OS X, so Unix): $ embossversion Reports the current EMBOSS version number 6.1.0 $ seqret -sequence emboss_ig.txt -sformat ig -osformat fasta -auto -filter H.sapiens fau mRNA, 518 bases ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa H.sapiens fau 1 gene, 2016 bases ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaac ggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacac 
tgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgt agcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggta gctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctag gactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgaca cgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttc gcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttg tagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagc cgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcct ttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaa tatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaac ggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtag cctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctgg aggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcag gccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctag tgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacg tccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtct ggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaatt agaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttg ttctcattacctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaa agtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtga gagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctg ggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctag gtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgt caacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttg taattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctt tacttgcaaggcctcagggagaggtgtgcttctcgg i.e. The ">" is missing on all the FASTA sequences. 
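For anyone wanting to reproduce this, here is a rough Python sketch of how the DOS-newline copy of the example file could be made, plus a quick check for the missing ">" lines in the FASTA output. The helper names and the output filename emboss_ig_dos.txt are hypothetical, chosen only for illustration; nothing here calls EMBOSS itself.

```python
def to_dos_newlines(data: bytes) -> bytes:
    """Return data with every line ending normalised to CRLF."""
    # First collapse any existing CRLF or lone CR to LF, then expand LF to CRLF,
    # so that files which already use DOS newlines are not doubled up.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", b"\r\n")


def write_dos_copy(src: str, dst: str) -> None:
    """Write a CRLF copy of src to dst, using binary mode to avoid translation."""
    with open(src, "rb") as infile:
        data = infile.read()
    with open(dst, "wb") as outfile:
        outfile.write(to_dos_newlines(data))


def count_fasta_headers(text: str) -> int:
    """Count the lines in FASTA output that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))
```

For example, write_dos_copy("emboss_ig.txt", "emboss_ig_dos.txt") gives a file to feed to the same seqret command as above. Running count_fasta_headers on the captured output should give 2 for the two records in this example file, whereas the broken output shown above would give 0.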
So, it looks like EMBOSS 6.1.0 fixed one problem with IntelliGenetics files, but there is still an issue when the input file uses DOS newlines.

Peter C.

P.S. Should I have reported this possible bug via SourceForge?

P.P.S. Back in 2006, I reported a similar data corruption issue when reading Stockholm/Pfam files with DOS newlines (SourceForge Bug #1588956, long since fixed). It seems to me that EMBOSS would benefit from explicit testing of all the file formats using DOS/Windows newlines when run on Unix, and vice versa. Does that sound feasible, or just hopelessly ambitious?
