Hi Jan,
That is a bug in needle.
What happens is that just before the end of your alignment there are two
ways to get a high score. It picks a match, and fails to notice that
extending an earlier gap is cheaper than starting a new gap.
That means one of our optimisations is causing a problem and will need
to allow a wider search for scores when considering starting a new gap.
Many thanks for finding it.
Peter Rice
EMBOSS Team
On 06/12/2018 18:13, Jan T Kim wrote:
Hi All,
I've coded up a pairwise semiglobal aligner and then checked it by
comparing its output to that of needle. My expectation was to find
problems in my code, but now, to my surprise, I found a case where
the pairwise alignment computed by my code has a higher score than
that returned by needle. The sequences are:
x1a
CGTGATACATTACTTTTTA
x1b
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
Storing these in separate files x1a.fasta and x1b.fasta and running
needle -auto x1a.fasta x1b.fasta x1.txt
gives this alignment with score 21.5:
x1a
------------CGTGAT--ACATTACTTTTTA--------------------
x1b
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
However, my aligner finds this with score 22.0:
x1a x1a, aligned semiglobally, score 22.000000
------------CGTGA-------TACA--TTACTTTTTA-----------------
x1b x1b, aligned semiglobally, score 22.000000
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTT----CGACCTGGCAGTGCAAG
The Needleman-Wunsch algorithm computes an optimal alignment, i.e.
one with maximal score, given the input sequences. Therefore, if my
score computation is correct, the alignment reported by needle is
not optimal, which would mean that there's a bug in needle.
Of course it's also possible that I'm doing something wrong / stupid,
so I append position by position breakdowns of my score calculations.
I also attach the full needle output and my fasta files, so you can
run the above command to check whether you get the same result. I've
found this with Ubuntu xenial / emboss6.6.0+dfsg-3build1, and confirmed
that I get the same with Ubuntu bionic / emboss 6.6.0+dfsg-6 (both amd64).
The needle output file also confirms that needle computes the score of
its alignment as 21.5, consistently with my scoring.
I notice that it's a somewhat peculiar feature of my alignment that
the internal gap of length 4 in x1b ends where the terminal (unpenalised)
gap in x1a starts. I'm not aware of any criterion / rule violated by that,
though.
Best regards & thanks in advance for any scrutiny and comments, Jan
----- 8< --- breakdown of needle alignment scoring -------
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~G
+5.0 # C-C
+5.0 # G-G
+5.0 # T-T
-4.0 # G.C
+5.0 # A-A
+5.0 # T-T
-10.0 # - G
-0.5 # - G
+5.0 # A-A
-4.0 # C.A
+5.0 # A-A
-4.0 # T.G
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
-4.0 # T.A
-4.0 # T.A
-4.0 # T.G
-4.0 # T.A
+5.0 # T-T
+5.0 # A-A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~C
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~C
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~A
+0.0 # -~G
+21.5
----- 8< --- breakdown of scoring of "my" alignment -------
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~G
+5.0 # C-C
+5.0 # G-G
+5.0 # T-T
-4.0 # G.C
+5.0 # A-A
-10.0 # - T
-0.5 # - G
-0.5 # - G
-0.5 # - A
-0.5 # - A
-0.5 # - A
-0.5 # - G
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
+5.0 # A-A
-10.0 # - A
-0.5 # - G
-4.0 # T.A
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
+5.0 # T-T
+5.0 # T-T
-10.0 # T -
-0.5 # T -
-0.5 # T -
-0.5 # A -
+0.0 # -~C
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~C
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~A
+0.0 # -~G
+22.0
_______________________________________________
EMBOSS mailing list
[email protected]
http://mailman.open-bio.org/mailman/listinfo/emboss
---
This email has been checked for viruses by AVG.
https://www.avg.com
_______________________________________________
EMBOSS mailing list
[email protected]
http://mailman.open-bio.org/mailman/listinfo/emboss