Changing the gap penalty isn't making a difference because both versions
have the same number of gaps, and the gaps have the same lengths.
Treating end gaps differently from internal gaps might address the first
example, but not the second.
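
To make the "same gaps" point concrete, here is a small Python sketch
that lists the gap runs in the disputed rows. The strings are short
excerpts rebuilt from the alignments quoted below (19 gap columns in the
PSA tail, 17 in the second MSA row); the helper name gap_runs is my own
and is not part of BioJava.

```python
import re

def gap_runs(row):
    """Return the lengths of the consecutive gap runs in one aligned row."""
    return [len(run) for run in re.findall(r"-+", row)]

# Tail of the second PSA sequence: internal gap vs. end gap.
psa_unwanted = "YYCA" + "-" * 19 + "R"
psa_expected = "YYCAR" + "-" * 19

# Excerpt of the second MSA row around the disputed gap.
msa_unwanted = "GLEWV" + "-" * 17 + "GRFTIS"
msa_expected = "GLEWVG" + "-" * 17 + "RFTIS"

# Each pair has one run of identical length, so an affine gap penalty
# that ignores gap position charges both versions the same amount.
print(gap_runs(psa_unwanted), gap_runs(psa_expected))
print(gap_runs(msa_unwanted), gap_runs(msa_expected))
```

Because the run counts and lengths match, no choice of gap-open or
gap-extend value can separate the two versions on gap cost alone.
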
Since the gaps are the same (from the point of view of how gaps are
scored by the algorithms), what is actually driving the output is the
substitution penalties. In the PSA example, the preferred alignment has
an 'R' substituted for a 'G', whereas the unwanted output has 'R'
substituted for 'S'. The latter is a more common substitution since it
is more conservative from the point of view of amino acid chemistry and
may also require fewer mutations (although that depends on the codon
usage for both 'R' and 'S'). Thus it will get a lower penalty, so most
algorithms will prefer the unwanted PSA over your expected output.
A similar reasoning applies to the MSA example. In the unwanted
version, it is matching 'G' to 'G', which is not a substitution at all
and thus gets a higher score than the 'V' to 'G' substitution required
for the expected output.
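
As a rough illustration of the two comparisons above, the sketch below
looks up the relevant pairs in BLOSUM62. I am assuming BLOSUM62 is the
matrix in use (the thread does not say which one was chosen); the table
holds only the four pairs discussed here, and sub_score is a
hypothetical helper, not a BioJava call.

```python
# Excerpt of BLOSUM62: only the residue pairs discussed above.
# Keys are unordered so that (a, b) and (b, a) give the same score.
BLOSUM62_EXCERPT = {
    frozenset("RG"): -2,
    frozenset("RS"): -1,
    frozenset("G"): 6,    # G aligned with G
    frozenset("VG"): -3,
}

def sub_score(a, b):
    """Look up the substitution score for residues a and b."""
    return BLOSUM62_EXCERPT[frozenset((a, b))]

# PSA: the unwanted output pairs R with S, the expected one pairs R with G.
psa_delta = sub_score("R", "S") - sub_score("R", "G")

# MSA: the unwanted output pairs G with G, the expected one pairs V with G.
msa_delta = sub_score("G", "G") - sub_score("V", "G")

# A positive delta means the unwanted alignment scores higher.
print(psa_delta, msa_delta)
```

Both deltas are positive, and since the gap costs are identical, that
difference alone decides which alignment is reported.
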
Now, I can understand why, in the PSA example, an end gap seems more
likely than an internal gap, and in the MSA example one deletion event
seems more likely than two similar but slightly different deletion
events. But the math of the traditional alignment algorithms just won't
support those outputs.
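
For what it is worth, here is a sketch of how the ranking behaves under
a plain affine score, and under a variant that discounts terminal gaps.
Everything in it is an assumption made for illustration: the gap
penalties (10 to open, 1 to extend), the end_gap_weight parameter (not a
BioJava option as far as I know), and the use of a BLOSUM62 excerpt. The
sequences are short excerpts of the rows quoted below, and only the
bottom row is expected to contain gaps.

```python
# BLOSUM62 excerpt, keyed as (top residue, bottom residue).
SCORES = {
    ("C", "C"): 9, ("A", "A"): 4, ("V", "V"): 4, ("G", "G"): 6,
    ("R", "R"): 5, ("F", "F"): 6,
    ("G", "R"): -2, ("S", "R"): -1, ("V", "G"): -3,
}

def score(top, bottom, gap_open=10, gap_extend=1, end_gap_weight=1.0):
    """Affine-gap score of a fixed alignment; gaps only in `bottom`.

    Gap columns before the first or after the last residue of `bottom`
    are terminal and have their cost multiplied by `end_gap_weight`.
    """
    residues = [i for i, c in enumerate(bottom) if c != "-"]
    first, last = residues[0], residues[-1]
    total, in_gap = 0.0, False
    for i, (a, b) in enumerate(zip(top, bottom)):
        if b == "-":
            cost = gap_extend if in_gap else gap_open
            if i < first or i > last:
                cost *= end_gap_weight
            total -= cost
            in_gap = True
        else:
            total += SCORES[(a, b)]
            in_gap = False
    return total

psa_top = "CA" + "GYDYGNFDYWGQGTTLTVSS"
psa_unwanted = "CA" + "-" * 19 + "R"
psa_expected = "CAR" + "-" * 19

msa_top = "V" + "VWRVEQVVEKAFANSVN" + "GRF"
msa_unwanted = "V" + "-" * 17 + "GRF"
msa_expected = "VG" + "-" * 17 + "RF"

for weight in (1.0, 0.0):
    print(weight,
          score(psa_top, psa_unwanted, end_gap_weight=weight) >
          score(psa_top, psa_expected, end_gap_weight=weight),
          score(msa_top, msa_unwanted, end_gap_weight=weight) >
          score(msa_top, msa_expected, end_gap_weight=weight))
```

With uniform penalties the unwanted version wins in both cases. With
terminal gaps made free, the PSA ranking flips toward the end gap, but
the MSA ranking does not change because both of its gaps are internal.
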
Unfortunately, I don't have a good answer for how to make BioJava output
your desired result. But it is my hope that clarifying the problem
might be a useful step in arriving at a solution.
Incidentally, do your desired outputs come directly from a particular
alignment algorithm, or have they been hand-adjusted?
-Andy Walsh
On 1/14/2011 10:45 AM, Andreas Prlic wrote:
Looks a bit like an end-gap issue to me. I think the global alignment
algorithm does not penalize end gaps. Try a local alignment
(Smith-Waterman) instead.
Andreas
On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari
<[email protected]> wrote:
Hi All,
I am testing the PSA and MSA examples from Cookbook3.
Sometimes gaps are introduced in "unwanted" places in the alignments.
Example below:
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R
expected PSA was:
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR-------------------
the same for MSA
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
expected MSA
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
I have tested different gop/gep values and LOCAL/GLOBAL PSA. No success!
How can I force or avoid gap creation at specific positions?
Many thanks.
Khalil
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l