Changing the gap penalty isn't making a difference because both versions have the same number of gaps and gaps of the same length. Penalizing end gaps might address the first example, but not the second.
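
To make that concrete, here is a minimal sketch in plain Java. It is not BioJava API: the class name GapCost, the helper names, and the penalty values 10 and 1 are my own assumptions for illustration. It takes the tail of the second row from each PSA and applies an affine penalty of the assumed form open * (number of gap runs) + extend * (number of gap positions). Implementations differ in the exact convention, but any affine scheme sees the two rows as identical.

```java
public class GapCost {
    // Build a run of n gap characters.
    static String dashes(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('-');
        }
        return sb.toString();
    }

    // Number of separate gap runs (maximal stretches of '-') in an aligned row.
    static int gapRuns(String row) {
        int runs = 0;
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == '-' && (i == 0 || row.charAt(i - 1) != '-')) {
                runs++;
            }
        }
        return runs;
    }

    // Total number of gap positions in an aligned row.
    static int gapPositions(String row) {
        int count = 0;
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == '-') {
                count++;
            }
        }
        return count;
    }

    // Assumed affine form: one opening cost per run plus one extension cost per position.
    static int affinePenalty(String row, int open, int extend) {
        return open * gapRuns(row) + extend * gapPositions(row);
    }

    // Tails of the second PSA row as posted in the thread (19 gap positions each).
    static final String UNWANTED = "YYCA" + dashes(19) + "R";
    static final String EXPECTED = "YYCAR" + dashes(19);

    public static void main(String[] args) {
        int open = 10;   // illustrative value, not a BioJava default
        int extend = 1;  // illustrative value, not a BioJava default
        System.out.println("unwanted: " + affinePenalty(UNWANTED, open, extend));
        System.out.println("expected: " + affinePenalty(EXPECTED, open, extend));
    }
}
```

Both rows contain one run of 19 gap positions, so the two penalties are equal whatever values are chosen for open and extend. That is why tuning gop/gep cannot separate the two alignments.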

Since the gaps are the same (from the point of view of how the algorithms score gaps), what is actually driving the output is the substitution penalties. In the PSA example, the preferred alignment has an 'R' substituted for a 'G', whereas the unwanted output has an 'R' substituted for an 'S'. The latter is a more common substitution, since it is more conservative from the point of view of amino acid chemistry and may also require fewer mutations (although that depends on the codon usage for both 'R' and 'S'). Thus it gets a lower penalty, so most algorithms will prefer the unwanted PSA over your expected output.

Similar reasoning applies to the MSA example. The unwanted version matches 'G' to 'G', which is not a substitution at all and thus gets a higher score than the 'V' to 'G' substitution required for the expected output.
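
The substitution argument can be checked against a few matrix entries. This is a small sketch, assuming BLOSUM62 is the matrix in use (the thread does not say which one was chosen). The four scores are copied from the standard BLOSUM62 table, and the class and method names are made up for this example rather than taken from BioJava.

```java
public class SubstitutionCheck {
    // BLOSUM62 scores for the four residue pairs discussed above.
    // The lookup is order-insensitive: (a, b) and (b, a) give the same score.
    static int blosum62(char a, char b) {
        String pair = a < b ? "" + a + b : "" + b + a;
        switch (pair) {
            case "RS": return -1;  // unwanted PSA: 'R' aligned with 'S'
            case "GR": return -2;  // expected PSA: 'R' aligned with 'G'
            case "GG": return 6;   // unwanted MSA: 'G' aligned with 'G'
            case "GV": return -3;  // expected MSA: 'G' aligned with 'V'
            default:
                throw new IllegalArgumentException("pair not covered by this sketch: " + pair);
        }
    }

    public static void main(String[] args) {
        // Positive values mean the unwanted alignment scores higher.
        int psaAdvantage = blosum62('R', 'S') - blosum62('R', 'G');
        int msaAdvantage = blosum62('G', 'G') - blosum62('V', 'G');
        System.out.println("PSA, unwanted minus expected: " + psaAdvantage);
        System.out.println("MSA, unwanted minus expected: " + msaAdvantage);
    }
}
```

With equal gap costs on both sides, these differences are all that separates the candidates: the unwanted PSA is ahead by 1, and in the MSA each 'G' row compared against the first sequence is ahead by 9. A real MSA scores columns across all rows, so the second figure is only a pairwise approximation.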

Now, I can understand why, in the PSA example, an end gap seems more likely than an internal gap, and why, in the MSA example, one deletion event seems more likely than two similar but slightly different deletion events. But the math of the traditional alignment algorithms just won't support those outputs.

Unfortunately, I don't have a good answer for how to make BioJava output your desired result. But it is my hope that clarifying the problem might be a useful step in arriving at a solution.

Incidentally, do your desired outputs come directly from a particular alignment algorithm, or have they been hand-adjusted?

-Andy Walsh


On 1/14/2011 10:45 AM, Andreas Prlic wrote:
Looks a bit like an end-gap issue to me. I think the global alignment
algorithm does not penalize end gaps. Try a local alignment
(Smith-Waterman) instead.

Andreas



On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari
<[email protected]>  wrote:
Hi All,

I am testing the PSA and MSA examples from Cookbook3.

Sometimes, gaps were introduced in "unwanted" places in the alignments. Examples below:

EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R

expected PSA was:
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR-------------------


the same for MSA
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------

expected MSA
DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------


I have tested different gap open/extension penalties (gop/gep) and both LOCAL and GLOBAL PSA. No success!

How can I force or avoid the gap creation at specific positions?

Many thanks.

Khalil
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
