That API does make sense but unfortunately, it crashes when I try it.
Having traced through the source & test cases from github, it appears
that there isn't actually any test coverage for the (anchor != null)
path in align().
Additionally, align() also explicitly forces the final base of the query
to align to the final base of the target which in my case is not
supposed to be an anchor. I've copied my test cases with expected
behaviour. Is there a bug track system I should be raising this with?
Cheers
Daniel
@Test
public void testNeedlemanWunschDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT",
AmbiguityDNACompoundSet.getDNACompoundSet());
DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
AmbiguityDNACompoundSet.getDNACompoundSet());
// 123456789012345678900
// Expected: |----------|
NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new
NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target, new
SimpleGapPenalty((short)5, (short)2), SubstitutionMatrixHelper.getNuc4_4());
AlignedSequence<DNASequence, NucleotideCompound> aligned =
nw.getPair().getQuery();
assertEquals(2, (int)aligned.getStart().getPosition());
assertEquals(13, (int)aligned.getEnd().getPosition());
}
@Test
public void testAnchoredDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT",
AmbiguityDNACompoundSet.getDNACompoundSet());
DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
AmbiguityDNACompoundSet.getDNACompoundSet());
// 123456789012345678900
// Expected: |_----------|
AnchoredPairwiseSequenceAligner<DNASequence,
NucleotideCompound> aligner = new
AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(query,
target, new SimpleGapPenalty((short)5, (short)2),
SubstitutionMatrixHelper.getNuc4_4());
int[] anchors = new int[query.getLength()];
for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
anchors[0] = 0;
AlignedSequence<DNASequence, NucleotideCompound> aligned =
aligner.getPair().getQuery();
assertEquals(1, (int)aligned.getStart().getPosition());
assertEquals(13, (int)aligned.getEnd().getPosition());
}
On 27/11/2013 12:00 PM, Andreas Prlic wrote:
Hi Daniel,
Clearly we need some documentation for this.
Looking at the source: if you get to AbstractMatrixAligner.align() you
can see how the anchors are being used.
I just took a brief look, so I might be wrong, but I think the int[]
array should have the length of the query sequence and each position
indicates the counterpart in the target sequence. Positions that are
<=0 should be considered not to be anchored. If this is right, then
this should be very close to what you were expecting.
Can you take a closer look and confirm if this works for you?
Thanks,
Andreas
On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron
<[email protected] <mailto:[email protected]>> wrote:
Hello all
I'm looking the BioJava API for anchored alignment and I'm unsure
of how to set alignment anchors. The API exposes int[]
get/setAnchor() which is confusing me somewhat and I'm unsure of
what a BioJava anchor actually is. My use case is the comparison
of two sequences which I have a priori knowledge of particular
positions so I was expecting an API where I could specify base
positions in the query to correspond different positions in the
target. Eg: I was query base 4 to align to target base 10 and
query base 100 to align to target base 80. Is this possible?
With anchor being only an int[] and the source looking like this
is not an array of paired values, how would one go about
performing the above alignment?
Thanks
Daniel Cameron
______________________________________________________________________
The information in this email is confidential and intended solely
for the addressee.
You must not disclose, forward, print or use it without the
permission of the sender.
______________________________________________________________________
_______________________________________________
Biojava-l mailing list - [email protected]
<mailto:[email protected]>
http://lists.open-bio.org/mailman/listinfo/biojava-l
______________________________________________________________________
The information in this email is confidential and intended solely for the
addressee.
You must not disclose, forward, print or use it without the permission of the
sender.
______________________________________________________________________
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l