Needleman-Wunsch performs a global alignment; try Smith-Waterman instead. Can you file a bug report on GitHub? https://github.com/biojava/biojava/issues
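To make the global-versus-local distinction concrete, here is a minimal, library-free sketch using the query and target from the test cases quoted in this thread. It is not BioJava code: the class name, the match/mismatch scores (+5/-4) and the linear gap penalty (-7) are assumptions chosen for illustration, whereas the tests in the thread use NUC4.4 with an affine SimpleGapPenalty. The local (Smith-Waterman) pass reports where the query sits inside the target; the global (Needleman-Wunsch) pass must pay for every unaligned target base.

```java
// Illustrative sketch only; names and scoring values are assumptions, not BioJava API.
class LocalVsGlobal {
    static final int MATCH = 5;      // assumed score for identical bases
    static final int MISMATCH = -4;  // assumed score for differing bases
    static final int GAP = -7;       // assumed linear gap penalty (the thread uses affine gaps)

    static int sub(String q, int i, String t, int j) {
        return q.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
    }

    /** Smith-Waterman. Returns {bestScore, targetStart, targetEnd}, 1-based inclusive. */
    static int[] localAlign(String query, String target) {
        int n = query.length(), m = target.length();
        int[][] score = new int[n + 1][m + 1]; // first row and column stay 0
        int best = 0, bestI = 0, bestJ = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = Math.max(0, score[i - 1][j - 1] + sub(query, i, target, j));
                s = Math.max(s, score[i - 1][j] + GAP);
                s = Math.max(s, score[i][j - 1] + GAP);
                score[i][j] = s;
                if (s > best) { best = s; bestI = i; bestJ = j; }
            }
        }
        // Trace back from the best cell until the score drops to zero.
        int i = bestI, j = bestJ;
        while (i > 0 && j > 0 && score[i][j] > 0) {
            if (score[i][j] == score[i - 1][j - 1] + sub(query, i, target, j)) { i--; j--; }
            else if (score[i][j] == score[i - 1][j] + GAP) { i--; }
            else { j--; }
        }
        return new int[] { best, j + 1, bestJ };
    }

    /** Needleman-Wunsch score: both sequences are aligned end to end. */
    static int globalScore(String query, String target) {
        int n = query.length(), m = target.length();
        int[][] score = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) score[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) score[0][j] = j * GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = score[i - 1][j - 1] + sub(query, i, target, j);
                s = Math.max(s, score[i - 1][j] + GAP);
                s = Math.max(s, score[i][j - 1] + GAP);
                score[i][j] = s;
            }
        }
        return score[n][m];
    }

    public static void main(String[] args) {
        String query = "ACGTAACCGGTT";
        String target = "AACGTAACCGGTTACGTACGT";
        int[] local = localAlign(query, target);
        System.out.println("local score " + local[0]
                + ", target positions " + local[1] + ".." + local[2]); // 60, 2..13
        System.out.println("global score " + globalScore(query, target)); // -3
    }
}
```

With these assumed scores the local alignment covers target positions 2 to 13, which is the placement the first test case expects, while the global score is dragged down by the nine unmatched target bases.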
Thanks,
Andreas

On Tue, Nov 26, 2013 at 7:54 PM, Daniel Cameron <[email protected]> wrote:

> That API does make sense but unfortunately, it crashes when I try it.
> Having traced through the source & test cases from GitHub, it appears that
> there isn't actually any test coverage for the (anchor != null) path in
> align().
>
> Additionally, align() also explicitly forces the final base of the query
> to align to the final base of the target, which in my case is not supposed
> to be an anchor. I've copied my test cases with expected behaviour. Is
> there a bug tracking system I should be raising this with?
>
> Cheers
> Daniel
>
> @Test
> public void testNeedlemanWunschDNAAlignment() {
>     DNASequence query = new DNASequence("ACGTAACCGGTT",
>         AmbiguityDNACompoundSet.getDNACompoundSet());
>     DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
>         AmbiguityDNACompoundSet.getDNACompoundSet());
>     //                                    123456789012345678900
>     // Expected:                           |----------|
>     NeedlemanWunsch<DNASequence, NucleotideCompound> nw =
>         new NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target,
>             new SimpleGapPenalty((short)5, (short)2),
>             SubstitutionMatrixHelper.getNuc4_4());
>     AlignedSequence<DNASequence, NucleotideCompound> aligned =
>         nw.getPair().getQuery();
>     assertEquals(2, (int)aligned.getStart().getPosition());
>     assertEquals(13, (int)aligned.getEnd().getPosition());
> }
> @Test
> public void testAnchoredDNAAlignment() {
>     DNASequence query = new DNASequence("ACGTAACCGGTT",
>         AmbiguityDNACompoundSet.getDNACompoundSet());
>     DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT",
>         AmbiguityDNACompoundSet.getDNACompoundSet());
>     //                                    123456789012345678900
>     // Expected:                          |_----------|
>     AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound> aligner =
>         new AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(
>             query, target, new SimpleGapPenalty((short)5, (short)2),
>             SubstitutionMatrixHelper.getNuc4_4());
>     int[] anchors = new int[query.getLength()];
>     for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
>     anchors[0] = 0;
>     AlignedSequence<DNASequence, NucleotideCompound> aligned =
>         aligner.getPair().getQuery();
>     assertEquals(1, (int)aligned.getStart().getPosition());
>     assertEquals(13, (int)aligned.getEnd().getPosition());
> }
>
> On 27/11/2013 12:00 PM, Andreas Prlic wrote:
>
> Hi Daniel,
>
> Clearly we need some documentation for this.
>
> Looking at the source: if you get to AbstractMatrixAligner.align() you
> can see how the anchors are being used.
>
> I just took a brief look, so I might be wrong, but I think the int[]
> array should have the length of the query sequence and each position
> indicates the counterpart in the target sequence. Positions that are <=0
> should be considered not to be anchored. If this is right, then this should
> be very close to what you were expecting.
>
> Can you take a closer look and confirm if this works for you?
>
> Thanks,
>
> Andreas
>
> On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron <[email protected]> wrote:
>
>> Hello all
>>
>> I'm looking at the BioJava API for anchored alignment and I'm unsure of how
>> to set alignment anchors. The API exposes int[] get/setAnchor() which is
>> confusing me somewhat and I'm unsure of what a BioJava anchor actually is.
>> My use case is the comparison of two sequences for which I have a priori
>> knowledge of particular positions, so I was expecting an API where I could
>> specify base positions in the query to correspond to different positions in
>> the target. E.g. I want query base 4 to align to target base 10 and query
>> base 100 to align to target base 80. Is this possible?
>>
>> With anchor being only an int[] and the source looking like this is not
>> an array of paired values, how would one go about performing the above
>> alignment?
>>
>>
>> Thanks
>> Daniel Cameron
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
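The anchor layout Andreas tentatively describes (one int per query position, holding the matching target position, with a sentinel for unanchored positions) can be sketched without BioJava as follows. Everything here is an assumption for illustration: the helper name buildAnchors is hypothetical, indices are taken to be 0-based, and -1 is used as the "not anchored" sentinel because that is what Daniel's test case does. The thread leaves open whether 0 counts as an anchor, so check AbstractMatrixAligner.align() before relying on this.

```java
import java.util.Arrays;

// Illustrative sketch only; buildAnchors is a hypothetical helper, not part of BioJava.
class AnchorArraySketch {
    static final int UNANCHORED = -1; // sentinel taken from the test case in this thread

    /**
     * Builds an anchor array of length queryLength.
     * Each pair is {queryIndex, targetIndex}, assumed 0-based, and pairs must be
     * strictly increasing in both coordinates so the anchors cannot cross.
     */
    static int[] buildAnchors(int queryLength, int[][] pairs) {
        int[] anchors = new int[queryLength];
        Arrays.fill(anchors, UNANCHORED);
        int lastQuery = -1, lastTarget = -1;
        for (int[] p : pairs) {
            int q = p[0], t = p[1];
            if (q < 0 || q >= queryLength || t < 0) {
                throw new IllegalArgumentException("anchor out of range: " + q + " -> " + t);
            }
            if (q <= lastQuery || t <= lastTarget) {
                throw new IllegalArgumentException("anchors must be strictly increasing");
            }
            anchors[q] = t;
            lastQuery = q;
            lastTarget = t;
        }
        return anchors;
    }

    public static void main(String[] args) {
        // Daniel's example: query base 4 -> target base 10, query base 100 -> target base 80
        // (1-based in the email, converted to 0-based indices here by assumption).
        int[] anchors = buildAnchors(120, new int[][] { { 3, 9 }, { 99, 79 } });
        System.out.println("anchors[3] = " + anchors[3] + ", anchors[99] = " + anchors[99]);
        // The resulting array is what one would presumably hand to the aligner's
        // anchor setter mentioned at the start of the thread.
    }
}
```

The monotonicity check is a design choice of this sketch rather than something the thread states: crossing anchors cannot be satisfied by a single pairwise alignment, so rejecting them early gives a clearer error than a failure inside the aligner.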
