That API does make sense but unfortunately, it crashes when I try it. Having traced through the source & test cases from github, it appears that there isn't actually any test coverage for the (anchor != null) path in align().

Additionally, align() also explicitly forces the final base of the query to align to the final base of the target which in my case is not supposed to be an anchor. I've copied my test cases with expected behaviour. Is there a bug track system I should be raising this with?

Cheers
Daniel

    @Test
    public void testNeedlemanWunschDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT", AmbiguityDNACompoundSet.getDNACompoundSet()); DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT", AmbiguityDNACompoundSet.getDNACompoundSet());
        // 123456789012345678900
        // Expected:                           |----------|
NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target, new SimpleGapPenalty((short)5, (short)2), SubstitutionMatrixHelper.getNuc4_4()); AlignedSequence<DNASequence, NucleotideCompound> aligned = nw.getPair().getQuery();
        assertEquals(2, (int)aligned.getStart().getPosition());
        assertEquals(13, (int)aligned.getEnd().getPosition());
    }
    @Test
    public void testAnchoredDNAAlignment() {
DNASequence query = new DNASequence("ACGTAACCGGTT", AmbiguityDNACompoundSet.getDNACompoundSet()); DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT", AmbiguityDNACompoundSet.getDNACompoundSet());
        // 123456789012345678900
        // Expected:                          |_----------|
AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound> aligner = new AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(query, target, new SimpleGapPenalty((short)5, (short)2), SubstitutionMatrixHelper.getNuc4_4());
        int[] anchors = new int[query.getLength()];
        for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
        anchors[0] = 0;
AlignedSequence<DNASequence, NucleotideCompound> aligned = aligner.getPair().getQuery();
        assertEquals(1, (int)aligned.getStart().getPosition());
        assertEquals(13, (int)aligned.getEnd().getPosition());
    }

On 27/11/2013 12:00 PM, Andreas Prlic wrote:
Hi Daniel,

Clearly we need some documentation for this.

Looking at the source: if you get to AbstractMatrixAligner.align() you can see how the anchors are being used.

I just took a brief look, so I might be wrong, but I think the int[] array should have the length of the query sequence and each position indicates the counterpart in the target sequence. Positions that are <=0 should be considered not to be anchored. If this is right, then this should be very close to what you were expecting.

Can you take a closer look and confirm if this works for you?

Thanks,

Andreas

On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron <[email protected] <mailto:[email protected]>> wrote:

    Hello all

    I'm looking the BioJava API for anchored alignment and I'm unsure
    of how to set alignment anchors. The API exposes int[]
    get/setAnchor() which is confusing me somewhat and I'm unsure of
    what a BioJava anchor actually is. My use case is the comparison
    of two sequences which I have a priori knowledge of particular
    positions so I was expecting an API where I could specify base
    positions in the query to correspond different positions in the
    target. Eg: I was query base 4 to align to target base 10 and
    query base 100 to align to target base 80. Is this possible?

    With anchor being only an int[] and the source looking like this
    is not an array of paired values, how would one go about
    performing the above alignment?


    Thanks
    Daniel Cameron

    ______________________________________________________________________
    The information in this email is confidential and intended solely
    for the addressee.
    You must not disclose, forward, print or use it without the
    permission of the sender.
    ______________________________________________________________________
    _______________________________________________
    Biojava-l mailing list  - [email protected]
    <mailto:[email protected]>
    http://lists.open-bio.org/mailman/listinfo/biojava-l



______________________________________________________________________
The information in this email is confidential and intended solely for the 
addressee.
You must not disclose, forward, print or use it without the permission of the 
sender.
______________________________________________________________________
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l

Reply via email to