There is a problem with the GenbankLocationParser class: it cannot process the GenBank record with accession M32882. The parser fails on the following lines of the record:

     gene            join((8298.8300)..10206,1..855)
                     /gene="bcn"
     mRNA            join((8298.8300)..10206,1..855)
                     /gene="bcn"
                     /note="alternative transcript"


The exception stack trace is as follows:

        Could not understand position: 10206,1..855
        org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855
            at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
            at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
            at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277)
            at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244)
            at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131)

I investigated the problem and found that the defect is in the regular expression named "gp" in the GenbankLocationParser class.
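To make the defect easier to see, here is a minimal sketch that runs the two versions of "gp" (copied from the patch) against the failing location and returns capture group 3, the text inside the outer operator. The class name GpRegexDemo and the helper innerLocation are made up for this illustration and are not part of BioJava; the sketch only exercises the regular expression, not the rest of the parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of BioJava.
class GpRegexDemo {

    // "gp" as it stands at revision 8212 (before the patch).
    static final Pattern OLD_GP = Pattern.compile(
            "^([^\\(\\):]*?:)?(complement|join|order)?\\(*{0,1}(.*?)\\)*{0,1}$");

    // "gp" as proposed in the patch.
    static final Pattern NEW_GP = Pattern.compile(
            "^([^\\(\\):]*?:)?(complement|join|order)?\\({0,1}(.*?\\)*{0,1})$");

    // Returns capture group 3: the part of the location that the pattern
    // hands on for further parsing.
    static String innerLocation(Pattern gp, String location) {
        Matcher m = gp.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("No match for: " + location);
        }
        return m.group(3);
    }

    public static void main(String[] args) {
        String loc = "join((8298.8300)..10206,1..855)";
        System.out.println("old gp: " + innerLocation(OLD_GP, loc));
        System.out.println("new gp: " + innerLocation(NEW_GP, loc));
    }
}
```

With the old pattern, the greedy run of opening parentheses after "join" should also swallow the parenthesis that opens "(8298.8300)", so group 3 comes back with unbalanced parentheses. The patched pattern consumes at most one opening parenthesis and leaves the inner group intact.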

This error can be fixed by applying the attached patch. To test it, I have written a test method which shows that the parser can now understand all the possible combinations of location. The test class is also attached so that you can run it before and after applying my patch.

I don't have SVN access, so please apply this patch for me, and let me know whether you approve it.

Thanks
Deepak Sheoran

Index: GenbankLocationParser.java
===================================================================
--- GenbankLocationParser.java  (revision 8212)
+++ GenbankLocationParser.java  (working copy)
@@ -133,7 +133,7 @@
     
     // O beautiful regex, we worship you.
     // this matches grouped locations
-    private static Pattern gp = Pattern.compile("^([^\\(\\):]*?:)?(complement|join|order)?\\(*{0,1}(.*?)\\)*{0,1}$");
+    private static Pattern gp = Pattern.compile("^([^\\(\\):]*?:)?(complement|join|order)?\\({0,1}(.*?\\)*{0,1})$");
     // this matches range locations
     private static Pattern rp = Pattern.compile("^\\(*(.*?)\\)*(\\.\\.\\(*(.*)\\)*)?$");
     // this matches accession/version pairs

import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichLocation;
import org.junit.Test;

/**
 * Tests whether the GenbankLocationParser class can understand all the
 * location formats listed below.
 * @author dsheoran
 */
public class LocationParserTest {

    @Test
    public void testParseLocation() throws Exception {
        String[] location = new String[]{
            "467",
            "340..565",
            "<345..500",
            "<1..888",
            "(102.110)",
            "(23.45)..600",
            "(122.133)..(204.221)",
            "123^124",
            "145^177",
            "join(12..78,134..202)",
            "complement(1..23)",
            "complement(join(2691..4571,4918..5163))",
            "join(complement(4918..5163),complement(2691..4571))",
            "complement(34..(122.126))",
            "complement((122.126)..34)",
            "J00194:100..202",
            "(8298.8300)..10206",
            "join((8298.8300)..10206,1..855)"};
        // join((8298.8300)..10206,1..855) is a new type of location, found in
        // the GenBank record with accession M32882.
        for (String loc : location) {
            // Parse each location, then write it back out for visual inspection.
            RichLocation parseLocation =
                    org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(
                            new SimpleNamespace("gb"), "Ad3232", loc);
            System.out.println(
                    org.biojavax.bio.seq.io.GenbankLocationParser.writeLocation(parseLocation));
        }
    }
}
