Thanks Richard. That is exactly the same issue. The latest Subversion
trunk fixed the problem.
Thanks again for the quick response.
Gang
Richard Holland wrote:
Gabrielle Doan posted a solution to this a while back and I believe the
changes have been committed already:
http://www.mail-archive.com/[email protected]/msg01036.html
How old is the copy of BioJava that you're using? Have you tried
checking out the trunk from Subversion to see if that works?
cheers,
Richard
Mark Schreiber wrote:
I assume that the downloaded file has the complete sequence in it? Probably
worth checking that it has the complete sequence block (all 116366104 bp).
- Mark
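
One way to run that check without BioJava, as a minimal sketch: compare the length declared on each LOCUS line with the number of bases actually present in the ORIGIN block. The class name `GenbankLengthCheck` and the report format are made up for illustration. The sketch assumes the usual flat-file layout (a LOCUS line carrying "<n> bp", an ORIGIN block, and a "//" terminator); a record with no ORIGIN block would be reported as a mismatch with counted=0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class GenbankLengthCheck {
    // LOCUS line: record name, then the declared length followed by "bp".
    private static final Pattern LOCUS =
            Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\b");

    /** Returns one line per record: "<name> declared=<n> counted=<m> OK|MISMATCH". */
    static List<String> check(Reader in) {
        List<String> report = new ArrayList<>();
        String name = null;
        long declared = -1;
        long counted = 0;
        boolean inOrigin = false;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("LOCUS")) {
                    // Start of a new record: reset the counters.
                    Matcher m = LOCUS.matcher(line);
                    boolean parsed = m.find();
                    name = parsed ? m.group(1) : "?";
                    declared = parsed ? Long.parseLong(m.group(2)) : -1;
                    counted = 0;
                    inOrigin = false;
                } else if (line.startsWith("ORIGIN")) {
                    inOrigin = true;
                } else if (line.startsWith("//")) {
                    // End of record: compare declared and counted lengths.
                    if (name != null) {
                        report.add(name + " declared=" + declared + " counted=" + counted
                                + (declared == counted ? " OK" : " MISMATCH"));
                    }
                    name = null;
                    inOrigin = false;
                } else if (inOrigin) {
                    // Sequence lines hold a position number and blocks of bases;
                    // only the letters are bases.
                    for (int i = 0; i < line.length(); i++) {
                        if (Character.isLetter(line.charAt(i))) {
                            counted++;
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GenbankLengthCheck.java /tmp/mm_alt_chr2.gbk
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]),
                StandardCharsets.ISO_8859_1)) {
            for (String row : check(in)) {
                System.out.println(row);
            }
        }
    }
}
```

If every record comes back OK, the file itself is complete and the truncation is happening in the parser rather than in the download.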
On Thu, Jan 29, 2009 at 12:51 PM, gang wu <[email protected]> wrote:
Hi Everyone,
I have a piece of code that parses a GenBank file and retrieves gene sequences
and related information. It works well with sequences such as Arabidopsis
thaliana, C. elegans, and Bos taurus, but it fails with Mus musculus chromosome
2. The contig that the code fails on is the largest one in my test. Contig
NT_039207 has 116366104 bp, but the code shows it cut to 100000020 bp.
That puts some gene coordinates out of range. The code is attached. Can
anyone give some suggestions?
The Mus musculus GenBank file can be downloaded at:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
Thanks in advance
Gang
==========================================
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioException;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichFeature;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class TestMus {
    public void testMusChr2() throws FileNotFoundException,
            NoSuchElementException, BioException {
        String fp = "/tmp/mm_alt_chr2.gbk";
        System.out.println("File: " + fp);
        BufferedReader gReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(fp))));
        Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqI =
                RichSequence.IOTools.readGenbankDNA(gReader, ns);
        while (seqI.hasNext()) {
            // Record-level information
            RichSequence seq = seqI.nextRichSequence();
            String organism = seq.getTaxon().getDisplayName();
            String accession = seq.getAccession();
            String identifier = seq.getIdentifier();
            int taxonID = seq.getTaxon().getNCBITaxID();
            String division = seq.getDivision();
            String seqVersion = "" + seq.getSeqVersion();
            int seqLength = seq.length();
            String description = seq.getDescription();
            System.out.println("Organism: " + organism
                    + "\nAccession: " + accession
                    + "\nIdentifier: " + identifier
                    + "\nTaxonID: " + taxonID
                    + "\nDivision: " + division
                    + "\nSeqVersion: " + seqVersion
                    + "\nLength: " + seqLength);
            System.out.println("2041-2101: " + seq.subStr(2041, 2101));
            // Report every feature of type "gene"
            for (Iterator i = seq.features(); i.hasNext();) {
                RichFeature f = (RichFeature) i.next();
                int rank = f.getRank();
                String fType = f.getType();
                if (fType.equalsIgnoreCase("gene")) {
                    int startPos = f.getLocation().getMin();
                    int endPos = f.getLocation().getMax();
                    int geneLen = endPos - startPos + 1;
                    // Fails when the gene lies beyond the (truncated) sequence
                    String sequence = seq.subStr(startPos, endPos);
                    String strand = f.getStrand().getToken() + "";
                    Annotation ann = (Annotation) f.getAnnotation();
                    // Prefer locus_tag, fall back to gene; leave empty if neither is set
                    String geneIdentifier = "";
                    if (ann.containsProperty("locus_tag")) {
                        geneIdentifier = ann.getProperty("locus_tag") + "";
                    } else if (ann.containsProperty("gene")) {
                        geneIdentifier = ann.getProperty("gene") + "";
                    }
                    String alternativeIdentifiers = "";
                    if (ann.containsProperty("gene")) {
                        alternativeIdentifiers = ann.getProperty("gene") + "";
                    }
                    System.out.println(rank + "\t" + geneIdentifier + "\t"
                            + alternativeIdentifiers + "\t"
                            + startPos + "\t" + endPos + "\t" + geneLen
                            + "\t" + strand);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        TestMus tm = new TestMus();
        tm.testMusChr2();
    }
}
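
The out-of-range coordinates described above could be flagged before calling subStr with a simple bounds test. This is only a sketch: `FeatureRangeCheck` and `inRange` are hypothetical names, the gene coordinates in `main` are invented for illustration, and it assumes 1-based inclusive coordinates, as the `endPos-startPos+1` arithmetic in the code implies.

```java
// Hypothetical helper, not part of BioJava.
final class FeatureRangeCheck {

    /** True if the 1-based inclusive range [min, max] fits in a sequence of seqLength bases. */
    static boolean inRange(int min, int max, int seqLength) {
        return min >= 1 && min <= max && max <= seqLength;
    }

    public static void main(String[] args) {
        int seqLength = 100000020; // the truncated length reported in this thread
        // Invented coordinates, for illustration only.
        int[][] genes = { {2041, 2101}, {100000001, 100000500} };
        for (int[] g : genes) {
            String status = inRange(g[0], g[1], seqLength) ? "ok" : "out of range";
            System.out.println(g[0] + ".." + g[1] + "\t" + status);
        }
    }
}
```

In the loop over features, a gene that fails this test could be logged and skipped, which makes it easy to see how many genes fall past the truncation point.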
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l