I assume that the downloaded file has the complete sequence in it? Probably worth checking that it has the complete sequence block (all 116366104 bp).
- Mark On Thu, Jan 29, 2009 at 12:51 PM, gang wu <[email protected]>wrote: > Hi Everyone, > > I have a piece of code to parse Genbank file and retrieve gene sequence and > related information. It works well with sequences such as Arabidopsis > thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome > 2. The contig that the code failed on is the largest one in my test. Contig > NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp. > That causes some gene coordinates out of range. Attached is the code. Can > anyone give some suggesttion? > > The Mus musculus Genbank file can be downloaded at : > ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz > > Thanks in advance > > Gang > ========================================== > public class TestMus { > public void testMusChr2() throws FileNotFoundException, > NoSuchElementException, BioException { > String fp="/tmp/mm_alt_chr2.gbk"; > System.out.println("File: " + fp); > BufferedReader gReader = new BufferedReader(new InputStreamReader(new > FileInputStream(new File(fp)))); > Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator seqI = > RichSequence.IOTools.readGenbankDNA(gReader, ns); > while (seqI.hasNext()) { > RichSequence seq = seqI.nextRichSequence(); > String organism = seq.getTaxon().getDisplayName(); > String accession = seq.getAccession(); > String identifier = seq.getIdentifier(); > int taxonID = seq.getTaxon().getNCBITaxID(); > String division = seq.getDivision(); > String seqVersion = "" + seq.getSeqVersion(); > int seqLength = seq.length(); > String description = seq.getDescription(); > System.out.println("Organism: " + organism > + "\nAccession: " + accession > + "\nIdentifier: " + identifier > + "\nTaxonID: " + taxonID > + "\nDivision: " + division > + "\nSeqVersion: " + seqVersion > + "\nLength: " + seqLength); > System.out.println("2041-2101: " + seq.subStr(2041, 2101)); > for (Iterator i = seq.features(); i.hasNext();) { > RichFeature f = (RichFeature) i.next(); > int rank = f.getRank(); > String fType = f.getType(); > if (fType.toLowerCase().equals("gene")) { > int startPos=f.getLocation().getMin(); > int endPos=f.getLocation().getMax(); > int geneLen=endPos-startPos+1; > String sequence=seq.subStr(startPos, endPos); > String strand = f.getStrand().getToken() + ""; > Annotation ann = (Annotation) f.getAnnotation(); > String geneIdentifier =""; > if (ann.containsProperty("locus_tag")) { > geneIdentifier=ann.getProperty("locus_tag") + ""; > } > else geneIdentifier=ann.getProperty("gene") + ""; > > String alternativeIdentifiers=""; > try { > alternativeIdentifiers= (String) > ann.getProperty("gene"); > > } catch(NoSuchElementException e) {} > String annotation=""; > System.out.println(rank + "\t" + geneIdentifier + "\t" + > alternativeIdentifiers + "\t" > + startPos + "\t" + endPos + "\t" + geneLen + > "\t" + strand); > } > } > } > } > public static void main(String [] args) throws Exception { > TestMus tm=new TestMus(); > tm.testMusChr2(); > } > } > _______________________________________________ > Biojava-l mailing list - [email protected] > http://lists.open-bio.org/mailman/listinfo/biojava-l > _______________________________________________ Biojava-l mailing list - [email protected] http://lists.open-bio.org/mailman/listinfo/biojava-l
