Can't you find that information in the "source" feature? Check this list:

List l = sequence.getFeaturesByType("source");

This comes from the fact that in the newer version of the GenBank file
format, "source" is a compulsory feature, and much of the information was
moved from the top-level "Features" tag into the qualifiers of the
"source" feature.

Let us know,
Paolo

2015-06-03 14:29 GMT+02:00 simon rayner <[email protected]>:

> Thanks to all for taking the time to answer.
>
> I had already got as far as parsing out the feature information using
> something like
>
> LinkedHashMap<String, DNASequence> dnaSequences =
>         GenbankReaderHelper.readGenbankDNASequence(dnaFile);
> for (DNASequence sequence : dnaSequences.values()) {
>     List<FeatureInterface<AbstractSequence<NucleotideCompound>,
>             NucleotideCompound>> fl = sequence.getFeatures();
>     for (FeatureInterface fi : fl) {
>         HashMap<String, Qualifier> quals = fi.getQualifiers();
>         for (Map.Entry<String, Qualifier> entry : quals.entrySet()) {
>             logger.info("--\t" + entry.getKey() + "\t|\t"
>                     + entry.getValue().getName()
>                     + " / " + entry.getValue().getValue()
>                     + "\\" + entry.getValue().toString());
>         }
>         logger.info("SHORT\t" + fi.getShortDescription());
>         logger.info("SOURCE\t" + fi.getSource());
>         logger.info("TYPE\t" + fi.getType());
>         logger.info("HASHCODE\t" + fi.hashCode());
>         logger.info("-");
>     }
> }
>
> But I am still stumped as to how to access the annotation information at
> the top of a GenBank file.
>
> For example, getAccession gets me the accession number of the sequence,
> but what about all the other data that is there (e.g. the pubmed records)?
>
> In BJ3, there was a RichAnnotation class, but I don't see anything
> equivalent in BJ4.
>
> cheers
>
> Simon
>
> On Wed, Jun 3, 2015 at 12:39 PM, Paolo Pavan <[email protected]>
> wrote:
>
>> Hi Simon,
>> I took care of the latest updates to the Genbank parser (reader).
>> Currently, there are two ways to read annotated Genbank files: via
>> GenbankReader and via GenbankProxySequenceReader.
>>
>> The first one:
>>
>> GenbankReader<ProteinSequence, AminoAcidCompound> GenbankProtein
>>         = new GenbankReader<ProteinSequence, AminoAcidCompound>(
>>                 inStream,
>>                 new GenericGenbankHeaderParser<ProteinSequence,
>>                         AminoAcidCompound>(),
>>                 new ProteinSequenceCreator(
>>                         AminoAcidCompoundSet.getAminoAcidCompoundSet()));
>> LinkedHashMap<String, ProteinSequence> proteinSequences =
>>         GenbankProtein.process();
>> inStream.close();
>>
>> The second one is:
>>
>> GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader
>>         = new GenbankProxySequenceReader<AminoAcidCompound>(
>>                 "/my_directory", "NP_000257",
>>                 AminoAcidCompoundSet.getAminoAcidCompoundSet());
>> ProteinSequence proteinSequence =
>>         new ProteinSequence(genbankProteinReader);
>>
>> Just keep in mind to use NucleotideCompound and a
>> DNASequenceCreator(DNACompoundSet.getDNACompoundSet()) if you need to
>> parse Genbank nucleotide files.
>>
>> You can access the stored annotation via the getFeatures() family of
>> methods of the sequence object you have read. Also note that features
>> have qualifiers (those starting with / in the Genbank file), and they
>> must be accessed from the feature object with getQualifiers().
>> Also note that a feature can have a complex location (rare, but it
>> happens); in this case you will find nested locations in the retrieved
>> feature.
>>
>> Does this answer your question?
>> Bye bye,
>> Paolo
>>
>> 2015-06-03 10:27 GMT+02:00 Jose Manuel Duarte <[email protected]>:
>>
>>> I can't offer much help regarding GenBank parsing itself, but I would
>>> at least like to clarify the situation with the different (indeed
>>> confusing) versions:
>>>
>>> BJ4 is the current release, well maintained and under development. BJ3
>>> has been completely superseded by BJ4. That means that BJ4 does
>>> everything that BJ3 did.
>>> In the cookbook and tutorials, everything that refers to BJ3
>>> should work in BJ4, with the only difference that the namespace of
>>> packages has changed from org.biojava.bio/org.biojava3 to
>>> org.biojava.nbio.
>>>
>>> BJ1 and BJX are both legacy projects, with some maintenance but not
>>> much active development. I believe that some of the features in them
>>> were not ported to BJ3+.
>>>
>>> Cheers
>>>
>>> Jose
>>>
>>> On 02.06.2015 11:40, Simon Rayner wrote:
>>>
>>>> Hi
>>>>
>>>> I'm coming back to BioJava (BJ) after a couple of years away and am
>>>> somewhat confused by the current collection of cookbooks, tutorials
>>>> and APIs. There appear to be a few examples for handling protein
>>>> structure data, but relatively little for more mainstream stuff such
>>>> as parsing Genbank files, which I first need to get the information I
>>>> want to investigate protein structure. But when I look at the relevant
>>>> code samples to do this, they refer back to BJ3, BJ1, or even BJX.
>>>> Even the Wiki page still refers to BJ3 despite the release of BJ4 back
>>>> in Feb 2015.
>>>>
>>>> I have everything working for parsing GenBank data, but I'm still
>>>> trying to get the Annotation information out of the top of a GenBank
>>>> file, and can't find any way of doing this using BJ4 - the BJ4 API
>>>> appears to refer to the RichAnnotation type in the BJX release. Can
>>>> anyone clarify what you are supposed to do here? Start mixing in some
>>>> BJX? (And is BJX still active?) Or should I still be using BJ3 until
>>>> BJ4 stabilizes? I realise this is an open source project, but some
>>>> clarification on the current status of things would be handy if the
>>>> project is going to appeal to a larger community :)
>>>>
>>>> Thanks!
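
For readers who want to see what the "source" feature and its qualifiers
look like in the file itself, here is a minimal sketch in plain Java. It
works on raw GenBank text and does not use the BioJava API, so it should
not be read as how BioJava parses the file. The class name, the method
name and the sample record are made up for illustration, and the sketch
assumes each qualifier fits on a single line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
final class SourceQualifierSketch {

    // Hand-written sample that mimics the layout of a GenBank feature table.
    static final String SAMPLE = String.join("\n",
        "FEATURES             Location/Qualifiers",
        "     source          1..5028",
        "                     /organism=\"Saccharomyces cerevisiae\"",
        "                     /mol_type=\"genomic DNA\"",
        "                     /db_xref=\"taxon:4932\"",
        "     CDS             <1..206",
        "                     /product=\"TCP1-beta\"",
        "ORIGIN");

    // Collects the qualifiers of the "source" feature, in file order.
    static Map<String, String> sourceQualifiers(String genbankText) {
        Map<String, String> qualifiers = new LinkedHashMap<>();
        boolean inSource = false;
        for (String line : genbankText.split("\n")) {
            // Unindented lines (FEATURES, ORIGIN, ...) end any open feature.
            if (!line.startsWith(" ")) {
                inSource = false;
                continue;
            }
            // A feature key sits after exactly five spaces of indentation.
            boolean startsFeature = line.length() > 5
                    && line.startsWith("     ")
                    && line.charAt(5) != ' ';
            if (startsFeature) {
                inSource = line.trim().split("\\s+")[0].equals("source");
                continue;
            }
            String trimmed = line.trim();
            if (inSource && trimmed.startsWith("/")) {
                int eq = trimmed.indexOf('=');
                // Qualifiers without '=' are flags; store them with an empty value.
                String key = eq < 0 ? trimmed.substring(1) : trimmed.substring(1, eq);
                String value = eq < 0 ? "" : trimmed.substring(eq + 1).replace("\"", "");
                qualifiers.put(key, value);
            }
        }
        return qualifiers;
    }

    public static void main(String[] args) {
        // Print each qualifier of the source feature as "key<TAB>value".
        sourceQualifiers(SAMPLE).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

The qualifiers of the CDS feature are skipped on purpose: only the lines
that belong to "source" end up in the map, which is the same information
the getFeaturesByType("source") call at the top of this message is meant
to reach through the qualifiers of the returned features.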
_______________________________________________
Biojava-l mailing list - [email protected]
http://mailman.open-bio.org/mailman/listinfo/biojava-l
