Have you tried (using the BioJavaX method) looking at the getRichAnnotation()
method on the RichSequence that the parser returns? That is where most of the
GenBank tags should show up, in a kind of hash map. Things like protein and
product are likely to be found there. Each feature (from getFeatureSet() on the
RichSequence object) also has its own annotation set for things that are
associated with the feature rather than with the main sequence. Xrefs, meanwhile,
can be retrieved with getRankedCrossRefs() on each feature, while sequence-level
document references (including titles, authors, etc.) are found by calling
getRankedDocRefs().
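
Something along these lines should show where each piece ends up. It is a
minimal sketch, assuming BioJava 1.x (with the BioJavaX org.biojavax packages)
on the classpath. The class name GenBankDump and the printed labels are made up
for illustration, and it assumes qualifiers such as /product and /protein_id come
through as notes on the CDS feature, with /db_xref qualifiers turned into ranked
cross-references.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.biojava.bio.BioException;
import org.biojavax.CrossRef;
import org.biojavax.DocRef;
import org.biojavax.Note;
import org.biojavax.RankedCrossRef;
import org.biojavax.RankedDocRef;
import org.biojavax.bio.seq.RichFeature;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

// Hypothetical helper class: dumps where the parser put each part of a record.
public class GenBankDump {

    // Flattens one parsed record into "where name=value" strings.
    public static List<String> describe(RichSequence seq) {
        List<String> out = new ArrayList<String>();

        // Sequence-level tags: notes held in the sequence's RichAnnotation
        for (Object o : seq.getRichAnnotation().getNoteSet()) {
            Note n = (Note) o;
            out.add("sequence " + n.getTerm().getName() + "=" + n.getValue());
        }

        for (Object o : seq.getFeatureSet()) {
            RichFeature f = (RichFeature) o;
            // Feature qualifiers (assumed: /product, /protein_id, ...) are the feature's own notes
            for (Object q : f.getNoteSet()) {
                Note n = (Note) q;
                out.add(f.getType() + " " + n.getTerm().getName() + "=" + n.getValue());
            }
            // /db_xref qualifiers: ranked cross-references attached to the feature
            for (Object x : f.getRankedCrossRefs()) {
                CrossRef cr = ((RankedCrossRef) x).getCrossRef();
                out.add(f.getType() + " xref=" + cr.getDbname() + ":" + cr.getAccession());
            }
        }

        // REFERENCE blocks: ranked document references on the sequence itself
        for (Object o : seq.getRankedDocRefs()) {
            DocRef ref = ((RankedDocRef) o).getDocumentReference();
            out.add("reference authors=" + ref.getAuthors());
            out.add("reference title=" + ref.getTitle());
        }
        return out;
    }

    public static void main(String[] args) throws IOException, BioException {
        // args[0] is the path to a GenBank file; a null namespace lets
        // BioJava fall back to its default namespace
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        RichSequenceIterator it = RichSequence.IOTools.readGenbankDNA(br, null);
        while (it.hasNext()) {
            RichSequence seq = it.nextRichSequence();
            System.out.println(seq.getName());
            for (String line : describe(seq)) {
                System.out.println("  " + line);
            }
        }
        br.close();
    }
}
```

Running it on a downloaded record (e.g. java GenBankDump record.gb) prints one
line per note, xref and reference, which makes it easy to see which getter to
call for each field before mapping it to RDF.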

These sections of the BioJavaX docs go into great detail about where every
single part of the GenBank file is stored in the RichSequence objects:
http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Reading_2 and
http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Writing_2

cheers,
Richard

On 27 Oct 2010, at 14:03, jc.lucky wrote:

> 
> I'm more interested in the features (regarding protein-ID, taxon, xref,
> product) and in retrieving information about articles (authors, title). I'm
> not looking at the sequence data at all.
> My purpose is to be able to read the GenBank file and retrieve that
> information so that I can convert it to a semantic RDF file.
> I'm working on a specific gene at the moment, but it would be interesting to
> extend this to any GenBank file in the future.
> 
> Thanks,
> 
> Jean-Charles
> 
> 
> 
>> Message of 27/10/10 12:41
>> From: "Scooter Willis"
>> To: "jc.lucky"
>> Cc: "biojava-l lists open-bio org"
>> Subject: Re: [Biojava-l] Fwd: Retrieve Information from GenBank file
>> 
>> Jean-Charles
>> 
>> I have it on my list to do a GenBank parser but haven't had the time. I
>> can't promise anything in the next couple weeks. Can you send some details
>> about what a typical use case is for your purpose? Are you trying to get the
>> sequence data or are you more interested in the features?
>> 
>> Thanks
>> 
>> Scooter
>> 
>> On Wed, Oct 27, 2010 at 4:11 AM, jc.lucky  wrote:
>> 
>>> 
>>> I tried once again with the new version of BioJava, but without success.
>>> Any ideas or suggestions?
>>> 
>>> Thanks in advance
>>> Regards,
>>> 
>>> Jean-Charles Ferrières
>>> 
>>> 
>>>> Message of 22/10/10 10:11
>>>> From: "jc.lucky"
>>>> To: [email protected]
>>>> Cc:
>>>> Subject: [Biojava-l] Retrieve Information from GenBank file
>>>> 
>>>> 
>>>> Hi
>>>> 
>>>> I'm trying to convert a GenBank file into an RDF file. The gene of
>>>> interest can be found at: http://www.ncbi.nlm.nih.gov/protein/284794945
>>>> 
>>>> With the code below I can read the GenBank file, and I manage to retrieve
>>>> information and convert it to RDF format. However, I can't retrieve some
>>>> information such as the title, protein or product. According to this page
>>>> (http://www.biojava.org/wiki/BioJava:BioJavaXDocs#GenBan) it is possible
>>>> to do so.
>>>> Please help me find what I'm doing wrong or what should be done to achieve
>>>> my goal.
>>>> 
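>>>> import java.io.*;
>>>> import org.biojava.bio.BioException;
>>>> import org.biojavax.DocRef;
>>>> import org.biojavax.Namespace;
>>>> import org.biojavax.RankedDocRef;
>>>> import org.biojavax.bio.seq.RichSequence;
>>>> import org.biojavax.bio.seq.RichSequenceIterator;
>>>> import org.biojavax.bio.taxa.NCBITaxon;
>>>> import com.hp.hpl.jena.ontology.OntClass;
>>>> import com.hp.hpl.jena.ontology.OntModel;
>>>> import com.hp.hpl.jena.ontology.OntModelSpec;
>>>> import com.hp.hpl.jena.rdf.model.ModelFactory;
>>>> import com.hp.hpl.jena.vocabulary.DC;
>>>> 
>>>> public class GenBankToRdf {
>>>> 
>>>>     //read the GenBank file
>>>>     //(ns is passed straight to the parser; null selects BioJava's default namespace)
>>>>     public static RichSequenceIterator readFile(String input, Namespace ns)
>>>>             throws IOException, BioException {
>>>>         BufferedReader gbFile = new BufferedReader(new FileReader(input));
>>>>         return RichSequence.IOTools.readGenbankDNA(gbFile, ns);
>>>>     }
>>>> 
>>>>     //Retrieve information and convert it to RDF format
>>>>     public void writeToRDFFile(RichSequenceIterator rsi, String output)
>>>>             throws IOException, BioException {
>>>>         //create model for the ontology
>>>>         OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, null);
>>>>         String URI = "http://pbr.wur.nl/#";
>>>> 
>>>>         while (rsi.hasNext()) {
>>>>             RichSequence seq = rsi.nextRichSequence();
>>>>             OntClass parents = model.createClass(URI + seq.getName());
>>>>             //join the authors of every REFERENCE block into one string
>>>>             StringBuilder authors = new StringBuilder();
>>>>             for (Object o : seq.getRankedDocRefs()) {
>>>>                 DocRef ref = ((RankedDocRef) o).getDocumentReference();
>>>>                 if (authors.length() > 0) authors.append("; ");
>>>>                 authors.append(ref.getAuthors());
>>>>             }
>>>>             //Add to model
>>>>             parents.addProperty(DC.description, seq.getDescription());
>>>>             parents.addProperty(DC.publisher, authors.toString());
>>>>             //the taxon can be null, so only add taxonomy/organism when present
>>>>             NCBITaxon taxon = seq.getTaxon();
>>>>             if (taxon != null) {
>>>>                 parents.addComment(taxon.getNameHierarchy(), "EN");
>>>>                 parents.addProperty(DC.type, taxon.getDisplayName());
>>>>             }
>>>>         }
>>>>         //print in RDF format once, after all sequences have been added
>>>>         OutputStream out = new FileOutputStream(output);
>>>>         model.write(out, "RDF/XML");
>>>>         out.close();
>>>>     }
>>>> }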
>>>> 
>>>> 
>>>> Thanks,
>>>> Jean-Charles Ferrières
>>> _____________________________________________
>>>> Biojava-l mailing list - [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/

