Thanks Stephen. I will try this code segment and see the difference.

________________________________________
From: Stephen Allen [sal...@apache.org]
Sent: Sunday, April 22, 2012 10:38 PM
To: jena-users@incubator.apache.org
Subject: Re: processing .rdf files for specific property types only

I'm not sure I understand your question. The code I posted will read the file in a single pass, and filter it down to only statements that contain the owl:sameAs resource in the predicate position. This is about the fastest way you can parse your RDF. It will also use a lot less memory than storing it in an in-memory model, as it works in a streaming fashion. Also, if you don't need RDFS inferencing, don't include it, as it adds overhead. Try it out with your code, and see what the performance difference is.

As a side note, the comparison in your if statement will be a little slower than mine since you are using String.contains(), and potentially incorrect if some other predicate had the string "owl#sameAs" in it, but wasn't the full "http://www.w3.org/2002/07/owl#sameAs".

-Stephen

On Sun, Apr 22, 2012 at 7:19 PM, Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
> Hi Stephen,
> Will it increase the efficiency (speed) in processing? In your code,
>
>     if (OWL.sameAs.asNode().equals(t.getPredicate()))
>     {
>         // You can either do something immediately with this
>         // triple, or stick it in a HashSet to enforce uniqueness
>         sameAsTriples.add(t);
>     }
>
> you compare every statement in the model by reading each line in the file, as
> I tried to do earlier, as follows:
>
>     String predicate = st.getPredicate().getURI().toLowerCase();
>     if(predicate.contains("owl#sameas"))
>     {
>         // do something to get the list of sameAs links
>     }
>
> Thank you.
>
> ________________________________________
> From: Stephen Allen [sal...@apache.org]
> Sent: Sunday, April 22, 2012 10:05 PM
> To: jena-users@incubator.apache.org
> Subject: Re: processing .rdf files for specific property types only
>
> On Sun, Apr 22, 2012 at 6:17 PM, Gunaratna, Dalkandura Arachchige
> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>> Hi,
>> I have a simple requirement and that is to read
>> <http://www.w3.org/2002/07/owl#sameAs> object values (sameAs link value) in
>> an RDF file.
>> For that I create an ontology model and read the whole file.
>> Following is a code sample I use for that.
>>
>>     model=ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>>     SysRIOT.wireIntoJena() ;
>>     model.read(url);
>>     StmtIterator stmtItr=model.listStatements();
>>
>> This way of processing has a huge processing overhead for my program since
>> for every rdf file I need to read the whole file just to get the sameAs links.
>> Is there any other way of doing this kind of work, or is the only possible way
>> to read the whole file to get the specific property type we want?
>>
>> And also, what happens if we do not call model.close() at the end? Will it
>> cause an out-of-heap-space problem?
>>
>> Thank you,
>> Kalpa
>
> Hi Kalpa,
>
> If you are simply interested in parsing an RDF file in a streaming
> fashion, you can do something like below. If you know that you don't
> have any duplicate triples, then you can eliminate the HashSet.
>
>     final Set<Triple> sameAsTriples = new HashSet<Triple>();
>     Sink<Triple> sink = new Sink<Triple>()
>     {
>         @Override
>         public void send(Triple t)
>         {
>             if (OWL.sameAs.asNode().equals(t.getPredicate()))
>             {
>                 // You can either do something immediately with this
>                 // triple, or stick it in a HashSet to enforce uniqueness
>                 sameAsTriples.add(t);
>             }
>         }
>
>         @Override
>         public void flush() { }
>
>         @Override
>         public void close() { }
>     };
>
>     // To enable RDFS inferencing uncomment the following two lines.
>     // You need to have your T-Box (ontology) loaded into some model
>     //Model ontologyModel = ...
>     //sink = InfFactory.infTriples(sink, ontologyModel);
>
>     String filename = ...
>     RiotReader.parseTriples(new FileInputStream(filename),
>         Lang.guess(filename), null, sink);
>
>     // Now do something with sameAsTriples
>
>
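
The side note in this thread about String.contains() can be illustrated without Jena at all. What follows is only a minimal sketch on plain strings: the class name SameAsMatch, its helper methods, and the example.org predicate are hypothetical and exist purely for illustration. In real code the comparison would be made on the predicate Node, as in OWL.sameAs.asNode().equals(t.getPredicate()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jena: compares predicate URIs as plain strings.
final class SameAsMatch {
    // The full owl:sameAs URI quoted in the thread.
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Exact comparison against the full predicate URI.
    static boolean isSameAsExact(String predicateUri) {
        return OWL_SAME_AS.equals(predicateUri);
    }

    // The lower-cased substring test from the thread, kept for comparison.
    static boolean isSameAsBySubstring(String predicateUri) {
        return predicateUri.toLowerCase().contains("owl#sameas");
    }

    // Keeps only the predicates that are exactly owl:sameAs.
    static List<String> keepSameAs(List<String> predicateUris) {
        List<String> kept = new ArrayList<>();
        for (String uri : predicateUris) {
            if (isSameAsExact(uri)) {
                kept.add(uri);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] predicates = {
            OWL_SAME_AS,
            "http://example.org/my-owl#sameAsLegacy",      // made-up look-alike predicate
            "http://www.w3.org/2000/01/rdf-schema#label"
        };
        for (String p : predicates) {
            System.out.println(p + " exact=" + isSameAsExact(p)
                    + " substring=" + isSameAsBySubstring(p));
        }
    }
}
```

The made-up example.org predicate passes the substring test but fails the exact comparison, which is the "potentially incorrect" case mentioned above.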