Stephen, I have a question about how to run the code. When I want to read an RDF file, I just take the URL of the RDF file, create an OntModel, and read it; for the model, I only need to give the URL. But the code you suggested needs a (local) filename. In this case, can I do the same as I did previously? For example, I just use URL strings like the following in my code:
http://rdf.freebase.com/ns/m/067n4r
http://rdf.freebase.com/ns/en.mountain_view
http://dbpedia.org/resource/Mountain_View,_California

I do not download the RDF files in my code as of now, and I do not know whether Jena downloads the files when I give a URL string to the model to read. Any help will be greatly appreciated. Thank you.

________________________________________
From: Stephen Allen [sal...@apache.org]
Sent: Monday, April 23, 2012 2:43 AM
To: jena-users@incubator.apache.org
Subject: Re: processing .rdf files for specific property types only

Yes, we moved to the Apache community about a year ago. The latest release version of ARQ is 2.9.0, and the latest of Jena Core is 2.7.0. You can download them from the Apache distribution site [1], which is linked to by [2].

-Stephen

[1] http://www.apache.org/dist/incubator/jena/
[2] http://incubator.apache.org/jena/download/index.html

On Sun, Apr 22, 2012 at 10:38 PM, Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
> One question to follow up. I am using the ARQ 2.8.5 distribution and its
> contents (Jena packages). The class Triple does not seem to work with that
> distribution, so I just downloaded 2.8.6, the SourceForge release dated
> 2011-04-21. Is there any other new package available for this, or any
> newer distribution available other than on the SourceForge site? Thank you.
> ________________________________________
> From: Stephen Allen [sal...@apache.org]
> Sent: Sunday, April 22, 2012 10:38 PM
> To: jena-users@incubator.apache.org
> Subject: Re: processing .rdf files for specific property types only
>
> I'm not sure I understand your question. The code I posted will read
> the file in a single pass, and filter it down to only statements that
> contain the owl:sameAs resource in the predicate position. This is
> about the fastest way you can parse your RDF.
> It will also use a lot less memory than storing it in an in-memory
> model, as it works in a streaming fashion. Also, if you don't need
> RDFS inferencing, don't include it, as it adds overhead.
>
> Try it out with your code and see what the performance difference is.
>
> As a side note, the comparison in your if statement will be a little
> slower than mine since you are using String.contains(), and
> potentially incorrect if some other predicate had the string
> "owl#sameAs" in it but wasn't the full
> "http://www.w3.org/2002/07/owl#sameAs".
>
> -Stephen
>
> On Sun, Apr 22, 2012 at 7:19 PM, Gunaratna, Dalkandura Arachchige
> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>> Hi Stephen,
>> Will it increase the efficiency (speed) of processing? In your code,
>>
>> if (OWL.sameAs.asNode().equals(t.getPredicate()))
>> {
>>     // You can either do something immediately with this triple,
>>     // or stick it in a HashSet to enforce uniqueness
>>     sameAsTriples.add(t);
>> }
>>
>> you compare every statement, reading each line in the file, just as
>> I tried to do earlier:
>>
>> String predicate = st.getPredicate().getURI().toLowerCase();
>> if (predicate.contains("owl#sameas"))
>> {
>>     // do something to get the list of sameAs links
>> }
>>
>> Thank you.
>>
>> ________________________________________
>> From: Stephen Allen [sal...@apache.org]
>> Sent: Sunday, April 22, 2012 10:05 PM
>> To: jena-users@incubator.apache.org
>> Subject: Re: processing .rdf files for specific property types only
>>
>> On Sun, Apr 22, 2012 at 6:17 PM, Gunaratna, Dalkandura Arachchige
>> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>>> Hi,
>>> I have a simple requirement: to read the
>>> <http://www.w3.org/2002/07/owl#sameAs> object values (sameAs link values)
>>> in an RDF file. For that, I create an ontology model and read the whole
>>> file. Following is the code I use for that.
>>>
>>> model = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>>> SysRIOT.wireIntoJena();
>>> model.read(url);
>>> StmtIterator stmtItr = model.listStatements();
>>>
>>> This way of processing has a huge overhead for my program, since for
>>> every RDF file I have to read the whole file just to get the sameAs links.
>>> Is there any other way of doing this, or is the only possible way to
>>> read the whole file to get the specific property type we want?
>>>
>>> And also, what happens if we do not call model.close() at the end? Will
>>> it cause a heap out-of-space problem?
>>>
>>> Thank you,
>>> Kalpa
>>
>> Hi Kalpa,
>>
>> If you are simply interested in parsing an RDF file in a streaming
>> fashion, you can do something like below. If you know that you don't
>> have any duplicate triples, then you can eliminate the HashSet.
>>
>> final Set<Triple> sameAsTriples = new HashSet<Triple>();
>> Sink<Triple> sink = new Sink<Triple>()
>> {
>>     @Override
>>     public void send(Triple t)
>>     {
>>         if (OWL.sameAs.asNode().equals(t.getPredicate()))
>>         {
>>             // You can either do something immediately with this triple,
>>             // or stick it in a HashSet to enforce uniqueness
>>             sameAsTriples.add(t);
>>         }
>>     }
>>
>>     @Override
>>     public void flush() { }
>>
>>     @Override
>>     public void close() { }
>> };
>>
>> // To enable RDFS inferencing, uncomment the following two lines.
>> // You need to have your T-Box (ontology) loaded into some model.
>> //Model ontologyModel = ...
>> //sink = InfFactory.infTriples(sink, ontologyModel);
>>
>> String filename = ...
>> RiotReader.parseTriples(new FileInputStream(filename),
>>         Lang.guess(filename), null, sink);
>>
>> // Now do something with sameAsTriples
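On the URL question at the top of the thread: the parse call in Stephen's example only takes an `InputStream`, so a stream opened from a URL should be usable wherever it passes `new FileInputStream(filename)`. The sketch below uses only the JDK; `SourceOpener`, `isUrl`, and `open` are hypothetical names, and treating only `http`, `https`, and `file` as URLs is an assumed policy. Opening a URL this way reads the response body as a stream without saving a local copy; my understanding is that `model.read(url)` likewise fetches the document over HTTP rather than requiring you to download it first, but that is worth confirming in the Jena documentation.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper: opens a URL (http, https, file) or a local path as an InputStream. */
final class SourceOpener {
    private SourceOpener() { }

    /** True if the string parses as a URI whose scheme is http, https, or file (an assumed policy). */
    static boolean isUrl(String location) {
        try {
            String scheme = new URI(location).getScheme();
            if (scheme == null) {
                return false; // relative reference such as "data/file.rdf"
            }
            String s = scheme.toLowerCase(Locale.ROOT);
            return s.equals("http") || s.equals("https") || s.equals("file");
        } catch (URISyntaxException e) {
            return false; // not a valid URI (e.g. a Windows path with backslashes): treat as a path
        }
    }

    /**
     * Opens the resource for reading. For URLs the body is read as a stream and
     * nothing is written to disk. The caller must close the returned stream.
     */
    static InputStream open(String location) throws IOException {
        if (isUrl(location)) {
            return URI.create(location).toURL().openStream();
        }
        return new FileInputStream(location);
    }
}
```

A hypothetical call site would then be `RiotReader.parseTriples(SourceOpener.open(url), lang, null, sink)`, ideally inside try-with-resources so the stream is closed. Because `Lang.guess(filename)` appears to work from the file extension, an extension-less URL such as http://rdf.freebase.com/ns/m/067n4r probably needs its syntax chosen explicitly; `lang` is a placeholder for that choice.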
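Stephen's side note about `String.contains()` can be illustrated without Jena. This JDK-only sketch puts the two tests side by side on plain URI strings; `PredicateMatch` and its methods are hypothetical, and the exact test here compares strings, whereas Stephen's code compares Jena `Node` objects via `OWL.sameAs.asNode().equals(...)`.

```java
import java.util.Locale;

/** Hypothetical illustration: exact URI comparison vs. the lowercase substring test. */
final class PredicateMatch {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    private PredicateMatch() { }

    /** Whole-string comparison against the full owl:sameAs URI. */
    static boolean isSameAsExact(String predicateUri) {
        return OWL_SAME_AS.equals(predicateUri);
    }

    /** The earlier snippet's test: lowercase the URI, then look for "owl#sameas" anywhere in it. */
    static boolean isSameAsBySubstring(String predicateUri) {
        return predicateUri.toLowerCase(Locale.ROOT).contains("owl#sameas");
    }
}
```

For the made-up predicate `http://example.org/vocab/owl#sameAsCandidate`, `isSameAsBySubstring` returns `true` while `isSameAsExact` returns `false`, which is the false positive Stephen describes.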
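For readers without Jena at hand, the push-style filtering in Stephen's sink example can be mimicked with the JDK alone. `SimpleTriple`, `TripleSink`, `SameAsCollector`, and `StreamingFilter.pump` are hypothetical stand-ins, not Jena classes; the point is that each triple is inspected once and kept only if its predicate is owl:sameAs, so memory grows with the number of distinct sameAs triples rather than with the size of the file.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for a triple; Jena's own Triple class is not used here. */
final class SimpleTriple {
    final String subject, predicate, object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override public boolean equals(Object other) {
        if (!(other instanceof SimpleTriple)) return false;
        SimpleTriple t = (SimpleTriple) other;
        return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
    }

    @Override public int hashCode() { return Objects.hash(subject, predicate, object); }
}

/** Hypothetical callback interface with the same send/flush/close shape as the Sink in the thread. */
interface TripleSink {
    void send(SimpleTriple t);
    void flush();
    void close();
}

/** Keeps only owl:sameAs triples; the HashSet drops duplicates as they arrive. */
final class SameAsCollector implements TripleSink {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";
    final Set<SimpleTriple> sameAsTriples = new HashSet<>();

    @Override public void send(SimpleTriple t) {
        if (OWL_SAME_AS.equals(t.predicate)) {
            sameAsTriples.add(t);
        }
    }

    @Override public void flush() { }

    @Override public void close() { }
}

final class StreamingFilter {
    private StreamingFilter() { }

    /** Pushes triples to the sink one at a time, as a streaming parser would, then flushes and closes it. */
    static void pump(Iterator<SimpleTriple> source, TripleSink sink) {
        try {
            while (source.hasNext()) {
                sink.send(source.next());
            }
            sink.flush();
        } finally {
            sink.close();
        }
    }
}
```

Acting on each triple directly inside `send`, instead of adding it to the `HashSet`, corresponds to the variant Stephen mentions for input known to be free of duplicates.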