Yes, that is true, but I had a problem calling this function:

    RiotReader.parseTriples(is, Lang.RDFXML, null, sink);

I used Lang.RDFXML instead of your suggested Lang.guess(filename), and then the processing seems to take forever. Maybe I have done something wrong here. The following is the full code I used for testing.

    URL dataURL = new URL(url);
    URLConnection conn = dataURL.openConnection();
    InputStream is = conn.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));

    final Set<Triple> sameAsTriples = new HashSet<Triple>();
    Sink<Triple> sink = new Sink<Triple>()
    {
        @Override
        public void send(Triple t)
        {
            System.out.println("##");
            if (OWL.sameAs.asNode().equals(t.getPredicate()))
            {
                // You can either do something immediately with this
                // triple, or stick it in a HashSet to enforce uniqueness
                sameAsTriples.add(t);
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    };

    // To enable RDFS inferencing uncomment the following two lines.
    // You need to have your T-Box (ontology) loaded into some model
    //Model ontologyModel = ...
    //sink = InfFactory.infTriples(sink, ontologyModel);

    // String filename = ...
    // RiotReader.parseTriples(new FileInputStream(filename),
    //         Lang.guess(filename), null, sink);
    RiotReader.parseTriples(is, Lang.RDFXML, null, sink);
    in.close();

I did not even see the "##" that I print to check whether the program is parsing the RDF file. Is there anything I missed? The program does not seem to work.

________________________________________
From: Stephen Allen [sal...@apache.org]
Sent: Monday, April 23, 2012 5:38 PM
To: jena-users@incubator.apache.org
Subject: Re: processing .rdf files for specific property types only

You can use any InputStream.  To get one for a URL, try something like this:

    URL url = new URL(urlString);
    InputStream in = url.openConnection().getInputStream();

On Mon, Apr 23, 2012 at 9:34 AM, Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
> Stephen, I have a question here on how to run the code. Here is the question.
> When I want to read an RDF file, I just get the URL of the RDF file, create
> an OntModel, and read it. For the model, I only need to give the URL. But
> the code you have suggested needs a (local) filename. In this case, can
> we do the same as I did previously? For example, I just use URL strings as
> follows in my code.
>
> http://rdf.freebase.com/ns/m/067n4r
> http://rdf.freebase.com/ns/en.mountain_view
> http://dbpedia.org/resource/Mountain_View,_California
>
> I do not download the RDF files in my code as of now, but I do not know
> whether Jena downloads files when given a URL string for the model to read.
> Any help will be greatly appreciated. Thank you.
>
>
> ________________________________________
> From: Stephen Allen [sal...@apache.org]
> Sent: Monday, April 23, 2012 2:43 AM
> To: jena-users@incubator.apache.org
> Subject: Re: processing .rdf files for specific property types only
>
> Yes, we moved to the Apache community about a year ago.  The latest
> release version of ARQ is 2.9.0, and the latest of Jena Core is 2.7.0.
> You can download them from the Apache distribution site [1], which is
> linked to by [2].
>
> -Stephen
>
> [1] http://www.apache.org/dist/incubator/jena/
> [2] http://incubator.apache.org/jena/download/index.html
>
>
> On Sun, Apr 22, 2012 at 10:38 PM, Gunaratna, Dalkandura Arachchige
> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>> One question to follow up. I am using the ARQ 2.8.5 distribution and its
>> contents (Jena packages). The class Triple does not seem to work with that
>> distribution, and I just downloaded 2.8.6 from the SourceForge release dated
>> 2011-04-21. Is there any other new package available for this, or any
>> newer distribution available other than on the SourceForge site? Thank you.
>> ________________________________________
>> From: Stephen Allen [sal...@apache.org]
>> Sent: Sunday, April 22, 2012 10:38 PM
>> To: jena-users@incubator.apache.org
>> Subject: Re: processing .rdf files for specific property types only
>>
>> I'm not sure I understand your question.  The code I posted will read
>> the file in a single pass, and filter it down to only statements that
>> contain the owl:sameAs resource in the predicate position.  This is
>> about the fastest way you can parse your RDF.  It will also use a lot
>> less memory than storing it in an in-memory model, as it works in a
>> streaming fashion.  Also, if you don't need RDFS inferencing don't
>> include it as it adds overhead.
>>
>> Try it out with your code, and see what the performance difference is.
>>
>> As a side note, the comparison in your if statement will be a little
>> slower than mine since you are using String.contains(), and
>> potentially incorrect if some other predicate had the string
>> "owl#sameAs" in it, but wasn't the full
>> "http://www.w3.org/2002/07/owl#sameAs".
>>
>> -Stephen
>>
>> On Sun, Apr 22, 2012 at 7:19 PM, Gunaratna, Dalkandura Arachchige
>> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>>> Hi Stephen,
>>> Will it increase the efficiency (speed) of processing? In your code,
>>>
>>>     if (OWL.sameAs.asNode().equals(t.getPredicate()))
>>>     {
>>>         // You can either do something immediately with this
>>>         // triple, or stick it in a HashSet to enforce uniqueness
>>>         sameAsTriples.add(t);
>>>     }
>>>
>>> you compare every statement in the model by reading each line in the file,
>>> as I tried to do earlier, as follows:
>>>
>>>     String predicate = st.getPredicate().getURI().toLowerCase();
>>>     if (predicate.contains("owl#sameas"))
>>>     {
>>>         // do something to get the list of sameAs links
>>>     }
>>>
>>> Thank you.
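
The side note above about String.contains() can be illustrated with plain strings instead of Jena nodes. This is only a minimal sketch: the class name, the method names, and the look-alike URI are hypothetical and were invented for the illustration; only the owl:sameAs URI and the "owl#sameas" substring test come from the thread.

```java
import java.util.Locale;

final class SameAsMatch {
    // Full URI of owl:sameAs, as quoted in the thread.
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    private SameAsMatch() { }

    // Exact comparison: true only for the owl:sameAs URI itself.
    static boolean exactMatch(String predicateUri) {
        return OWL_SAME_AS.equals(predicateUri);
    }

    // Substring comparison, mirroring the toLowerCase()/contains() test
    // from the earlier message.
    static boolean substringMatch(String predicateUri) {
        return predicateUri.toLowerCase(Locale.ROOT).contains("owl#sameas");
    }

    public static void main(String[] args) {
        // Hypothetical predicate URI that merely contains "owl#sameAs".
        String lookalike = "http://example.org/not-owl#sameAsButDifferent";

        System.out.println("owl:sameAs exact:     " + exactMatch(OWL_SAME_AS));
        System.out.println("owl:sameAs substring: " + substringMatch(OWL_SAME_AS));
        System.out.println("look-alike exact:     " + exactMatch(lookalike));
        System.out.println("look-alike substring: " + substringMatch(lookalike));
    }
}
```

The look-alike URI passes the substring test but fails the exact comparison, which is the kind of false positive the side note warns about.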
>>>
>>> ________________________________________
>>> From: Stephen Allen [sal...@apache.org]
>>> Sent: Sunday, April 22, 2012 10:05 PM
>>> To: jena-users@incubator.apache.org
>>> Subject: Re: processing .rdf files for specific property types only
>>>
>>> On Sun, Apr 22, 2012 at 6:17 PM, Gunaratna, Dalkandura Arachchige
>>> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>>>> Hi,
>>>> I have a simple requirement: to read the
>>>> <http://www.w3.org/2002/07/owl#sameAs> object values (sameAs link values)
>>>> in an RDF file. For that I create an ontology model and read the whole
>>>> file. The following is a code sample I use for that.
>>>>
>>>>     model = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>>>>     SysRIOT.wireIntoJena();
>>>>     model.read(url);
>>>>     StmtIterator stmtItr = model.listStatements();
>>>>
>>>> This way of processing has a huge overhead for my program, since for
>>>> every RDF file I need to read the whole file just to get the sameAs links.
>>>> Is there any other way of doing this kind of work, or is reading the whole
>>>> file the only way to get the specific property type we want?
>>>>
>>>> Also, what happens if we do not call model.close() at the end? Will it
>>>> cause a heap out-of-space problem?
>>>>
>>>> Thank you,
>>>> Kalpa
>>>
>>> Hi Kalpa,
>>>
>>> If you are simply interested in parsing an RDF file in a streaming
>>> fashion, you can do something like below.  If you know that you don't
>>> have any duplicate triples, then you can eliminate the HashSet.
>>>
>>>     final Set<Triple> sameAsTriples = new HashSet<Triple>();
>>>     Sink<Triple> sink = new Sink<Triple>()
>>>     {
>>>         @Override
>>>         public void send(Triple t)
>>>         {
>>>             if (OWL.sameAs.asNode().equals(t.getPredicate()))
>>>             {
>>>                 // You can either do something immediately with this
>>>                 // triple, or stick it in a HashSet to enforce uniqueness
>>>                 sameAsTriples.add(t);
>>>             }
>>>         }
>>>
>>>         @Override
>>>         public void flush() { }
>>>
>>>         @Override
>>>         public void close() { }
>>>     };
>>>
>>>     // To enable RDFS inferencing uncomment the following two lines.
>>>     // You need to have your T-Box (ontology) loaded into some model
>>>     //Model ontologyModel = ...
>>>     //sink = InfFactory.infTriples(sink, ontologyModel);
>>>
>>>     String filename = ...
>>>     RiotReader.parseTriples(new FileInputStream(filename),
>>>             Lang.guess(filename), null, sink);
>>>
>>>     // Now do something with sameAsTriples
>>>
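
For the URL-based input discussed at the top of this thread, the following is a minimal sketch of opening the stream with explicit timeouts, so that a stalled server raises an exception instead of blocking indefinitely. It rests on assumptions not confirmed anywhere in the thread: that the stall is on the network side rather than in the parser, and that the server uses content negotiation and will return RDF/XML when asked for it. The class name, method name, and timeout parameters are hypothetical; only standard java.net calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

final class RdfStreams {
    private RdfStreams() { }

    /**
     * Opens an InputStream for the given URL string, similar to
     * new URL(urlString).openConnection().getInputStream(), but with
     * connect and read timeouts (in milliseconds) set first.
     */
    static InputStream open(String urlString, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        URLConnection conn = URI.create(urlString).toURL().openConnection();

        // Fail with an exception instead of waiting forever.
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);

        // Assumption: the server honours content negotiation. Without an
        // Accept header it may return HTML rather than RDF/XML.
        conn.setRequestProperty("Accept", "application/rdf+xml");

        return conn.getInputStream();
    }
}
```

The returned stream could then be passed where the code above passes `is` or the FileInputStream. Whether this resolves the behaviour described at the top of the thread is untested here.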