RE: processing .rdf files for specific property types only

Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva Mon, 23 Apr 2012 09:34:44 -0700

Stephen, I have a question about how to run the code. When I want to read an 
RDF file, I just take the URL of the RDF file, create an OntModel, and read it. 
For the model, I only need to give the URL. But the code you suggested needs a 
(local) filename. In this case, can I do the same as I did previously? For 
example, I just use URL strings such as the following in my code.


http://rdf.freebase.com/ns/m/067n4r
http://rdf.freebase.com/ns/en.mountain_view
http://dbpedia.org/resource/Mountain_View,_California

I do not download the RDF files in my code as of now, but I do not know 
whether Jena downloads the file when the model is given a URL string to read. 
Any help will be greatly appreciated. Thank you. 
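
One possible way to keep using URLs with a stream-based parser, as a sketch
only: open the URL as an InputStream with the standard library and pass that
stream wherever the quoted code below uses a FileInputStream. The helper name
openRdfStream is hypothetical (it is not part of Jena), and the Accept header
assumes the server offers RDF/XML through content negotiation. Since URLs like
the ones above carry no file extension, guessing the syntax from the name will
probably not work, so the language would likely have to be given explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class UrlStreamSketch {
    // Hypothetical helper, not part of Jena: opens a URL and returns its body
    // as a stream. Nothing is written to disk; the caller reads the bytes as
    // they arrive and is responsible for closing the stream.
    static InputStream openRdfStream(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        // Assumption: the server honours content negotiation for RDF/XML.
        conn.setRequestProperty("Accept", "application/rdf+xml");
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: UrlStreamSketch <url>");
            return;
        }
        try (InputStream in = openRdfStream(args[0])) {
            // Print the first bytes only, to confirm that RDF came back.
            byte[] head = in.readNBytes(200);
            System.out.println(new String(head, StandardCharsets.UTF_8));
        }
    }
}
```

The stream returned here could then take the place of
new FileInputStream(filename) in the parser call, so the document is fetched
and filtered in one pass without a local copy.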


________________________________________
From: Stephen Allen [sal...@apache.org]
Sent: Monday, April 23, 2012 2:43 AM
To: jena-users@incubator.apache.org
Subject: Re: processing .rdf files for specific property types only

Yes, we moved to the Apache community about a year ago.  The latest
release version of ARQ is 2.9.0, and the latest of Jena Core is 2.7.0.
 You can download them from the Apache distribution site [1], which is
linked to by [2].

-Stephen

[1] http://www.apache.org/dist/incubator/jena/
[2] http://incubator.apache.org/jena/download/index.html


On Sun, Apr 22, 2012 at 10:38 PM, Gunaratna, Dalkandura Arachchige
Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
> One follow-up question. I am using the ARQ 2.8.5 distribution and its 
> contents (Jena packages). The class Triple does not seem to work with that 
> distribution, so I just downloaded 2.8.6 from the SourceForge release dated 
> 2011-04-21. Is there any newer package for this purpose, or any newer 
> distribution available other than on the SourceForge site? Thank you.
> ________________________________________
> From: Stephen Allen [sal...@apache.org]
> Sent: Sunday, April 22, 2012 10:38 PM
> To: jena-users@incubator.apache.org
> Subject: Re: processing .rdf files for specific property types only
>
> I'm not sure I understand your question.  The code I posted will read
> the file in a single pass, and filter it down to only statements that
> contain the owl:sameAs resource in the predicate position.  This is
> about the fastest way you can parse your RDF.  It will also use a lot
> less memory than storing it in an in-memory model, as it works in a
> streaming fashion.  Also, if you don't need RDFS inferencing don't
> include it as it adds overhead.
>
> Try it out with your code, and see what the performance difference is.
>
> As a side note, the comparison in your if statement will be a little
> slower than mine since you are using String.contains(), and
> potentially incorrect if some other predicate had the string
> "owl#sameAs" in it, but wasn't the full
> "http://www.w3.org/2002/07/owl#sameAs".
>
> -Stephen
>
> On Sun, Apr 22, 2012 at 7:19 PM, Gunaratna, Dalkandura Arachchige
> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>> Hi Stephen,
>>   Will it increase the efficiency (speed) of processing? In your code,
>>
>>            if (OWL.sameAs.asNode().equals(t.getPredicate()))
>>            {
>>                // You can either do something immediately with this
>>                // triple, or stick it in a HashSet to enforce uniqueness
>>                sameAsTriples.add(t);
>>            }
>>
>> you compare every statement by reading each line in the file, much as I 
>> tried to do earlier, as follows:
>>
>> String predicate = st.getPredicate().getURI().toLowerCase();
>> if (predicate.contains("owl#sameas"))
>> {
>>     // do something to get the list of sameAs links
>> }
>>
>> Thank you.
>>
>> ________________________________________
>> From: Stephen Allen [sal...@apache.org]
>> Sent: Sunday, April 22, 2012 10:05 PM
>> To: jena-users@incubator.apache.org
>> Subject: Re: processing .rdf files for specific property types only
>>
>> On Sun, Apr 22, 2012 at 6:17 PM, Gunaratna, Dalkandura Arachchige
>> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>>> Hi,
>>>   I have a simple requirement: to read the 
>>> <http://www.w3.org/2002/07/owl#sameAs> object values (sameAs link values) 
>>> in an RDF file. For that I create an ontology model and read the whole 
>>> file. The following is a code sample I use for that.
>>>
>>> model = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>>> SysRIOT.wireIntoJena();
>>> model.read(url);
>>> StmtIterator stmtItr = model.listStatements();
>>>
>>> This approach has a huge processing overhead for my program, since for 
>>> every RDF file I have to read the whole file just to get the sameAs links. 
>>> Is there any other way of doing this kind of work, or is reading the whole 
>>> file the only way to get the specific property type we want?
>>>
>>> Also, what happens if we do not call model.close() at the end? Could that 
>>> cause an out-of-heap-space problem?
>>>
>>> Thank you,
>>> Kalpa
>>
>> Hi Kalpa,
>>
>> If you are simply interested in parsing an RDF file in a streaming
>> fashion, you can do something like below.  If you know that you don't
>> have any duplicate triples, then you can eliminate the HashSet.
>>
>>    final Set<Triple> sameAsTriples = new HashSet<Triple>();
>>    Sink<Triple> sink = new Sink<Triple>()
>>    {
>>        @Override
>>        public void send(Triple t)
>>        {
>>            if (OWL.sameAs.asNode().equals(t.getPredicate()))
>>            {
>>                // You can either do something immediately with this
>>                // triple, or stick it in a HashSet to enforce uniqueness
>>                sameAsTriples.add(t);
>>            }
>>        }
>>
>>        @Override
>>        public void flush() { }
>>
>>        @Override
>>        public void close() { }
>>    };
>>
>>    // To enable RDFS inferencing uncomment the following two lines.
>>    // You need to have your T-Box (ontology) loaded into some model
>>    //Model ontologyModel = ...
>>    //sink = InfFactory.infTriples(sink, ontologyModel);
>>
>>    String filename = ...
>>    RiotReader.parseTriples(new FileInputStream(filename),
>>            Lang.guess(filename), null, sink);
>>
>>    // Now do something with sameAsTriples
>>
>>
>
>

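On the side note quoted above about String.contains(): a small stand-alone
sketch, using plain strings rather than Jena nodes, of how a substring test
can accept a predicate that is not owl:sameAs while an exact comparison
rejects it. The class name, method names, and the lookalike URI are made up
for illustration.

```java
import java.util.Locale;

class PredicateMatchSketch {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Substring test in the style of the earlier snippet: lower-case the URI,
    // then look for "owl#sameas" anywhere inside it.
    static boolean looseMatch(String predicateUri) {
        return predicateUri.toLowerCase(Locale.ROOT).contains("owl#sameas");
    }

    // Exact test: the predicate must be the full owl:sameAs URI.
    static boolean exactMatch(String predicateUri) {
        return OWL_SAME_AS.equals(predicateUri);
    }

    public static void main(String[] args) {
        // Made-up URI that merely contains the substring.
        String lookalike = "http://example.org/my-owl#sameAsOld";
        System.out.println(looseMatch(OWL_SAME_AS) + " " + exactMatch(OWL_SAME_AS));
        System.out.println(looseMatch(lookalike) + " " + exactMatch(lookalike));
    }
}
```

Running main prints "true true" for the real URI and "true false" for the
lookalike, which is the false positive the exact comparison avoids.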