RE: processing .rdf files for specific property types only

Gunaratna, Dalkandura Arachchige Kalpa Shashika Silva Sun, 22 Apr 2012 21:55:40 -0700

Thanks, Stephen. I will try this code segment and see the difference. 
________________________________________
From: Stephen Allen [sal...@apache.org]
Sent: Sunday, April 22, 2012 10:38 PM
To: jena-users@incubator.apache.org
Subject: Re: processing .rdf files for specific property types only


I'm not sure I understand your question.  The code I posted will read
the file in a single pass, and filter it down to only statements that
contain the owl:sameAs resource in the predicate position.  This is
about the fastest way you can parse your RDF.  It will also use a lot
less memory than storing it in an in-memory model, as it works in a
streaming fashion.  Also, if you don't need RDFS inferencing, don't
include it, as it adds overhead.
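
A minimal, Jena-free sketch of the streaming idea, in plain Java. The `Triple` record and `Sink` interface here are hypothetical stand-ins that only mimic the shape of the RIOT `Sink` in the code quoted below. They are not Jena's own classes, and the `filterSameAs` loop merely stands in for a parser pushing triples one at a time. Only the owl:sameAs matches are retained, so memory grows with the number of matches rather than with the size of the file. The `HashSet` also collapses duplicate triples.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SameAsStreamingSketch {

    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // Hypothetical stand-in for an RDF triple; a record gives value-based
    // equals/hashCode, so identical triples collapse in a HashSet.
    record Triple(String subject, String predicate, String object) { }

    // Hypothetical push-style consumer, shaped like the Sink used in the thread.
    interface Sink<T> extends AutoCloseable {
        void send(T item);
        void flush();
        @Override
        void close();
    }

    // Keeps only owl:sameAs triples; everything else is dropped as soon as it arrives.
    static final class SameAsFilterSink implements Sink<Triple> {
        final Set<Triple> sameAsTriples = new HashSet<>();

        @Override
        public void send(Triple t) {
            // Exact comparison against the full predicate URI.
            if (OWL_SAME_AS.equals(t.predicate())) {
                sameAsTriples.add(t);
            }
        }

        @Override public void flush() { }
        @Override public void close() { }
    }

    // Simulates a streaming parser: triples are pushed one at a time, never collected in a model.
    static Set<Triple> filterSameAs(Iterable<Triple> parsedTriples) {
        SameAsFilterSink sink = new SameAsFilterSink();
        for (Triple t : parsedTriples) {
            sink.send(t);
        }
        sink.flush();
        sink.close();
        return sink.sameAsTriples;
    }

    public static void main(String[] args) {
        List<Triple> input = List.of(
            new Triple("http://example.org/a", OWL_SAME_AS, "http://example.org/b"),
            new Triple("http://example.org/a", "http://www.w3.org/2000/01/rdf-schema#label", "\"A\""),
            // Duplicate of the first triple: the HashSet keeps only one copy.
            new Triple("http://example.org/a", OWL_SAME_AS, "http://example.org/b"));
        System.out.println(filterSameAs(input).size()); // prints 1
    }
}
```
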

Try it out with your code, and see what the performance difference is.

As a side note, the comparison in your if statement will be a little
slower than mine since you are using String.contains(), and
potentially incorrect if some other predicate had the string
"owl#sameAs" in it, but wasn't the full
"http://www.w3.org/2002/07/owl#sameAs".

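To make the correctness point concrete, here is a small plain-Java sketch. No Jena is involved, and the `example.org` URI is made up for illustration. It compares the substring test from the question with an exact comparison against the full owl:sameAs URI. The substring version accepts a predicate that merely contains the `owl#sameAs` fragment. Because it lowercases first, it also accepts a differently cased URI, which RDF treats as a different predicate.

```java
import java.util.List;

public class PredicateMatchSketch {

    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    // The test from the question: lowercase the URI (possibly copying it),
    // then scan it for a fragment.
    static boolean matchesByContains(String predicateUri) {
        return predicateUri.toLowerCase().contains("owl#sameas");
    }

    // Exact comparison against the full owl:sameAs URI.
    static boolean matchesExactly(String predicateUri) {
        return OWL_SAME_AS.equals(predicateUri);
    }

    public static void main(String[] args) {
        List<String> predicates = List.of(
            OWL_SAME_AS,
            // Hypothetical predicate that merely contains the fragment.
            "http://example.org/vocab/not-owl#sameAsCandidate",
            // Different case, so a different URI (not owl:sameAs).
            "http://www.w3.org/2002/07/owl#SAMEAS");
        for (String p : predicates) {
            System.out.println(p + " contains=" + matchesByContains(p)
                + " exact=" + matchesExactly(p));
        }
    }
}
```

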
-Stephen

On Sun, Apr 22, 2012 at 7:19 PM, Gunaratna, Dalkandura Arachchige
Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
> Hi Stephen,
>   Will it increase the efficiency (speed) of processing? In your code,
>
>            if (OWL.sameAs.asNode().equals(t.getPredicate()))
>            {
>                // You can either do something immediately with this
>                // triple, or stick it in a HashSet to enforce uniqueness
>                sameAsTriples.add(t);
>            }
>
> you still compare every statement read from the file, just as I tried to
> do earlier, like this:
>
> String predicate = st.getPredicate().getURI().toLowerCase();
> if (predicate.contains("owl#sameas"))
> {
>     // do something to get the list of sameAs links
> }
>
> Thank you.
>
> ________________________________________
> From: Stephen Allen [sal...@apache.org]
> Sent: Sunday, April 22, 2012 10:05 PM
> To: jena-users@incubator.apache.org
> Subject: Re: processing .rdf files for specific property types only
>
> On Sun, Apr 22, 2012 at 6:17 PM, Gunaratna, Dalkandura Arachchige
> Kalpa Shashika Silva <gunaratn...@wright.edu> wrote:
>> Hi,
>>   I have a simple requirement: to read the
>> <http://www.w3.org/2002/07/owl#sameAs> object values (sameAs link values) in
>> an RDF file. For that, I create an ontology model and read the whole file.
>> Following is the code sample I use for that.
>>
>> model = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>> SysRIOT.wireIntoJena();
>> model.read(url);
>> StmtIterator stmtItr = model.listStatements();
>>
>> This approach has a huge processing overhead for my program, since for every
>> RDF file I have to read the whole file just to get the sameAs links. Is there
>> any other way of doing this, or is the only way to read the whole file to get
>> the specific property type we want?
>>
>> And also, what happens if we do not call model.close() at the end? Could that
>> cause the program to run out of heap space?
>>
>> Thank you,
>> Kalpa
>
> Hi Kalpa,
>
> If you are simply interested in parsing an RDF file in a streaming
> fashion, you can do something like the following.  If you know that you don't
> have any duplicate triples, then you can eliminate the HashSet.
>
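>    // Package names as of Jena/ARQ 2.9 (2012); later Jena releases
>    // moved RIOT and Atlas classes under org.apache.jena.
>    import java.io.FileInputStream;
>    import java.util.HashSet;
>    import java.util.Set;
>    import org.openjena.atlas.lib.Sink;
>    import org.openjena.riot.Lang;
>    import org.openjena.riot.RiotReader;
>    import com.hp.hpl.jena.graph.Triple;
>    import com.hp.hpl.jena.vocabulary.OWL;
>
>    final Set<Triple> sameAsTriples = new HashSet<Triple>();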
>    Sink<Triple> sink = new Sink<Triple>()
>    {
>        @Override
>        public void send(Triple t)
>        {
>            if (OWL.sameAs.asNode().equals(t.getPredicate()))
>            {
>                // You can either do something immediately with this
>                // triple, or stick it in a HashSet to enforce uniqueness
>                sameAsTriples.add(t);
>            }
>        }
>
>        @Override
>        public void flush() { }
>
>        @Override
>        public void close() { }
>    };
>
>    // To enable RDFS inferencing uncomment the following two lines.
>    // You need to have your T-Box (ontology) loaded into some model
>    //Model ontologyModel = ...
>    //sink = InfFactory.infTriples(sink, ontologyModel);
>
>    String filename = ...
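>    // try-with-resources (Java 7+) closes the input stream after parsing
>    try (FileInputStream in = new FileInputStream(filename))
>    {
>        RiotReader.parseTriples(in, Lang.guess(filename), null, sink);
>    }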
>
>    // Now do something with sameAsTriples
>
>
