On Fri, 2010-12-31 at 11:38 -0500, Benson Margulies wrote:
> On Fri, Dec 31, 2010 at 11:27 AM, Dave Reynolds wrote:
> > On Fri, 2010-12-31 at 08:57 -0500, Benson Margulies wrote:
> >> Step 1:
> >>
> >> Model schema = ModelFactory.createDefaultModel();
> >> schema.read(RdfUtils.getJugOntology(),
> >> RdfUtils.getJugOntologyUri(), "RDF/XML");
> >> return ModelFactory.createRDFSModel(schema, data);
> >
> > What's in the data?
>
> typical item:
>
> <uri:jug:0618936a7a03bf236a291bcddbfde63b#e11>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> rex:Person ;
> rex:hasEntityDetectionSource
> "statistical" ;
> rex:hasNormalizedText
> "Obama" ;
> rex:hasOriginalText
> "Obama" ;
> rex:root "true" ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e43> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e25> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e8> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e104> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e4> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e18> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e56> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e115> ;
> owl:sameAs <uri:jug:0618936a7a03bf236a291bcddbfde63b#e100> .
>
>
> >
> >> Step 2: about 50k tuples, many of them owl:sameAs
> >
> > What is 50k tuples, the data, the schema, both, something else?
>
> data. Schema is tiny.

Can you show us the schema?

> >
> >> Step 3:
> >>
> >> NodeIterator sameAsItems = model.listObjectsOfProperty(root,
> >> relatingProp); // prop is in fact owl:sameAs
> >> while (sameAsItems.hasNext()) {
> >> ...
> >> }
> >>
> >> Runs for a very long time, using a very large amount of memory.
> >> Eventually runs out of memory.
> >
> > Strange. The owl:sameAs reasoning can be hugely expensive (it is
> > fundamentally exponential) but the RDFS reasoner knows nothing about
> > owl:sameAs so isn't doing any of that reasoning.
>
> Interrupting it in Eclipse, it is definitely deep in the reasoner all
> the time until it runs out of memory and dies.

Are you sure there isn't an outer loop running?

I can imagine a space leak, so that repeated calls to the reasoner would
use up memory, but I find it hard to see how RDFS reasoning with a tiny
schema could blow up so badly.
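
For contrast, here is a rough sketch of one reason genuine owl:sameAs
reasoning gets expensive, which is exactly the work the RDFS reasoner should
not be doing. It is plain Java with no Jena calls; the class SameAsClosure
and its methods are made up for illustration, and it only counts statements
rather than building them. The assumption is a reasoner that materializes
the full symmetric and transitive closure, so a cluster of n equivalent
resources ends up with n * (n - 1) sameAs statements. For an item shaped
like the one quoted above (one root with 9 sameAs links), the 9 asserted
statements become 90.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Jena: counts the owl:sameAs statements
// that a full symmetric + transitive closure would materialize.
final class SameAsClosure {
    // Union-find forest: each resource points towards its class representative.
    private final Map<String, String> parent = new HashMap<>();

    // Find the representative of x's equivalence class, compressing the path.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        String cur = x;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    // Record one asserted statement: a owl:sameAs b.
    void assertSameAs(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Distinct non-reflexive sameAs statements in the closure:
    // every equivalence class of size n contributes n * (n - 1).
    long closureSize() {
        Map<String, Long> classSizes = new HashMap<>();
        for (String node : new ArrayList<>(parent.keySet())) {
            classSizes.merge(find(node), 1L, Long::sum);
        }
        long total = 0;
        for (long n : classSizes.values()) {
            total += n * (n - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        SameAsClosure closure = new SameAsClosure();
        // Same shape as the quoted item: one root with 9 sameAs links.
        for (int i = 1; i <= 9; i++) {
            closure.assertSameAs("root", "e" + i);
        }
        System.out.println(closure.closureSize());
    }
}
```

If your 50k tuples contain a few large sameAs clusters, that quadratic growth
would hurt under an OWL rule set, but it should not show up under plain RDFS.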

Do you have a complete minimal example we could take a look at?

[I realize you've switched approach, but I'd like to understand why RDFS
reasoning might blow up in this case.]

Dave