On 04/04/17 08:45, Dimov, Stefan wrote:
Thanks Dave,
Yes, I’m using TDB.
Using memory would be faster, I guess, but would the machine be able to handle
millions of triples? Is Jena optimized for that?
Fundamentally, the Jena reasoner isn't that scalable, so whether it can
handle "millions of triples" depends on the specific rules and whether
that's ~2 million or ~200 million.
As a rule of thumb for just storage and management I would allow 1k per
triple (depending on literal sizes), so a 2MT (2 million triple) dataset
would need ~2GB. For simple inference you might "only" need a few times
that, but the sky's the limit. The thing to do is give it a try with,
say, 10GB and see where you get.
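To make the sizing arithmetic concrete, here is a back-of-envelope sketch of the 1k-per-triple rule of thumb. It is only an illustration: taking "1k" as 1,000 bytes and using a 3x inference factor are assumptions, and the class and method names are hypothetical.

```java
// Back-of-envelope heap sizing from the "about 1k per triple" rule of thumb.
// All constants here are illustrative assumptions, not measured Jena figures.
class HeapEstimate {

    // Rule of thumb for storage and management; real cost depends on literal sizes.
    static final long BYTES_PER_TRIPLE = 1_000L;

    // Storage only: number of triples times the per-triple allowance.
    static long storageBytes(long triples) {
        return triples * BYTES_PER_TRIPLE;
    }

    // Storage scaled by a guessed inference factor ("a few times" for simple rules).
    static long withInferenceBytes(long triples, int factor) {
        return storageBytes(triples) * factor;
    }

    // Round a byte count up to whole gigabytes (10^9 bytes), e.g. for choosing -Xmx.
    static long roundUpToGb(long bytes) {
        return (bytes + 999_999_999L) / 1_000_000_000L;
    }
}
```

For 2 million triples this gives 2GB for storage alone and 6GB with the assumed 3x factor, so starting the JVM with something like `java -Xmx10g ...` and adjusting from there is a reasonable first experiment. At 200 million triples the storage figure alone is already 200GB.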
Your other option, depending again on the specifics of what you are
trying to do, is to not use the reasoner at all but perform equivalent
processing using SPARQL updates which can then run directly over TDB.
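A minimal sketch of that idea, assuming a typical RDFS-style rule (type propagation along rdfs:subClassOf): the rule becomes a SPARQL `INSERT ... WHERE` update, and because its output can feed its own input, it is re-run until a pass adds nothing new. The in-memory `Set` and the `Triple` record below are hypothetical stand-ins for a TDB dataset, used only to show the loop. With Jena, each pass would instead be the update string executed via `UpdateAction.parseExecute(update, dataset)` inside a write transaction. For this particular rule, a single update using the SPARQL 1.1 property path `rdfs:subClassOf+` would reach the same result in one step.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a store: a set of (subject, predicate, object) triples.
// The point is the loop shape; against TDB each pass would be one SPARQL update
// executed in a write transaction.
class FixpointSketch {

    record Triple(String s, String p, String o) {}

    static final String TYPE = "rdf:type";
    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    // One pass of the rule, equivalent in effect to the SPARQL 1.1 update:
    //   INSERT { ?x rdf:type ?d }
    //   WHERE  { ?x rdf:type ?c . ?c rdfs:subClassOf ?d }
    // Returns how many triples were genuinely new.
    static int applyTypeRule(Set<Triple> store) {
        Set<Triple> derived = new HashSet<>();
        for (Triple t : store) {
            if (!t.p().equals(TYPE)) continue;
            for (Triple u : store) {
                if (u.p().equals(SUB_CLASS_OF) && u.s().equals(t.o())) {
                    derived.add(new Triple(t.s(), TYPE, u.o()));
                }
            }
        }
        int before = store.size();
        store.addAll(derived); // added after iterating, so no concurrent modification
        return store.size() - before;
    }

    // Repeat the update until a pass adds nothing; returns the number of productive passes.
    static int runToFixpoint(Set<Triple> store) {
        int passes = 0;
        while (applyTypeRule(store) > 0) {
            passes++;
        }
        return passes;
    }
}
```

With `ex:rex rdf:type ex:Dog`, `ex:Dog rdfs:subClassOf ex:Mammal` and `ex:Mammal rdfs:subClassOf ex:Animal`, two productive passes derive `ex:rex rdf:type ex:Mammal` and then `ex:rex rdf:type ex:Animal`, and a third pass finds nothing new. Unlike the RETE engine, nothing beyond the current pass's results needs to be held in memory between passes.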
Dave
S.
On 4/4/17, 12:34 AM, "Dave Reynolds" <[email protected]> wrote:
The reasoners store all their information in memory.
Your mention of transactions suggests that you are storing into a TDB or
other database-backed store. That will not enable the reasoner to scale
and will just slow things down. You'll get better performance by loading
the data into memory and then applying the reasoner to that.
You will, of course, need to allocate enough memory to this.
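Because the reasoner's entire working state lives on the Java heap, one cheap check before a long run is to confirm that the heap limit actually in force (set with `-Xmx`) is what was intended. This stdlib-only sketch assumes the required size comes from some prior estimate; the 2GB figure and the 2.0 headroom factor in `main` are hypothetical placeholders, and the real requirement depends on the data and the rules.

```java
// Sanity check that the heap limit actually in force (set with -Xmx) covers an
// estimated working set. The headroom factor and estimate are assumptions.
class HeapCheck {

    // Maximum heap the JVM will attempt to use; Long.MAX_VALUE if there is no limit.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // True if the heap limit covers the estimate multiplied by the headroom factor.
    static boolean fits(long estimatedBytes, double headroom, long maxHeap) {
        return estimatedBytes * headroom <= maxHeap;
    }

    public static void main(String[] args) {
        long estimate = 2_000_000_000L; // hypothetical: ~2M triples at ~1k each
        long max = maxHeapBytes();
        System.out.printf("Max heap: %d MB%n", max / (1024 * 1024));
        if (!fits(estimate, 2.0, max)) {
            System.out.println("Heap probably too small for in-memory inference; raise -Xmx.");
        }
    }
}
```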
What performance is like and how much memory is needed will depend on
the details of your rules, not just how many there are. After all, the
core RDFS rule set is fewer than ten rules!
Dave
On 04/04/17 04:07, Dimov, Stefan wrote:
> … and after an hour or so, eventually it failed with:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.jena.reasoner.rulesys.impl.BindingVectorMultiSet.getPartialEnv(BindingVectorMultiSet.java:119)
>     at org.apache.jena.reasoner.rulesys.impl.BindingVectorMultiSet.put(BindingVectorMultiSet.java:159)
>     at org.apache.jena.reasoner.rulesys.impl.BindingVectorMultiSet.add(BindingVectorMultiSet.java:91)
>     at org.apache.jena.reasoner.rulesys.impl.RETEQueue.fire(RETEQueue.java:105)
>     at org.apache.jena.reasoner.rulesys.impl.RETEClauseFilter.fire(RETEClauseFilter.java:227)
>     at org.apache.jena.reasoner.rulesys.impl.RETEEngine.inject(RETEEngine.java:492)
>     at org.apache.jena.reasoner.rulesys.impl.RETEEngine.runAll(RETEEngine.java:474)
>     at org.apache.jena.reasoner.rulesys.impl.RETEEngine.fastInit(RETEEngine.java:163)
>     at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:471)
>     at org.apache.jena.reasoner.BaseInfGraph.requirePrepared(BaseInfGraph.java:530)
>     at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.findWithContinuation(FBRuleInfGraph.java:557)
>     at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.graphBaseFind(FBRuleInfGraph.java:587)
>     at org.apache.jena.reasoner.BaseInfGraph.graphBaseFind(BaseInfGraph.java:359)
>     at org.apache.jena.graph.impl.GraphBase.find(GraphBase.java:241)
>     at org.apache.jena.graph.GraphUtil.findAll(GraphUtil.java:99)
>     at org.apache.jena.graph.GraphUtil.addInto(GraphUtil.java:151)
>     at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:225)
>
> S.
>
> From: Stefan Dimov <[email protected]>
> Date: Monday, April 3, 2017 at 4:37 PM
> To: "[email protected]" <[email protected]>
> Subject: Long time to load the reasoner ...
>
> Hi all,
>
> I’m loading a few million triples into Jena in chunks (each chunk in a separate transaction). It takes a few minutes.
>
> Then I’m loading the reasoner (in a separate transaction), which contains fewer than ten rules, and it takes a loooong time.
>
> Why is this? Am I doing something wrong, or is that to be expected? Should I change some settings? Increase the memory?
>
> S.
>