On 06/04/13 21:47, Tilo Fischer wrote:
On Sat, 06 Apr 2013 14:02:19 +0100
Andy Seaborne <[email protected]> wrote:

On 06/04/13 10:37, Tilo Fischer wrote:
Hi,

I would like to reduce the memory that is used by my application for
the GenericRuleReasoner.
Are there already measures like PATRICIA tries or dictionaries in
Jena that aim to store the nodes' content (Node.label) in one place
instead of in every node? Or are there reasons why this cannot be
accomplished?

Thanks, Tilo Fischer

Jena stores the string as given for the literals/URIs.  Whether strings
are shared depends on how the nodes are made. E.g. RIOT uses an LRU
cache on IRIs to both increase sharing and reduce the re-resolving of
URIs; there isn't such a space-saving cache for literals (it wouldn't
be a bad idea to have one, but the savings may not be that great unless
there are lots of repeats.  Numbers often repeat, xsd strings don't).
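
To illustrate the kind of LRU sharing cache mentioned above, here is a
minimal sketch in plain Java. It is not RIOT's actual implementation; the
class name LruStringCache and the method share() are made up for this
example. It hands back one shared String instance per distinct value and
evicts the least recently used entry once the capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Jena: shares equal strings via a bounded LRU cache. */
public class LruStringCache {
    private final int capacity;
    private final LinkedHashMap<String, String> cache;

    public LruStringCache(final int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > LruStringCache.this.capacity;
            }
        };
    }

    /** Returns the cached instance equal to s, or caches and returns s itself. */
    public String share(String s) {
        String cached = cache.get(s);
        if (cached != null) {
            return cached;
        }
        cache.put(s, s);
        return s;
    }

    /** Number of strings currently held. */
    public int size() {
        return cache.size();
    }
}
```

The sketch is not thread-safe, and it only saves space when the same
string value is seen again while it is still in the cache.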

TDB and SDB do use dictionaries.

I don't believe the rules engine actually makes nodes, just triples,
so the size of the nodes (terms) is fixed by the base data.

There can be a lot of space going into maintaining the state of the
rules engine. I think this can end up being comparable to the size of
the base data.

What are you running up against?
Which route is the data taking to get into the rules engine?

        Andy




I have tried MemoryModel and GraphTDB as the baseModel of my reasoner. Both
ways result in "java.lang.OutOfMemoryError: GC overhead limit exceeded"
when my raw data exceeds a certain size, and both indeed use a lot of memory.
In this case the database is 37M and the corresponding ttl file is 11M.


All inference overhead is in-memory. TDB also takes a slice of heap for caches (a big slice if on a 32-bit JVM).

What heap size are you using?

If you can read the data into memory and then run out of heap when you apply inference, it suggests it's the inference data structures - no new RDF terms are created unless your rules explicitly do so.
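
One way to answer the heap size question and to see which stage uses the
heap is to print the JVM's own figures after loading the data and again
after applying inference. The sketch below uses only java.lang.Runtime;
the class name HeapReport and its methods are made up for this example.
The maximum is whatever the JVM was started with (for example via -Xmx).

```java
/** Hypothetical helper: reports JVM heap figures at a named stage. */
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Heap currently in use, in MiB (allocated heap minus its free part). */
    public static long usedMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    /** Maximum heap the JVM will try to use, in MiB. */
    public static long maxMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** One-line summary, e.g. to print after loading and after inference. */
    public static String report(String stage) {
        return String.format("%s: used %d MiB of max %d MiB", stage, usedMiB(), maxMiB());
    }
}
```

Calling System.out.println(HeapReport.report("after load")) and the same
with "after inference" gives a rough comparison; the used figure includes
garbage that has not been collected yet, so treat it as an upper bound.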

Inside the RETE network the data is represented as BindingVector which
is basically an array of Nodes. BindingVector has a clone-constructor
which uses System.arraycopy on that array but I cannot tell if that is
part of the problem.
I am using a lot of IRIs that consist of a prefix and an index. I will
try to replace them with blank nodes with a reference and the index
separately, which should be a good idea anyway.
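
On the System.arraycopy point above: in Java, copying an array of object
references duplicates only the references, not the objects they point to.
The sketch below shows this with a stand-in class; Term and
copyEnvironment are made up for this example and are not Jena's Node or
BindingVector code.

```java
/** Demonstrates that System.arraycopy on an object array is a shallow copy. */
public class ShallowCopyDemo {

    /** Stand-in for an RDF term; hypothetical, not Jena's Node class. */
    public static final class Term {
        public final String label;

        public Term(String label) {
            this.label = label;
        }
    }

    /** Returns a new array holding the same Term references as env. */
    public static Term[] copyEnvironment(Term[] env) {
        Term[] copy = new Term[env.length];
        // Copies references only; no Term objects are created here.
        System.arraycopy(env, 0, copy, 0, env.length);
        return copy;
    }
}
```

So each such copy costs one new array (a header plus one reference per
slot), while the terms and their strings stay shared. Whether the number
of copies is what fills the heap here is not something this sketch can
show.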

That may make the in-memory footprint a bit smaller but


Thanks, Tilo

