On Sat, 06 Apr 2013 14:02:19 +0100
Andy Seaborne <[email protected]> wrote:

> On 06/04/13 10:37, Tilo Fischer wrote:
> > Hi,
> >
> > I would like to reduce the memory that is used by my application for
> > the GenericRuleReasoner.
> > Are there already measures like PATRICIA tries or dictionaries in
> > Jena that aim to store the nodes' content (Node.label) in one place
> > instead of in every node? Or are there reasons why this cannot be
> > accomplished?
> >
> > Thanks, Tilo Fischer
> 
> Jena stores the string as given for literals and URIs. Whether
> strings are shared depends on how the nodes are made. For example,
> RIOT uses an LRU cache on IRIs both to increase sharing and to avoid
> re-resolving URIs; there isn't such a space-saving cache for literals
> (it wouldn't be a bad idea to have one, but the savings may not be
> that great unless there are lots of repeats. Numbers often repeat;
> xsd strings don't).
> 
> TDB and SDB do use dictionaries.
> 
> I don't believe the rules engine actually makes nodes, just triples,
> so the size of the nodes (terms) is fixed by the base data.
> 
> There can be a lot of space going into maintaining the state of the 
> rules engine. I think this can end up being comparable to the size of 
> the base data.
> 
> What are you running up against?
> Which route is the data taking to get into the rules engine?
> 
>       Andy
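
Here is a minimal sketch of the kind of bounded LRU cache described above for IRIs. It works on plain Java strings rather than Jena Node objects. Repeated IRIs map to one shared String instance, and the least recently used entries are evicted once the cache is full. The class name LruInterner and the size bound are hypothetical, and this is not RIOT's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not RIOT's class): a bounded LRU cache that returns
// one shared String instance per distinct IRI, so repeated IRIs share storage.
class LruInterner {
    private final Map<String, String> cache;

    LruInterner(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the bound is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns a canonical instance equal to s, caching s if it is new. */
    String intern(String s) {
        String shared = cache.get(s); // get() also refreshes the entry's recency
        if (shared == null) {
            cache.put(s, s);
            shared = s;
        }
        return shared;
    }

    int size() {
        return cache.size();
    }
}
```

Because the cache is bounded, its own memory stays fixed. The trade-off is that an IRI that is evicted and then seen again gets a fresh instance, so sharing is only as good as the cache's hit rate.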

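For comparison, here is a rough in-memory sketch of dictionary encoding, the general approach behind the TDB and SDB dictionaries. Each distinct term label is stored once, and triples refer to it by integer id. TermDictionary and EncodedTriple are hypothetical names. TDB and SDB keep their dictionaries in persistent storage, and their actual classes and id formats differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary encoding: each distinct label is stored
// once (the map key and the list slot are the same String instance) and is
// referred to everywhere else by an int id.
class TermDictionary {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> labelOf = new ArrayList<>();

    /** Returns the id for label, assigning the next free id if it is new. */
    int encode(String label) {
        Integer id = idOf.get(label);
        if (id == null) {
            id = labelOf.size();
            labelOf.add(label);
            idOf.put(label, id);
        }
        return id;
    }

    /** Maps an id back to its stored label. */
    String decode(int id) {
        return labelOf.get(id);
    }

    int size() {
        return labelOf.size();
    }
}

// Hypothetical: a triple held as three ints instead of three object references.
final class EncodedTriple {
    final int s, p, o;

    EncodedTriple(int s, int p, int o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }
}
```
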
I have tried both MemoryModel and GraphTDB as the baseModel of my
reasoner. Both result in "java.lang.OutOfMemoryError: GC overhead limit
exceeded" once my raw data exceeds a certain size, and both do indeed
use a lot of memory. In this case the database is 37 MB and the
corresponding Turtle (.ttl) file is 11 MB.
Inside the RETE network the data is represented as BindingVector, which
is basically an array of Nodes. BindingVector has a copy constructor
that uses System.arraycopy on that array, but I cannot tell whether that
is part of the problem.
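
A small check on the System.arraycopy question: arraycopy copies references only. A copied binding array therefore points at the same node objects as the original and does not duplicate their labels. The sketch uses a hypothetical stand-in Term class instead of Jena's Node and does not reproduce BindingVector itself.

```java
// Hypothetical stand-in for an RDF term; not Jena's Node class.
final class Term {
    final String label;

    Term(String label) {
        this.label = label;
    }
}

class ShallowCopyDemo {
    /** Copies an array of references the way an arraycopy-based copy
     *  constructor would: a new array, but the same Term objects. */
    static Term[] copyBindings(Term[] env) {
        Term[] copy = new Term[env.length];
        System.arraycopy(env, 0, copy, 0, env.length);
        return copy;
    }

    public static void main(String[] args) {
        Term[] env = { new Term("http://example.org/a"), null, new Term("42") };
        Term[] copy = copyBindings(env);
        System.out.println(copy != env);       // prints true: a new array
        System.out.println(copy[0] == env[0]); // prints true: the same Term object
    }
}
```

Each copy costs roughly one new array of references, not a copy of the node content. Many such copies retained by the network can still add up, however.
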
I am using a lot of IRIs that consist of a prefix and an index. I will
try to replace them with blank nodes that carry a reference and the
index separately, which should be a good idea anyway.
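
Here is a sketch of the prefix-plus-index idea in plain Java. It assumes the goal is to hold each namespace prefix once and keep only an int per resource. IndexedIri is a hypothetical class. Mapping such values onto Jena nodes, for example with the blank-node scheme described above, is not shown.

```java
// Hypothetical sketch: the prefix String is shared by every resource in the
// same namespace; each resource stores only a reference to it plus an int.
final class IndexedIri {
    final String prefix; // one shared instance per namespace
    final int index;

    IndexedIri(String prefix, int index) {
        this.prefix = prefix;
        this.index = index;
    }

    /** Builds the full IRI string only when it is actually needed. */
    String toIri() {
        return prefix + index;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IndexedIri)) return false;
        IndexedIri other = (IndexedIri) o;
        return index == other.index && prefix.equals(other.prefix);
    }

    @Override
    public int hashCode() {
        return 31 * prefix.hashCode() + index;
    }
}
```

The full IRI string is never stored. Many resources in one namespace therefore cost one shared prefix plus a small object each.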

Thanks, Tilo
