On Sat, 06 Apr 2013 14:02:19 +0100 Andy Seaborne <[email protected]> wrote:
> On 06/04/13 10:37, Tilo Fischer wrote:
> > Hi,
> >
> > I would like to reduce the memory that is used by my application for
> > the GenericRuleReasoner.
> > Are there already measures like PATRICIA tries or dictionaries in
> > Jena that aim to store the nodes' content (Node.label) in one place
> > instead of in every node? Or are there reasons why this cannot be
> > accomplished?
> >
> > Thanks, Tilo Fischer
>
> Jena stores the string as given for the literals/URI. Whether there
> is shared string reuse is dependent on how the nodes are made. E.g.
> RIOT uses an LRU cache on IRIs to both increase sharing and reduce
> the re-resolving of URIs; there isn't such a space-saving cache for
> literals (won't be a bad idea to have one but the savings may not be
> that great unless there is lots of repeats. Numbers often repeat,
> xsd strings don't).
>
> TDB and SDB do use dictionaries.
>
> I don't believe the rules engine actually makes nodes, just triples,
> so the size of the nodes (terms) is fixed by the base data.
>
> There can be a lot of space going into maintaining the state of the
> rules engine. I think this can end up being comparable to the size of
> the base data.
>
> What are you running up against?
> Which route is the data taking to get into the rules engine?
>
> Andy

I have tried MemoryModel and GraphTDB as the baseModel of my reasoner. Both use a lot of memory and fail with "java.lang.OutOfMemoryError: GC overhead limit exceeded" once my raw data exceeds a certain size. In this case the database is 37M and the corresponding ttl file is 11M.

Inside the RETE network the data is represented as BindingVector, which is basically an array of Nodes. BindingVector has a clone constructor that uses System.arraycopy on that array, but I cannot tell whether that is part of the problem.

I am using a lot of IRIs that consist of a prefix and an index.
I will try to replace them with blank nodes, with a reference and the index stored separately, which should be a good idea anyway.

Thanks,
Tilo
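
For illustration only, the kind of string sharing discussed in this thread (an LRU cache that hands back one canonical String per distinct label) could be sketched roughly as follows. This is not Jena's or RIOT's actual code: the class name LabelCache, its methods, and the capacity parameter are all hypothetical, and the sketch relies only on java.util.LinkedHashMap in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jena: keeps one shared String instance
// per distinct label and evicts the least recently used entry when full.
final class LabelCache {
    private final Map<String, String> cache;

    LabelCache(final int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; drop the eldest entry once over capacity.
                return size() > capacity;
            }
        };
    }

    // Returns the cached instance equal to label, or caches and returns label itself.
    synchronized String canonical(String label) {
        String shared = cache.get(label);
        if (shared == null) {
            cache.put(label, label);
            shared = label;
        }
        return shared;
    }

    // Number of labels currently held in the cache.
    synchronized int entryCount() {
        return cache.size();
    }
}
```

Passing every IRI string through canonical() before building a node would let equal labels share one String while they stay in the cache. Whether that helps here depends on how often labels repeat; IRIs built from a prefix and a unique index are all distinct strings, so a cache like this would save little for them.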
