On Tue, Jan 25, 2011 at 10:33 AM, Hasan Hasan <[email protected]> wrote:
> Hi Andy,
>
> thanks for taking a look at the code.
> This means that there is a limit to the number of triples with large
> literals that can be returned by jenaGraph.find(). Right? If this limit is
> exceeded, then it can lead to an OutOfMemoryError. And this limit
> depends on the max memory allocated for the heap and the size of the literals?
> So to see whether there is a memory leak, I could try to loop over
> jenaGraph.find() where in each iteration there shouldn't be a heap memory
> exception.
> I'll test it now and let you know.
>

I iterated the graph.find() method 100 000 times, and freeMemory seems to
fluctuate within a certain range and never drops below a certain value,
which indicates there is no memory leak.
I hope this does not depend on the fact that I have the same triples in each
result set of graph.find().

hasan

> But we'll consider your suggestion to not have large literals in the
> triples, but their references.
>
> Cheers
> Hasan
>
> On Mon, Jan 24, 2011 at 10:11 PM, Andy Seaborne <
> [email protected]> wrote:
>
>>
>>
>> On 24/01/11 18:03, Hasan Hasan wrote:
>>
>>> Hi Andy
>>>
>>> attached I provide a bundle that when run can
>>> throw java.lang.OutOfMemoryError exception.
>>> I don't do any parsing in the code. I merely read triples from the graph
>>> generated in the previous or current execution.
>>>
>>> Invoked with:
>>> MAVEN_OPTS="-Xmx512m -Xms128m" mvn clean install exec:java -o -e
>>> -Dexec.args="300 2"
>>>
>>> You can play with the arguments. You can generate some triples in
>>> current execution and retrieve them
>>> You can also only retrieve triples, in which case you need not
>>> specify -Dexec.args
>>> In the above example, 300 is the number of triples to be generated and
>>> added to the graph
>>> 2 is the type of literal used: xsd:base64Binary, if you specify 1, the
>>> type used is rdf:XMLLiteral
>>> Not all objects in the graph are of typed literals.
>>>
>>> Could you please check? Thanks.
>>>
>>
>> Yes, it does. But I can tell that from the pom alone and reading between
>> the lines (and test cases) that it's about large literals.
>>
>> This uses TDB - TDB has various caches in JVM.
>>
>> Note that on 64 bit hardware, TDB will also use memory mapped I/O, which
>> counts towards the process size but not the heap.
>>
>> There are 2 100K slot caches in front of the node table in the heap, one
>> for node->NodeId and one for NodeId->Node (the latter is more important at
>> query time, the former at update time). The policy is LRU.
>>
>> This has an implicit assumption that nodes are not comparable size - you
>> have 1.5Mbyte (3MBytes in Java!).
>>
>> If you want to store multimegabyte base64-encoded literals, you might wish
>> to consider using a blob store and storing the reference in the RDF
>> database. Even if this all worked naturally, you might want to do this
>> because it's an inefficient use of a valuable system resource (memory
>> space).
>>
>> ((Or submit a patch to Jena JIRA for a separate storage area and policy
>> for large literals. Or size sensitive cache implementation :-)))
>>
>> In theory, the caches are tunable because it's all constants in SystemTDB,
>> but it's untested as to the performance impact. It should be in the appropriate
>> .info file as well, but it's not.
>>
>> Andy
>>
>>
>>> Hasan
>>>
>>
>
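
For reference, the "size sensitive cache" idea mentioned above could be sketched roughly as follows. This is only an illustration and not TDB's actual cache code: the class name SizeBoundedLruCache, the weigher function and the weight budget are all hypothetical. The point is that eviction is driven by the total estimated size of the cached values rather than by a fixed number of slots, so a few multi-megabyte literals cannot fill the heap the way they can in a 100K-slot LRU cache.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch (not part of Jena/TDB): an LRU cache bounded by the
 * total estimated weight of its values instead of by the number of entries.
 * Null values are not supported.
 */
final class SizeBoundedLruCache<K, V> {
    private final long maxWeight;
    private final ToLongFunction<V> weigher;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private long currentWeight = 0;

    SizeBoundedLruCache(long maxWeight, ToLongFunction<V> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    /** Returns the cached value or null; a hit marks the entry as recently used. */
    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        // Drop any previous value for this key and give back its weight.
        V old = map.remove(key);
        if (old != null) {
            currentWeight -= weigher.applyAsLong(old);
        }
        long weight = weigher.applyAsLong(value);
        if (weight > maxWeight) {
            return; // larger than the whole budget: do not cache it at all
        }
        map.put(key, value);
        currentWeight += weight;
        // Evict least recently used entries until the budget is respected.
        // The entry just added is the most recently used, so it is reached last.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            currentWeight -= weigher.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    synchronized int size() {
        return map.size();
    }

    synchronized long weight() {
        return currentWeight;
    }
}
```

For string-like values, a weigher such as s -> 2L * s.length() would give a rough estimate of roughly two bytes per char, in line with the 1.5Mbyte vs 3MBytes remark above. A real implementation would need a better size estimate for nodes.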
