Re: leak but where after parsing rdf files?

Hasan Hasan Tue, 25 Jan 2011 02:14:08 -0800

On Tue, Jan 25, 2011 at 10:33 AM, Hasan Hasan <[email protected]> wrote:


> Hi Andy,
>
> thanks for taking a look at the code.
> This means that there is a limit to the number of triples with large
> literals that can be returned by jenaGraph.find(), right? If this limit is
> exceeded, it can lead to a java.lang.OutOfMemoryError. And this limit
> depends on the maximum heap size and the size of the literals?
> So to see whether there is a memory leak, I could call jenaGraph.find()
> in a loop, where no iteration should run out of heap memory if nothing
> leaks.
> I'll test it now and let you know.
>

I called graph.find() 100,000 times in a loop. freeMemory fluctuates
within a certain range and never drops below a certain value, which
suggests there is no memory leak. I hope this does not depend on the fact
that each graph.find() result set contains the same triples.
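
A minimal sketch of this kind of check, for anyone who wants to repeat it.
It is not the code from the attached bundle: the workload is a hypothetical
stand-in for consuming the result of jenaGraph.find(), and the class and
method names are made up for illustration. It samples used heap (total
minus free) rather than freeMemory alone, because the total heap can grow
up to -Xmx while the test runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LeakCheck {

    /** Heap currently in use, in bytes, as reported by the JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs the workload repeatedly and records used heap, after a GC hint,
     * once every sampleEvery iterations.
     */
    static List<Long> sample(Supplier<?> workload, int iterations, int sampleEvery) {
        List<Long> samples = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            workload.get();              // result is dropped, so it should be collectable
            if (i % sampleEvery == 0) {
                System.gc();             // only a hint; the JVM may ignore it
                samples.add(usedHeap());
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one find() call: allocate about 1 MB and drop it.
        Supplier<byte[]> workload = () -> new byte[1 << 20];
        for (long bytes : sample(workload, 1_000, 100)) {
            System.out.printf("used heap: %d KB%n", bytes / 1024);
        }
    }
}
```

If the samples keep climbing across many iterations, that points to a leak;
if they stay within a band, as in my test, it probably does not.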

hasan



> But we'll consider your suggestion to store references to the large
> literals in the triples rather than the literals themselves.
>
> Cheers
> Hasan
>
> On Mon, Jan 24, 2011 at 10:11 PM, Andy Seaborne <
> [email protected]> wrote:
>
>>
>>
>> On 24/01/11 18:03, Hasan Hasan wrote:
>>
>>> Hi Andy
>>>
>>> I have attached a bundle that can throw java.lang.OutOfMemoryError
>>> when run.
>>> I don't do any parsing in the code. I merely read triples from the graph
>>> generated in the previous or current execution.
>>>
>>> Invoked with:
>>> MAVEN_OPTS="-Xmx512m -Xms128m"  mvn clean install exec:java -o -e
>>> -Dexec.args="300 2"
>>>
>>> You can play with the arguments. You can generate some triples in the
>>> current execution and then retrieve them, or you can only retrieve
>>> triples, in which case you need not specify -Dexec.args.
>>> In the above example, 300 is the number of triples to be generated and
>>> added to the graph, and 2 is the type of literal used: 2 means
>>> xsd:base64Binary, 1 means rdf:XMLLiteral.
>>> Not all objects in the graph are typed literals.
>>>
>>> Could you please check? Thanks.
>>>
>>
>> Yes, it does.  But I can tell from the pom alone, and from reading between
>> the lines (and the test cases), that it's about large literals.
>>
>> This uses TDB - TDB has various caches in the JVM.
>>
>> Note that on 64 bit hardware, TDB will also use memory mapped I/O, which
>> counts towards the process size but not the heap.
>>
>> There are two 100K-slot caches in front of the node table in the heap, one
>> for node->NodeId and one for NodeId->Node (the latter is more important at
>> query time, the former at update time).  The policy is LRU.
>>
>> This implicitly assumes that individual nodes are small - yours are
>> 1.5 MB each (3 MB in Java!).
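
To make the point above concrete, here is a rough sketch of an LRU cache
that is bounded by slot count only. It is an illustration built on
java.util.LinkedHashMap, not TDB's actual cache class; the name
SlotLruCache is made up. The arithmetic at the end assumes the figures
from this thread (100K slots, about 3 MB per literal in the heap).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSlots;

    public SlotLruCache(int maxSlots) {
        super(16, 0.75f, true);      // accessOrder = true: iteration runs from least to most recently used
        this.maxSlots = maxSlots;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSlots;    // evicts by entry count only; entry size is ignored
    }

    public static void main(String[] args) {
        SlotLruCache<Long, String> cache = new SlotLruCache<>(2);
        cache.put(1L, "a");
        cache.put(2L, "b");
        cache.get(1L);               // touch 1, so 2 becomes the least recently used
        cache.put(3L, "c");          // over capacity: evicts 2
        System.out.println(cache.keySet());   // prints [1, 3]

        // Worst case if every one of 100K slots held a 3 MB literal.
        long worstCaseBytes = 100_000L * 3L * 1024 * 1024;
        System.out.println(worstCaseBytes / (1024L * 1024 * 1024) + " GB");   // prints 292 GB
    }
}
```

With a 512 MB heap, roughly 170 such literals would already fill it, long
before a count-based policy starts evicting anything.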
>>
>> If you want to store multi-megabyte base64-encoded literals, you might wish
>> to consider using a blob store and storing the reference in the RDF
>> database.  Even if this all worked naturally, you might want to do this,
>> because keeping such literals in the database is an inefficient use of a
>> valuable system resource (memory space).
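
One possible shape for the blob-store idea, as a sketch only. It assumes a
plain directory on disk and a SHA-256 hex digest as the reference; neither
choice comes from this thread, and FileBlobStore is a hypothetical name.
The returned key is what would go into the triple in place of the literal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileBlobStore {
    private final Path root;

    public FileBlobStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    /** Stores the bytes and returns a short key (hex SHA-256) to use as the reference. */
    public String put(byte[] data) throws IOException {
        String key = sha256Hex(data);
        Path file = root.resolve(key);
        if (!Files.exists(file)) {   // identical content is written only once
            Files.write(file, data);
        }
        return key;
    }

    /** Loads the bytes for a key previously returned by put. */
    public byte[] get(String key) throws IOException {
        return Files.readAllBytes(root.resolve(key));
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // SHA-256 is required on every Java platform
        }
    }

    public static void main(String[] args) throws IOException {
        FileBlobStore store = new FileBlobStore(Files.createTempDirectory("blobs"));
        byte[] payload = new byte[1_500_000];     // stand-in for a 1.5 MB binary value
        String key = store.put(payload);
        System.out.println("reference to store in the graph: " + key);
        System.out.println(store.get(key).length);   // prints 1500000
    }
}
```

The node caches then only ever hold the 64-character key, and the payload
is read from disk on demand.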
>>
>> ((Or submit a patch to Jena JIRA for a separate storage area and policy
>> for large literals. Or a size-sensitive cache implementation :-)))
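
For what a size-sensitive cache could look like, here is a small sketch
that evicts by estimated bytes instead of by entry count. It is not a
proposed patch and does not use any TDB interface; the class name and the
cost estimate of 2 bytes per character are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteBoundedLruCache<K> {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true keeps entries ordered from least to most recently used
    private final LinkedHashMap<K, String> map = new LinkedHashMap<>(16, 0.75f, true);

    public ByteBoundedLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Rough heap cost of a value, assuming 2 bytes per character. */
    static long cost(String value) {
        return 2L * value.length();
    }

    public void put(K key, String value) {
        String old = map.remove(key);
        if (old != null) {
            currentBytes -= cost(old);
        }
        if (cost(value) > maxBytes) {
            return;                  // larger than the whole budget: do not cache it
        }
        map.put(key, value);
        currentBytes += cost(value);
        // Evict least recently used entries until the byte budget is met again.
        Iterator<Map.Entry<K, String>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, String> eldest = it.next();
            currentBytes -= cost(eldest.getValue());
            it.remove();
        }
    }

    public String get(K key) {
        return map.get(key);
    }

    public int size() {
        return map.size();
    }

    public long bytes() {
        return currentBytes;
    }
}
```

With a budget of 20 bytes, three 5-character values cost 10 bytes each, so
adding the third evicts the first; a single 11-character value (22 bytes)
is never cached at all.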
>>
>> In theory, the caches are tunable because it's all constants in SystemTDB,
>> but the performance impact is untested. It should be in the appropriate
>> .info file as well, but it's not.
>>
>>        Andy
>>
>>
>>> Hasan
>>>
>>
>
