On 20/01/11 19:43, Reto Bachmann-Gmuer wrote:
Hi Andy
I've committed an application that uses Jena directly, without Clerezza stuff
in the middle, that demonstrates the problem.
Starting it with
MAVEN_OPTS="-Xmx256m -Xms128m" mvn clean install exec:java -o -e
it will fail at one of the files; however, if I change the order in which the
files are parsed and put the file it was failing on at the beginning,
it succeeds in parsing that file and fails at another one.
The app is here:
http://svn.apache.org/viewvc/incubator/clerezza/issues/CLEREZZA-384/turtlememory
Not entirely without clerezza stuff - the POM does not work standalone.
After some POM hacking, I got it working. I take it the test is
"TestWithFiles".
It's not using RIOT because that's not in the Jena download yet.
Add
<dependency>
  <groupId>com.hp.hpl.jena</groupId>
  <artifactId>arq</artifactId>
  <version>2.8.7</version>
</dependency>
and either:
com.hp.hpl.jena.query.ARQ.init() ;
or
org.openjena.riot.SysRIOT.wireIntoJena() ;
With this the test passes (and much faster as well).
The test is not just parsing. It's storing the results in a model, so
the space needed includes complete storage of the model.
With only a small increase in -Xmx (e.g. 350m), the test passes.
If the test is going to fail, it fails in the first pass over the files. I
suspect that one or more internal systems have fixed-size caches. Jena
does. JavaCC has expanding buffering (and you have some very large
literals).
Jena's caches are bounded by number of slots, not by the size of what is
in them, so the churn caused by large literals needs to settle down before
any conclusion about a memory leak can be drawn. Hence failing on the first
pass is not suggestive of a memory leak. This is backed up by the fact
that file order matters.
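To illustrate the point about slot-bounded caches, here is a rough sketch. It is not Jena's cache code; SlotCache, retainedChars and literalOfLength are hypothetical names made up for the example. It only shows that a cache limited to N entries holds very different amounts of memory depending on how big the cached values are.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class SlotBoundedCacheDemo {

    /** Hypothetical LRU cache bounded by entry count, not by bytes held. */
    static final class SlotCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSlots;

        SlotCache(int maxSlots) {
            super(16, 0.75f, true); // access order, so eviction is least-recently-used
            this.maxSlots = maxSlots;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict only when the slot count is exceeded; value size is ignored.
            return size() > maxSlots;
        }
    }

    /** Total number of characters retained by the cached values. */
    static long retainedChars(Map<Integer, String> cache) {
        long total = 0;
        for (String value : cache.values()) {
            total += value.length();
        }
        return total;
    }

    /** Builds a stand-in for an RDF literal of the given length. */
    static String literalOfLength(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    public static void main(String[] args) {
        SlotCache<Integer, String> cache = new SlotCache<>(4);

        // Small literals: 4 slots retain 40 characters.
        for (int i = 0; i < 10; i++) {
            cache.put(i, literalOfLength(10));
        }
        System.out.println(cache.size() + " slots, " + retainedChars(cache) + " chars");

        // Large literals: the same 4 slots now retain 400000 characters.
        for (int i = 10; i < 20; i++) {
            cache.put(i, literalOfLength(100_000));
        }
        System.out.println(cache.size() + " slots, " + retainedChars(cache) + " chars");
    }
}
```

The slot count is the same in both cases, so which files happen to fill the cache first changes the heap footprint without anything leaking.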
JavaCC, used by the old parser, uses expanding buffers, and your long
literals force those larger, so the runtime working space is higher on
a single file parse. RIOT uses a fixed-size buffer and builds
the large literals directly into the string to be used as the RDF node.
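A minimal sketch of the fixed-buffer idea follows. It is not RIOT's tokenizer; readLiteral and its chunkSize parameter are hypothetical, and escape handling is left out. The read buffer stays the same size however long the literal is, and the characters are accumulated straight into the builder that becomes the final string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FixedBufferLiteralDemo {

    /**
     * Reads characters up to (not including) the closing double quote.
     * The chunk buffer never grows; only the literal being built does.
     * Characters after the closing quote in the last chunk are discarded,
     * which a real tokenizer would not do.
     */
    static String readLiteral(Reader in, int chunkSize) {
        char[] chunk = new char[chunkSize];
        StringBuilder literal = new StringBuilder();
        try {
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (chunk[i] == '"') {
                        return literal.toString();
                    }
                    literal.append(chunk[i]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Input ended without a closing quote: return what was read.
        return literal.toString();
    }

    public static void main(String[] args) {
        // The input starts just after the opening quote of a Turtle literal.
        Reader in = new StringReader("a fairly long literal\" .");
        System.out.println(readLiteral(in, 4)); // prints: a fairly long literal
    }
}
```

With an expanding token buffer, the whole literal sits in the buffer and is then copied into a string, so the peak working space for one long literal is roughly doubled and the enlarged buffer may stay allocated afterwards.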
As increasing the heap lets the test run, and the test fails in
the first pass over the files if it is going to fail, I conclude it's
various caches filling up and just not fitting. I guess it passes at
256m with RIOT by chance: slightly less overhead means that the caches
just happen to fit.
There is a streaming interface to RIOT in org.openjena.riot.RiotReader.
Andy