Re: leak but where after parsing rdf files?

Andy Seaborne Fri, 21 Jan 2011 03:38:13 -0800


On 20/01/11 19:43, Reto Bachmann-Gmuer wrote:

Hi Andy,

I've committed an application that uses Jena directly, without Clerezza stuff
in the middle, that demonstrates the problem.

Starting it with

MAVEN_OPTS="-Xmx256m -Xms128m"  mvn clean install exec:java -o -e

it will fail at one of the files; however, if I change the order in which the
files are to be parsed and put the file it was failing at at the beginning,
it succeeds in parsing this file and will fail at another one.

the app is here:
http://svn.apache.org/viewvc/incubator/clerezza/issues/CLEREZZA-384/turtlememory


Not entirely without clerezza stuff - the POM does not work standalone.

After some POM hacking, I got it working. I take it the test is "TestWithFiles".


It's not using RIOT because that's not in the Jena download yet.

Add

    <dependency>
      <groupId>com.hp.hpl.jena</groupId>
      <artifactId>arq</artifactId>
      <version>2.8.7</version>
    </dependency>

and either:

        com.hp.hpl.jena.query.ARQ.init() ;

or

        org.openjena.riot.SysRIOT.wireIntoJena() ;

With this the test passes (and much faster as well).

The test is not just parsing. It's storing the results in a model so the space needed includes complete storage of the model.


Only a small increase in -Xmx (e.g. 350m) and the test passes.

The test fails in the first pass over the files if it's going to fail. I suspect that one or more internal systems have fixed size caches. Jena does. JavaCC has expanding buffering (and you have some very large literals).

Jena's caches are bounded by number of slots, so churning based on large literals will need to settle down before any conclusions of a memory leak can be made. Hence failing on the first pass is not suggestive of a memory leak. This is backed up by the fact that file order matters.
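
To illustrate the point about slot-bounded caches, here is a minimal sketch. It is not Jena's cache code; the class and method names (SlotBoundedCacheSketch, SlotCache, fill) are made up for the example, and it uses a plain LinkedHashMap as an LRU. The number of entries is capped, but the memory retained still scales with the size of the values sitting in those slots.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not Jena's cache implementation.
class SlotBoundedCacheSketch {

    /** LRU cache bounded by entry count, not by the bytes it retains. */
    static final class SlotCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSlots;

        SlotCache(int maxSlots) {
            super(16, 0.75f, true); // access order gives LRU eviction
            this.maxSlots = maxSlots;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSlots;
        }
    }

    /** Builds a stand-in literal of the requested length. */
    static String literalOfLength(int n) {
        char[] c = new char[n];
        Arrays.fill(c, 'x');
        return new String(c);
    }

    /**
     * Pushes `entries` literals through a cache with `slots` slots and
     * returns the total number of characters still held afterwards.
     */
    public static long fill(int slots, int entries, int literalLength) {
        SlotCache<Integer, String> cache = new SlotCache<>(slots);
        for (int i = 0; i < entries; i++) {
            cache.put(i, literalOfLength(literalLength));
        }
        long total = 0;
        for (String v : cache.values()) {
            total += v.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Same slot count, very different retained size.
        System.out.println("short literals: " + fill(100, 1000, 10) + " chars retained");
        System.out.println("long literals:  " + fill(100, 1000, 10000) + " chars retained");
    }
}
```

With the same 100 slots, the retained size differs by a factor of 1000 between the two runs, which is why a heap that is only just big enough can fail while the caches are still filling.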

JavaCC, used by the old parser, uses expanding buffers and your long literals will force those larger, and hence the runtime working space is higher on a single file parse. RIOT uses a fixed size buffer and builds the large literals directly into the string to be used as the RDF node.
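
The difference between the two buffering strategies can be sketched roughly as follows. This is neither JavaCC's nor RIOT's actual code; BufferStrategySketch and its methods are hypothetical names, and the sketch only shows the general shape: one reader grows a single working buffer until the whole token fits, the other keeps a small fixed working buffer and appends each chunk to the string being built.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only: not JavaCC or RIOT code.
class BufferStrategySketch {

    /** Expanding strategy: one buffer that doubles when full. Returns its final capacity. */
    public static int expandingBufferCapacity(Reader in, int initialCapacity) {
        try {
            char[] buf = new char[initialCapacity];
            int len = 0;
            int n;
            while ((n = in.read(buf, len, buf.length - len)) != -1) {
                len += n;
                if (len == buf.length) {
                    buf = Arrays.copyOf(buf, buf.length * 2); // working space grows with the token
                }
            }
            return buf.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fixed strategy: working buffer stays at chunkSize; text goes straight into the result. */
    public static String fixedBufferRead(Reader in, int chunkSize) {
        try {
            char[] buf = new char[chunkSize];
            StringBuilder out = new StringBuilder();
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a stand-in long literal. */
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String literal = repeat('x', 100000);
        int grown = expandingBufferCapacity(new StringReader(literal), 16);
        String built = fixedBufferRead(new StringReader(literal), 16);
        System.out.println("expanding buffer ended at capacity " + grown);
        System.out.println("fixed 16-char buffer built a string of length " + built.length());
    }
}
```

In the expanding case the enlarged buffer is extra working space on top of the final string; in the fixed case the only large object is the string that ends up as the node value.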

As increasing the heap means that the test runs, and the test fails in the first pass over the files if it is going to fail, I conclude it's various caches filling up and just not fitting. I guess it passes at 256m with RIOT by chance: slightly less overhead, meaning that the caches just happen to fit.
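
One way to check this kind of reasoning is to record used heap after each pass over the files and see whether it levels off. The sketch below is an assumption about how one might do that, not part of the test application; HeapPerPassSketch and measure are hypothetical names, and the Runnable stands in for "parse all the files once". System.gc() is only a hint to the JVM, so the numbers are indicative at best.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of the CLEREZZA-384 test app.
class HeapPerPassSketch {

    /** Used heap in bytes after requesting a GC (the request is only a hint). */
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs onePass the given number of times and records used heap after each run. */
    public static List<Long> measure(Runnable onePass, int passes) {
        List<Long> used = new ArrayList<>();
        for (int i = 0; i < passes; i++) {
            onePass.run();
            used.add(usedHeapAfterGc());
        }
        return used;
    }

    public static void main(String[] args) {
        // Stand-in workload: allocate some temporary strings and drop them.
        Runnable pass = () -> {
            List<String> temp = new ArrayList<>();
            for (int i = 0; i < 10000; i++) {
                temp.add("literal-" + i);
            }
        };
        List<Long> used = measure(pass, 5);
        for (int i = 0; i < used.size(); i++) {
            System.out.println("pass " + (i + 1) + ": " + (used.get(i) / 1024) + " KiB used");
        }
        System.out.println("max heap: " + (Runtime.getRuntime().maxMemory() / (1024 * 1024)) + " MiB");
    }
}
```

A figure that rises on the first pass and then stays roughly flat points to caches filling; one that keeps climbing pass after pass would be worth investigating as a leak.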


There is a streaming interface to RIOT in org.openjena.riot.RiotReader.

        Andy
