Indeed it's much faster! Still, there is the issue that might be caused by these fixed-size, hard-referenced caches. If I start with -Xmx190m it parses test6.ttl successfully the first time and gets a memory exception the second time.
Reto

On Fri, Jan 21, 2011 at 12:37 PM, Andy Seaborne <[email protected]> wrote:
>
> On 20/01/11 19:43, Reto Bachmann-Gmuer wrote:
>> Hi Andy
>>
>> I've committed an application that uses Jena directly, without Clerezza
>> stuff in the middle, that demonstrates the problem.
>>
>> Starting it with
>>
>>   MAVEN_OPTS="-Xmx256m -Xms128m" mvn clean install exec:java -o -e
>>
>> it will fail at one of the files; however, if I change the order in which
>> the files are to be parsed and put the file it was failing at at the
>> beginning, it succeeds in parsing this file and will fail at another one.
>>
>> The app is here:
>>
>> http://svn.apache.org/viewvc/incubator/clerezza/issues/CLEREZZA-384/turtlememory
>
> Not entirely without Clerezza stuff - the POM does not work standalone.
>
> After some POM hacking, I got it working. I take it the test is
> "TestWithFiles".
>
> It's not using RIOT because that's not in the Jena download yet.
>
> Add
>
>     <dependency>
>       <groupId>com.hp.hpl.jena</groupId>
>       <artifactId>arq</artifactId>
>       <version>2.8.7</version>
>     </dependency>
>
> and either:
>
>     com.hp.hpl.jena.query.ARQ.init() ;
>
> or
>
>     org.openjena.riot.SysRIOT.wireIntoJena() ;
>
> With this the test passes (and much faster as well).
>
> The test is not just parsing. It's storing the results in a model, so the
> space needed includes complete storage of the model.
>
> Only a small increase in -Xmx (e.g. 350m) and the test passes.
>
> The test fails in the first pass over the files if it's going to fail. I
> suspect that one or more internal systems have fixed-size caches. Jena
> does. JavaCC has expanding buffering (and you have some very large
> literals).
>
> Jena's caches are bounded by number of slots, so churning based on large
> literals will need to settle down before any conclusions about a memory
> leak can be made. Hence failing on the first pass is not suggestive of a
> memory leak. This is backed up by the fact that file order matters.
> JavaCC, used by the old parser, uses expanding buffers, and your long
> literals will force those larger; hence the runtime working space is
> higher on a single-file parse. RIOT uses a fixed-size buffer and builds
> the large literals directly into the string to be used as the RDF node.
>
> As increasing the heap means that the test runs, and the test fails in
> the first pass over the files if it is going to fail, I conclude it's
> various caches filling up and just not fitting. I guess it passes at 256m
> with RIOT by chance - slightly less overhead, meaning that the caches
> just happen to fit.
>
> There is a streaming interface to RIOT in org.openjena.riot.RiotReader.
>
>     Andy
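The slot-bounded cache behaviour Andy describes can be sketched generically (this is an illustration, not Jena's actual cache code): a cache bounded by entry count holds hard references to its values, so a burst of very large literals pins heap until later insertions evict them - high transient memory use that settles down, rather than a leak. A minimal LRU cache via `java.util.LinkedHashMap`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic illustration of a slot-bounded LRU cache (NOT Jena's internals):
// the bound is a number of entries, not bytes, so a few very large values
// can pin a lot of heap until churn evicts them.
public class BoundedCacheDemo {

    // LRU cache capped at maxSlots entries via LinkedHashMap's eviction hook.
    static <K, V> Map<K, V> boundedCache(final int maxSlots) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSlots;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = boundedCache(2);

        // A "large literal" stays on the heap as long as the cache holds it.
        cache.put("big", new String(new char[1_000_000]).replace('\0', 'x'));
        cache.put("a", "small");
        cache.put("b", "small"); // third insert evicts the least recently used

        System.out.println(cache.containsKey("big")); // evicted -> false
        System.out.println(cache.size());             // -> 2
    }
}
```

This matches the observation that failures happen (if at all) on the first pass and depend on file order: once the working set of cached entries stabilises, memory use stops growing.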
