Indeed it's much faster! Still, there is the issue that might be caused by these fixed-size, hard-referenced caches. If I start with -Xmx190m it parses test6.ttl successfully the first time and gets a memory exception the second time.
Reto

On Fri, Jan 21, 2011 at 12:37 PM, Andy Seaborne <[email protected]> wrote:
>
> On 20/01/11 19:43, Reto Bachmann-Gmuer wrote:
>> Hi Andy
>>
>> I've committed an application that uses Jena directly, without Clerezza
>> stuff in the middle, that demonstrates the problem.
>>
>> Starting it with
>>
>>   MAVEN_OPTS="-Xmx256m -Xms128m" mvn clean install exec:java -o -e
>>
>> it will fail at one of the files; however, if I change the order in which
>> the files are to be parsed and put the file it was failing at at the
>> beginning, it succeeds in parsing this file and will fail at another one.
>>
>> The app is here:
>>
>> http://svn.apache.org/viewvc/incubator/clerezza/issues/CLEREZZA-384/turtlememory
>
> Not entirely without Clerezza stuff - the POM does not work standalone.
>
> After some POM hacking, I got it working. I take it the test is
> "TestWithFiles".
>
> It's not using RIOT because that's not in the Jena download yet.
>
> Add
>
>     <dependency>
>       <groupId>com.hp.hpl.jena</groupId>
>       <artifactId>arq</artifactId>
>       <version>2.8.7</version>
>     </dependency>
>
> and either:
>
>     com.hp.hpl.jena.query.ARQ.init() ;
>
> or
>
>     org.openjena.riot.SysRIOT.wireIntoJena() ;
>
> With this the test passes (and much faster as well).
>
> The test is not just parsing. It's storing the results in a model, so the
> space needed includes complete storage of the model.
>
> Only a small increase in -Xmx (e.g. 350m) and the test passes.
>
> The test fails in the first pass over the files if it's going to fail. I
> suspect that one or more internal systems have fixed-size caches. Jena
> does. JavaCC has expanding buffering (and you have some very large
> literals).
>
> Jena's caches are bounded by number of slots, so churning based on large
> literals will need to settle down before any conclusions about a memory
> leak can be made. Hence failing on the first pass is not suggestive of a
> memory leak. This is backed up by the fact that file order matters.
> JavaCC, used by the old parser, uses expanding buffers, and your long
> literals will force those larger; hence the runtime working space is
> higher on a single-file parse. RIOT uses a fixed-size buffer and builds
> the large literals directly into the string to be used as the RDF node.
>
> As increasing the heap means that the test runs, and the test fails in
> the first pass over the files if it is going to fail, I conclude it's
> various caches filling up and just not fitting. I guess it passes at 256m
> with RIOT by chance - slightly less overhead, meaning that the caches
> just happen to fit.
>
> There is a streaming interface to RIOT in org.openjena.riot.RiotReader.
>
>     Andy
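The slot-bounded cache behaviour Andy describes can be sketched generically (this is an illustration, not Jena's actual cache code): a cache bounded by entry count holds hard references to its values, so a burst of very large literals pins heap until later insertions evict them - high transient memory use that settles down, rather than a leak. A minimal LRU cache via `java.util.LinkedHashMap`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic illustration of a slot-bounded LRU cache (NOT Jena's internals):
// the bound is a number of entries, not bytes, so a few very large values
// can pin a lot of heap until churn evicts them.
public class BoundedCacheDemo {

    // LRU cache capped at maxSlots entries via LinkedHashMap's eviction hook.
    static <K, V> Map<K, V> boundedCache(final int maxSlots) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSlots;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = boundedCache(2);

        // A "large literal" stays on the heap as long as the cache holds it.
        cache.put("big", new String(new char[1_000_000]).replace('\0', 'x'));
        cache.put("a", "small");
        cache.put("b", "small"); // third insert evicts the least recently used

        System.out.println(cache.containsKey("big")); // evicted -> false
        System.out.println(cache.size());             // -> 2
    }
}
```

This matches the observation that failures happen (if at all) on the first pass and depend on file order: once the working set of cached entries stabilises, memory use stops growing.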
