Andy Seaborne wrote on 29-11-2013 9:39:
On 28/11/13 13:17, Minto van der Sluis wrote:
Hi,

I just ran into some peculiar behavior.

For my current project I have to import 633 files, each containing approx. 20 MB of XML data (13 GB in total). When importing this data into a single graph I hit an out-of-memory exception on the 7th file.

Looking at the heap I noticed that after restarting the application I could load a few more files. So I started looking for the bundle that consumed all the memory. It happened to be the Clerezza TDB Storage provider. See the following image (GC = garbage collection):

[image: heap usage over time with GC markers]
Looking more closely I noticed that Apache Jena is able to close a graph (graph.close()), but Clerezza does not use this feature and keeps the graph open all the time.
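For what it's worth, the close-after-each-import pattern looks roughly like this plain-Java sketch. GraphHandle is a hypothetical stand-in for a Jena model (the real API differs); the point is only that each import gets a handle that is closed as soon as the file is loaded:

```java
// Hypothetical stand-in for a graph/model handle; Jena models similarly
// expose a close() method.
class GraphHandle implements AutoCloseable {
    private boolean open = true;

    void importData(String file) {
        if (!open) throw new IllegalStateException("graph is closed");
        // ... parse the XML file and add its triples to the graph ...
    }

    boolean isOpen() { return open; }

    @Override
    public void close() { open = false; }
}

public class CloseAfterImport {
    public static void main(String[] args) {
        String[] files = { "file001.xml", "file002.xml" };
        for (String file : files) {
            // Open a fresh handle per file and close it when the import is
            // done, so no per-graph state lingers between imports.
            try (GraphHandle graph = new GraphHandle()) {
                graph.importData(file);
            }
        }
        System.out.println("imported " + files.length + " files");
    }
}
```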

Jena graphs backed by TDB are simply views of the dataset - they don't have any state associated with them directly.  If the references become inaccessible, GC should clean them up.
Hi Andy,

The problem, as far as I can tell, is not in Jena TDB itself. The Jena TDB bundle is still active/running; only the Clerezza TDB Provider bundle is stopped (by me). As my image shows, a normal GC does not release all of the memory. Only after stopping the Clerezza TDB Provider is the memory allocated for importing released. Stopping this particular bundle makes all Jena data structures inaccessible and eligible for GC, just like the image shows.

My reasoning is that since the Clerezza TDB Provider keeps a map with weak references to Jena models, these references are never actually cleared. Since I use the same graph all the time, all data accumulates, resulting in the out-of-memory error. Looking at a memory dump, most space is occupied by byte arrays containing the imported data.
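This reasoning can be illustrated with a plain-Java sketch (the cache map and graph name below are made up, not Clerezza's actual code): a value held through a WeakReference is never reclaimed as long as any strong reference to it remains, so a cache of weak references does not help when the importer keeps using the same graph object the whole time.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class WeakRefCacheDemo {

    // Hypothetical stand-in for the provider's cache: graph name -> weak
    // reference to the (heavy) model object.
    static final Map<String, WeakReference<byte[]>> cache = new HashMap<>();

    // True when the cached value is still reachable through the weak
    // reference. Guaranteed true as long as the caller also holds a strong
    // reference to the same object.
    static boolean isCached(String name) {
        WeakReference<byte[]> ref = cache.get(name);
        return ref != null && ref.get() != null;
    }

    public static void main(String[] args) {
        byte[] model = new byte[1_000_000];  // strong reference, like a graph kept in use
        cache.put("urn:x-test:graph", new WeakReference<>(model));

        System.gc();
        // The weak reference is NOT cleared while the strong reference
        // exists, so the cached object cannot be reclaimed.
        System.out.println("still cached: " + isCached("urn:x-test:graph"));

        // Touch the model afterwards so it stays strongly reachable above.
        System.out.println("model size: " + model.length);
    }
}
```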

I use a nasty hack to prevent this dreaded out-of-memory error: after every import I restart the Clerezza TDB Provider bundle programmatically (hail OSGi, for I wouldn't know how to do this without OSGi). This way I have been able to import more than 300 files in a row (still running).
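The restart hack could look something like this sketch against the OSGi framework API (the symbolic name is a guess, and the fragment needs a running OSGi framework to execute, so take it as an illustration only):

```java
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleException;

class BundleRestarter {
    // Restart the bundle with the given symbolic name, forcing its
    // internal caches to be dropped and made eligible for GC.
    static void restartBundle(BundleContext context, String symbolicName)
            throws BundleException {
        for (Bundle bundle : context.getBundles()) {
            if (symbolicName.equals(bundle.getSymbolicName())) {
                bundle.stop();
                bundle.start();
                return;
            }
        }
    }
}
```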

Regards,

Minto


