On 05/08/11 19:01, Simon Helsen wrote:
> I tested my stuff in mapped mode, which did not show the problem, so the
> issue I encountered is specific to direct mode. IMO the code below
> contains the problem and needs to be fixed with a call to
> blockMgrCache.getWrite (on the wrapped BlockMgr) whenever there is a
> cache miss.
>
> @Andy: could you fix this for the next build?

Done in SVN.

We've lost the repo (AWS ate the server) [temporarily], so for a bit it's
"svn update; mvn clean package".

> I still hit the OME (OutOfMemoryError) though. I will try to analyze the
> stack dumps to see if there is anything special. When I hit the OME, it
> comes on very quickly: within a matter of seconds, my entire heap space
> is exhausted, starting from normal heap usage.

So if I read the thread right, calling BulkUpdateHandlerTDB.removeAll
still goes bad sometimes.
Andy
From: Simon Helsen/Toronto/IBM
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 01:27 PM
Subject: Re: testing TDB-Tx
------------------------------------------------------------------------
Ok, so I looked at the code in BlockMgrCache and I notice that getWrite
is implemented like this:
@Override
synchronized
public Block getWrite(long _id)
{
    Long id = Long.valueOf(_id) ;
    Block blk = null ;
    if ( writeCache != null )
        blk = writeCache.get(id) ;
    if ( blk != null )
    {
        cacheWriteHits++ ;
        log("Hit(w->w) : %d", id) ;
        return blk ;
    }
    // blk is null.
    // A requested block may be in the other cache. Promote it.
    if ( readCache.containsKey(id) )
    {
        blk = readCache.get(id) ;
        cacheReadHits++ ;
        log("Hit(w->r) : %d", id) ;
        blk = promote(blk) ;
        return blk ;
    }
    // Did not find.
    cacheMisses++ ;
    log("Miss/w: %d", id) ;
    if ( writeCache != null )
        writeCache.put(id, blk) ;
    return blk ;
}
Now, in my particular case, the id that comes in is 0, but neither cache
contains the value. In this case, it puts the entry {0 = null} into the
write cache, which necessarily leads to the NPE in the caller. So I am not
quite following the logic here. I would expect that on a cache miss the
wrapped block manager would be used to obtain the block before it is
written into the writeCache.
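For illustration only, the miss path I would expect looks roughly like
this; it is just a sketch, and I am assuming the wrapped manager is
reachable through a field such as blockMgr (the actual field name in
BlockMgrCache may differ):

    // Did not find in either cache: fetch the block from the wrapped
    // BlockMgr instead of caching a null entry ('blockMgr' is an assumed
    // name for the wrapped manager).
    cacheMisses++ ;
    log("Miss/w: %d", id) ;
    blk = blockMgr.getWrite(_id) ;
    if ( writeCache != null && blk != null )
        writeCache.put(id, blk) ;
    return blk ;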
Simon
From: Simon Helsen/Toronto/IBM@IBMCA
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 12:01 PM
Subject: Re: testing TDB-Tx
------------------------------------------------------------------------
Paolo,
I don't know who wrote the code, but it would help if a first analysis
were done on the stack trace I provided, and perhaps there are other
questions that could help identify the problem and a possible fix.
Producing sharable code that reproduces the problem is not trivial and may
not even be possible, since we run in a rather complex framework. If
possible, I will try to debug it myself from within our framework, but
obviously I have limited knowledge of the details of PageBlockMgr.
All the instances of this stack trace (and I am seeing quite a few of
them) seem to come from BulkUpdateHandlerTDB.removeAll, but I know that
removeAll initially works fine (until the NPE occurs the first time; it
seems that after the first time, it keeps happening). I will also try to
isolate the problem further to see if there is anything specific that
brings the store into this situation.
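For what it is worth, the kind of standalone skeleton I would aim for
looks roughly like this (purely illustrative: the dataset location and the
data are made up, and I do not know yet whether this triggers the NPE
outside our framework):

import com.hp.hpl.jena.query.Dataset ;
import com.hp.hpl.jena.rdf.model.Model ;
import com.hp.hpl.jena.tdb.TDBFactory ;

public class RemoveAllRepro
{
    public static void main(String[] args)
    {
        // Hypothetical on-disk location for the TDB store.
        Dataset dataset = TDBFactory.createDataset("target/tdb-repro") ;
        Model model = dataset.getDefaultModel() ;

        // Repeatedly add a few statements and wipe them again; removeAll()
        // on a TDB-backed model goes through BulkUpdateHandlerTDB.removeAll.
        for ( int i = 0 ; i < 1000 ; i++ )
        {
            model.add(model.createResource("http://example/s" + i),
                      model.createProperty("http://example/", "p"),
                      "o" + i) ;
            model.removeAll() ;
        }

        dataset.close() ;
    }
}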
thanks
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 08/05/2011 10:46 AM
Subject: Re: testing TDB-Tx
Hi Simon,
I don't have an answer or a solution to your problem, but I want to thank
you for reporting your experience (and the problems you found) on
jena-dev.
It would be extremely helpful if you could reproduce the problem with some
sharable code we can run and debug. I know, I know... it's not always easy
or even possible.
I hit a problem using TestTransSystem.java, which runs multiple threads,
and it's not easy to replicate.
Thanks again and keep sharing on jena-dev, this way everybody can benefit.
Cheers,
Paolo
Simon Helsen wrote:
> Hi everyone,
>
> I am giving a first stab at integrating TDB-Tx into our framework. My
> first goal is to test this new TDB *without* actually using the
> transaction API because we are coming from TDB 0.8.7. After some minor
> problems on our end, I seem to run into the following NPE (usually
> followed by a warning)
>
> 09:49:02,176 [jazz.jfs.suspending.indexer.internal.triple] ERROR
> com.ibm.team.jfs - CRJZS5663E Unable to persist tripe index
> java.lang.NullPointerException
>     at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.getWrite(PageBlockMgr.java:50)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.getMgrWrite(BPTreeNode.java:162)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:145)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.delete(BPTreeNode.java:227)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.deleteAndReturnOld(BPlusTree.java:324)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.delete(BPlusTree.java:318)
>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performDelete(TupleIndexRecord.java:55)
>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.delete(TupleIndexBase.java:61)
>     at com.hp.hpl.jena.tdb.index.TupleTable.delete(TupleTable.java:108)
>     at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeWorker(BulkUpdateHandlerTDB.java:136)
>     at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeAll(BulkUpdateHandlerTDB.java:90)
>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.removeAll(ModelCom.java:315)
>     ...
> 09:49:02,207 [jazz.jfs.suspending.indexer.internal.triple] WARN
> com.hp.hpl.jena.tdb.base.block.BlockMgrCache - Write cache: 0
> expelling entry that isn't there
>
> The exception sits all over my log and I wonder if it is related to the
> removeAll. Also, after a while, my memory spikes and I run into an OME. I
> don't know yet if there is a relation, but possibly these exceptions cause
> serious leaks.
>
> The version of TDB (and associated libs) I am using is
> tx-tdb-0.9.0-20110802.083904-6
>
> thanks,
>
> Simon