On 05/08/11 19:01, Simon Helsen wrote:
> I tested my stuff in mapped mode, which did not show the problem, so the
> issue I encountered is specific to direct mode. IMO the code below
> contains the problem and needs to be fixed with a call to
> blockMgrCache.getWrite (on the wrapped BlockMgr) whenever there is a
> cache miss.
>
> @Andy: could you fix this for the next build?

Done in SVN.

We've lost the repo (AWS ate the server) [temporarily], so for a bit it's
"svn update; mvn clean package".

> I still hit the OME (OutOfMemoryError) though. I will try to analyze the
> stack dumps to see if there is anything special. When I hit the OME, it
> comes on very quickly: within a matter of seconds, my entire heap space
> is exhausted, starting from normal heap usage.

So if I read the thread right, calling BulkUpdateHandlerTDB.removeAll
still goes bad sometimes.
Andy
From: Simon Helsen/Toronto/IBM
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 01:27 PM
Subject: Re: testing TDB-Tx
------------------------------------------------------------------------
Ok, so I looked at the code in BlockMgrCache and I notice that getWrite
is implemented like this:
@Override
synchronized
public Block getWrite(long _id)
{
    Long id = Long.valueOf(_id) ;
    Block blk = null ;
    if ( writeCache != null )
        blk = writeCache.get(id) ;
    if ( blk != null )
    {
        cacheWriteHits++ ;
        log("Hit(w->w) : %d", id) ;
        return blk ;
    }
    // blk is null.
    // A requested block may be in the other cache. Promote it.
    if ( readCache.containsKey(id) )
    {
        blk = readCache.get(id) ;
        cacheReadHits++ ;
        log("Hit(w->r) : %d", id) ;
        blk = promote(blk) ;
        return blk ;
    }
    // Did not find.
    cacheMisses++ ;
    log("Miss/w: %d", id) ;
    if ( writeCache != null )
        writeCache.put(id, blk) ;
    return blk ;
}
Now, in my particular case, the id that comes in is 0, but neither cache
contains the value. In this case, it puts the entry {0 = null} into the
write cache, which necessarily leads to the NPE in the caller. So I am not
quite following the logic here. I would expect that on a cache miss the
wrapped block manager would be used to obtain the block before it is
written into the writeCache.
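For illustration only, the miss path I would expect looks roughly like
this; it is just a sketch, and I am assuming the wrapped manager is
reachable through a field such as blockMgr (the actual field name in
BlockMgrCache may differ):

    // Did not find in either cache: fetch the block from the wrapped
    // BlockMgr instead of caching a null entry ('blockMgr' is an assumed
    // name for the wrapped manager).
    cacheMisses++ ;
    log("Miss/w: %d", id) ;
    blk = blockMgr.getWrite(_id) ;
    if ( writeCache != null && blk != null )
        writeCache.put(id, blk) ;
    return blk ;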
Simon
From: Simon Helsen/Toronto/IBM@IBMCA
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 12:01 PM
Subject: Re: testing TDB-Tx
------------------------------------------------------------------------
Paolo,
I don't know who wrote the code, but it would help if a first analysis
were done on the stack trace I provided, and perhaps there are other
questions that could help identify the problem and a possible fix.
Producing sharable code that reproduces the problem is not trivial and may
not even be possible, since we run in a rather complex framework. If
possible, I will try to debug it myself from within our framework, but
obviously I have limited knowledge of the details of PageBlockMgr.
All the instances of this stack trace (and I am seeing quite a few of
them) seem to come from BulkUpdateHandlerTDB.removeAll, but I know that
removeAll initially works fine (until the NPE occurs the first time; it
seems that after the first time, it keeps happening). I will also try to
isolate the problem further to see if there is anything specific that
brings the store into this situation.
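For what it is worth, the kind of standalone skeleton I would aim for
looks roughly like this (purely illustrative: the dataset location and the
data are made up, and I do not know yet whether this triggers the NPE
outside our framework):

import com.hp.hpl.jena.query.Dataset ;
import com.hp.hpl.jena.rdf.model.Model ;
import com.hp.hpl.jena.tdb.TDBFactory ;

public class RemoveAllRepro
{
    public static void main(String[] args)
    {
        // Hypothetical on-disk location for the TDB store.
        Dataset dataset = TDBFactory.createDataset("target/tdb-repro") ;
        Model model = dataset.getDefaultModel() ;

        // Repeatedly add a few statements and wipe them again; removeAll()
        // on a TDB-backed model goes through BulkUpdateHandlerTDB.removeAll.
        for ( int i = 0 ; i < 1000 ; i++ )
        {
            model.add(model.createResource("http://example/s" + i),
                      model.createProperty("http://example/", "p"),
                      "o" + i) ;
            model.removeAll() ;
        }

        dataset.close() ;
    }
}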
thanks
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 08/05/2011 10:46 AM
Subject: Re: testing TDB-Tx
Hi Simon,
I don't have an answer or a solution to your problem, but I want to thank
you for reporting your experience (and the problems you found) on
jena-dev.
It would be extremely helpful if you could reproduce the problem with some
sharable code we can run and debug. I know, I know... it's not always easy
or even possible.
I hit a problem using TestTransSystem.java, which runs multiple threads,
and it's not easy to replicate.
Thanks again and keep sharing on jena-dev, this way everybody can benefit.
Cheers,
Paolo
Simon Helsen wrote:
> Hi everyone,
>
> I am giving a first stab at integrating TDB-Tx into our framework. My
> first goal is to test this new TDB *without* actually using the
> transaction API because we are coming from TDB 0.8.7. After some minor
> problems on our end, I seem to run into the following NPE (usually
> followed by a warning)
>
> 09:49:02,176 [jazz.jfs.suspending.indexer.internal.triple] ERROR
> com.ibm.team.jfs - CRJZS5663E Unable to persist tripe index
> java.lang.NullPointerException
>     at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.getWrite(PageBlockMgr.java:50)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.getMgrWrite(BPTreeNode.java:162)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:145)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.delete(BPTreeNode.java:227)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.deleteAndReturnOld(BPlusTree.java:324)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.delete(BPlusTree.java:318)
>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performDelete(TupleIndexRecord.java:55)
>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.delete(TupleIndexBase.java:61)
>     at com.hp.hpl.jena.tdb.index.TupleTable.delete(TupleTable.java:108)
>     at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeWorker(BulkUpdateHandlerTDB.java:136)
>     at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeAll(BulkUpdateHandlerTDB.java:90)
>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.removeAll(ModelCom.java:315)
>     ...
> 09:49:02,207 [jazz.jfs.suspending.indexer.internal.triple] WARN
> com.hp.hpl.jena.tdb.base.block.BlockMgrCache - Write cache: 0
> expelling entry that isn't there
>
> The exception sits all over my log and I wonder if it is related to the
> removeAll. Also, after a while, my memory spikes and I run into an OME. I
> don't know yet if there is a relation, but possibly these exceptions cause
> serious leaks.
>
> The version of TDB (and associated libs) I am using is
> tx-tdb-0.9.0-20110802.083904-6
>
> thanks,
>
> Simon