On 28/03/2011 09:17, Andy Seaborne wrote:
[...]
If the TDB default event handler then did something (like sync to disk?)
that the memory model does not, this could explain the difference in
performance. I have put a profiler on the test program and it reports
that the test program is spending a lot more time in
BlockManagerFile.force() when it is reading directly into TDB than when
it is going via a memory model. So there is some evidence that this is
what is happening.
I haven't been able to track down the block manager code actually in use
as I'm having trouble checking ARQ out of SVN, but Andy likely knows off
the top of his head whether this is plausible.
> s/block manager/event manager/
Could be - the model.add(model) will go via the BulkUpdateHandler (I
think). TDB's BulkUpdateHandler inherits from SimpleBulkUpdateHandler
for insertion.
Yes. The event is not issued by the update handler but by ARP.
Could you try putting a break point in dataset.sync and see what the
call stack is when it gets hit? That'll tell you who is causing the
sync.
Done. ARP issues the finishRead event. This leads to
com.hp.hpl.jena.tdb.graph.GraphSyncListener.finishRead(), which does a
sync. Something is (or was) attaching a GraphSyncListener to the event
manager for TDB graphs.
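The mechanism traced above can be sketched generically: a listener registered on a graph's event manager forces a disk sync whenever the parser signals the end of a read. This is a minimal illustration of the pattern, not Jena's actual API; all class and method names here (EventManager, SyncingDataset, fireFinishRead) are hypothetical stand-ins, except GraphSyncListener, which echoes the class named in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph listener; only finishRead matters here.
interface GraphListener {
    void finishRead();   // fired by the parser when a read completes
}

// Hypothetical stand-in for the graph's event manager.
class EventManager {
    private final List<GraphListener> listeners = new ArrayList<>();
    void register(GraphListener l) { listeners.add(l); }
    // The parser (ARP in the thread above) would call this when parsing ends.
    void fireFinishRead() {
        for (GraphListener l : listeners) l.finishRead();
    }
}

// Hypothetical dataset; counts syncs instead of flushing blocks to disk.
class SyncingDataset {
    int syncCount = 0;
    void sync() { syncCount++; }
}

// The listener that turns a parser event into a (costly) sync.
class GraphSyncListener implements GraphListener {
    private final SyncingDataset dataset;
    GraphSyncListener(SyncingDataset d) { this.dataset = d; }
    @Override public void finishRead() { dataset.sync(); }
}

public class SyncOnFinishReadDemo {
    public static void main(String[] args) {
        SyncingDataset ds = new SyncingDataset();
        EventManager em = new EventManager();
        em.register(new GraphSyncListener(ds));   // the implicit hook
        em.fireFinishRead();                      // parser finishes a read
        System.out.println("syncs = " + ds.syncCount);   // syncs = 1
    }
}
```

The point of the sketch is that the cost is invisible at the call site: whoever registered the listener decided that every completed parse pays for a sync, which matches the extra time seen in BlockManagerFile.force() above.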
There used to be (up to v 0.8.9? not in the last snapshot build) a
sync wrapper that called sync() every n'000 triples added.
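A wrapper of that kind, syncing after every N additions, is easy to sketch. Again this is an illustrative sketch, not the removed TDB code; the class name and the Runnable-based sync hook are assumptions.

```java
// Hypothetical sketch of a periodic-sync wrapper: runs a sync action on the
// underlying store after every `interval` added triples.
class CountingSyncWrapper {
    private final Runnable syncAction;   // e.g. a call to dataset sync
    private final int interval;          // sync every `interval` additions
    private long added = 0;
    long syncs = 0;                      // exposed for the demo below

    CountingSyncWrapper(Runnable syncAction, int interval) {
        this.syncAction = syncAction;
        this.interval = interval;
    }

    void add(Object triple) {
        // ...the real wrapper would store the triple in the graph here...
        added++;
        if (added % interval == 0) {
            syncAction.run();
            syncs++;
        }
    }
}

public class PeriodicSyncDemo {
    public static void main(String[] args) {
        CountingSyncWrapper w =
            new CountingSyncWrapper(() -> { /* flush to disk */ }, 1000);
        for (int i = 0; i < 2500; i++) w.add(new Object());
        System.out.println("syncs = " + w.syncs);   // syncs = 2
    }
}
```

Such a wrapper trades insertion throughput for bounded data loss on a crash, which is exactly the kind of hidden implicit sync the later message says was removed.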
I think it is ARP issuing the finishRead event that is the trigger for
the sync.
It's not in the development codebase. All hidden implicit syncs
should now be removed. They were causing problems for a user who was
tracking whether the DB on disk was dirty or not.
Brian, Frank, which versions are you running?
I've been using the latest from the main Maven repository:
tdb: 0.8.9
arq: 2.8.7
jena: 2.6.4
I've checked with the latest from CVS/SVN taken today. That does not
do the sync call and is faster when the parser is reading directly into
the TDB dataset. So this issue is already fixed in the head version.
Brian