Hi Andy,

It appears that there is a subtle race condition in how the ThreadBufferingCache coordinates write access to the underlying storage. In debugging this, I made the following changes:

https://github.com/apache/jena/compare/feature/debug_concurrent_writes

Effectively, I added a Semaphore object to the class. With that change, my test suite passes (this is the Trellis codebase, in case you are wondering). I also checked for any significant changes to the performance from 3.13.1 to 3.14.0 (with this change in place), and after running the tests repeatedly, I didn't see any appreciable performance change one way or the other.

If this seems like a reasonable adjustment to that class, I can write up a JIRA issue and submit this as a PR.

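For anyone following along without opening the diff, the general idea of serialising writers with a Semaphore can be sketched roughly as below. This is a minimal illustration and not the actual change on the branch: `BufferingGuard`, `BufferingGuardDemo`, and the method bodies are hypothetical stand-ins, and the real ThreadBufferingCache may be structured quite differently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a write-buffering cache; NOT the Jena class. */
class BufferingGuard {
    // A single permit: at most one writer may be buffering at any time.
    private final Semaphore permit = new Semaphore(1, true);
    private volatile boolean buffering = false;

    /** Waits until no other writer is buffering, then switches buffering on. */
    void enableBuffering() {
        permit.acquireUninterruptibly();
        if (buffering) {
            // Should be unreachable while the permit is held.
            permit.release();
            throw new IllegalStateException("already buffering");
        }
        buffering = true;
    }

    /** Switches buffering off and lets the next waiting writer proceed. */
    void disableBuffering() {
        if (!buffering) {
            throw new IllegalStateException("not buffering");
        }
        buffering = false;
        permit.release();
    }

    boolean isBuffering() {
        return buffering;
    }
}

/** Two writer threads repeatedly start and finish buffering. */
class BufferingGuardDemo {
    public static void main(String[] args) throws InterruptedException {
        BufferingGuard guard = new BufferingGuard();
        AtomicInteger failures = new AtomicInteger();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                try {
                    guard.enableBuffering();
                    guard.disableBuffering();
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("failures: " + failures.get());
    }
}
```

With the permit in place, a second writer waits in `enableBuffering()` rather than hitting the "already buffering" check, so the demo should report no failures. One reason a Semaphore can suit this kind of guard is that, unlike a ReentrantLock, it has no notion of an owning thread, so the permit may be released by a different thread from the one that acquired it.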
Best,
Aaron

On Wed, 15 Jan 2020 at 14:25, Aaron Coburn wrote:

> Hi Andy,
> I'll dig a little deeper into what's going on and will put together a
> reproducible test case for this. I first wanted to find out if it might be
> something obvious.
>
> Thanks,
> Aaron
>
> On Wed, 15 Jan 2020 at 13:44, Andy Seaborne wrote:
>
>> Hi Aaron,
>>
>> Could you say some more about how the concurrent writes are happening
>> and what they are doing? Just from the stacktrace I haven't managed to
>> write a test case.
>>
>> My guess is that another transaction is finishing a commit about the
>> same time. But if the other transaction is mid-processing then it's
>> something else.
>>
>> If you are able to put in a JVM-suspend breakpoint at
>> ThreadBufferingCache:88 and capture a thread dump, that would be very
>> helpful - I realise it's not always easy to get up.
>>
>> Andy
>>
>> On 15/01/2020 16:55, Aaron Coburn wrote:
>> > This might be good to split off into a separate issue (and it doesn't
>> > necessarily need to block the release), but I'm finding that, when using
>> > TDB2 with this release candidate in a concurrent write context, I start
>> > encountering a lot of errors. And those errors are definitely not present
>> > with 3.13.1. Specifically, the issue seems to be related to contention
>> > over the TDB2 ThreadBufferingCache. That buffering cache is present in the
>> > 3.13.1 release, and I'm not entirely sure what changed with 3.14.0 that
>> > would trigger these errors, but this is the relevant part of the stack
>> > trace:
>> >
>> > Caused by: org.apache.jena.tdb2.TDBException: ThreadBufferingCache: already buffering
>> >   at org.apache.jena.tdb2.store.nodetable.ThreadBufferingCache.enableBuffering(ThreadBufferingCache.java:88)
>> >   at org.apache.jena.tdb2.store.nodetable.NodeTableCache.updateStart(NodeTableCache.java:352)
>> >   at org.apache.jena.tdb2.store.nodetable.NodeTableCache.notifyTxnStart(NodeTableCache.java:319)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$notifyBegin$14(TransactionCoordinator.java:915)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$listeners$0(TransactionCoordinator.java:207)
>> >   at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.listeners(TransactionCoordinator.java:207)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.notifyBegin(TransactionCoordinator.java:915)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.begin(TransactionCoordinator.java:553)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.begin(TransactionCoordinator.java:509)
>> >   at org.apache.jena.dboe.transaction.txn.TransactionalBase.begin(TransactionalBase.java:110)
>> >   at org.apache.jena.dboe.storage.system.DatasetGraphStorage.begin(DatasetGraphStorage.java:59)
>> >   at org.apache.jena.sparql.core.DatasetGraphWrapper.begin(DatasetGraphWrapper.java:233)
>> >   at org.apache.jena.sparql.core.DatasetImpl.begin(DatasetImpl.java:116)
>> >   at org.apache.jena.system.Txn.exec(Txn.java:76)
>> >   at org.apache.jena.system.Txn.executeWrite(Txn.java:125)
>> >   at org.apache.jena.rdfconnection.RDFConnectionLocal.update(RDFConnectionLocal.java:80)
>> >
>> > Effectively, I get that error the first time a client attempts to
>> > concurrently write to the TDB2 store. Subsequent attempts just hang.
>> >
>> > Cheers,
>> > Aaron
>> >
>> > On Mon, 13 Jan 2020 at 11:23, Andy Seaborne wrote:
>> >
>> >> Hi,
>> >>
>> >> Here is a vote on the release of Apache Jena 3.14.0
>> >> This is the first proposed release candidate.
>> >>
>> >> ==== Changes:
>> >>
>> >> https://s.apache.org/jena-3.14.0-jira
>> >>
>> >> ==== Release Vote
>> >>
>> >> Everyone, not just committers, is invited to test and vote.
>> >> Please download and test the proposed release.
>> >>
>> >> Staging repository:
>> >> https://repository.apache.org/content/repositories/orgapachejena-1035
>> >>
>> >> Proposed dist/ area:
>> >> https://dist.apache.org/repos/dist/dev/jena/
>> >>
>> >> Keys:
>> >> https://svn.apache.org/repos/asf/jena/dist/KEYS
>> >>
>> >> Git commit (browser URL):
>> >> https://github.com/apache/jena/commit/19d42a5
>> >>
>> >> Git Commit Hash:
>> >> 19d42a57a9debc675047b2d1ce9769979c43e7d8
>> >>
>> >> Git Commit Tag:
>> >> jena-3.14.0
>> >>
>> >> Please vote to approve this release:
>> >>
>> >> [ ] +1 Approve the release
>> >> [ ]  0 Don't care
>> >> [ ] -1 Don't release, because ...
>> >>
>> >> This vote will be open until at least
>> >>
>> >> Thursday, 16th January 2020 at 187:00 UTC
>> >>
>> >> If you expect to check the release but the time limit does not work
>> >> for you, please email within the schedule above with an expected time
>> >> and we can extend the vote period.
>> >>
>> >> Thanks,
>> >>
>> >> Andy
>> >>
>> >> Checking needed:
>> >>
>> >> + are the GPG signatures fine?
>> >> + are the checksums correct?
>> >> + is there a source archive?
>> >>
>> >> + can the source archive really be built?
>> >>   (NB This requires a "mvn install" first time)
>> >> + is there a correct LICENSE and NOTICE file in each artifact
>> >>   (both source and binary artifacts)?
>> >> + does the NOTICE file contain all necessary attributions?
>> >> + have any licenses of dependencies changed due to upgrades?
>> >>   if so have LICENSE and NOTICE been upgraded appropriately?
>> >> + does the tag/commit in the SCM contain reproducible sources?
>> >>
