[ 
https://issues.apache.org/jira/browse/JENA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065138#comment-17065138
 ] 

Andy Seaborne commented on JENA-1868:
-------------------------------------

Hi Bernhard, thanks for the report and stress tests.

I can replicate the NPE and occasionally get the stack overflow, though more 
often than not I get the NPE in that stack overflow version as well.

I need to study the code to remind myself how it works.

Initial impression:

The NPE is because {{getTransaction()}}, which is the code 
{{threadTxn.get()}}, returns null, not because {{threadTxn}} itself is null. 
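
To illustrate that distinction, here is a minimal, self-contained sketch. It 
is not the Jena code: the class name and the String stand-in for a 
transaction are assumptions made for the example, and only the names 
{{threadTxn}} and {{getTransaction()}} are borrowed from the comment above. 
It shows that a non-null ThreadLocal still returns null from get() on a 
thread that never set a value, or after remove() was called on that thread.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalNullDemo {
    // Hypothetical stand-in for the per-thread transaction holder.
    // The ThreadLocal object itself is never null.
    private static final ThreadLocal<String> threadTxn = new ThreadLocal<>();

    static String getTransaction() {
        return threadTxn.get();
    }

    /** Returns what a fresh thread, which never called set(), sees. */
    static String seenByOtherThread() {
        AtomicReference<String> seen = new AtomicReference<>("unset");
        Thread t = new Thread(() -> seen.set(getTransaction()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        threadTxn.set("txn-1");
        System.out.println(getTransaction());     // prints "txn-1"
        System.out.println(seenByOtherThread());  // prints "null"
        threadTxn.remove();
        System.out.println(getTransaction());     // prints "null"
    }
}
```

So an NPE at the point of use is consistent with the thread-local slot being 
empty for the calling thread, whatever the reason for that turns out to be.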

The proposed volatiles are on fields that only get set when the dataset is 
created and do not change subsequently.

But there may be an effect of using "volatile", which is to change the thread 
order during the test proper, because a volatile read is a "happens-before" 
barrier which applies across all hardware threads of a CPU core.
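
For reference, the general Java Memory Model guarantee being referred to can 
be sketched as follows. This is a generic illustration with hypothetical 
names, not taken from the TDB2 code or the proposed patch: a write to a 
volatile field happens-before a later read of that field, so ordinary writes 
made before the volatile write are visible to the reader.

```java
public class VolatilePublishDemo {
    private int payload;             // plain (non-volatile) field
    private volatile boolean ready;  // volatile flag used for publication

    void publish(int value) {
        payload = value;  // ordinary write
        ready = true;     // volatile write: publishes the write above
    }

    int awaitPayload() {
        // Volatile read: once it observes true, payload is visible too.
        while (!ready) {
            Thread.onSpinWait();
        }
        return payload;
    }

    /** Starts a reader thread, publishes a value, returns what the reader saw. */
    static int runOnce(int value) {
        VolatilePublishDemo demo = new VolatilePublishDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitPayload());
        reader.start();
        demo.publish(value);
        try {
            reader.join();  // join() makes the reader's write to seen[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce(42));  // prints "42"
    }
}
```

Adding such a barrier to a hot path can shift the relative timing of threads, 
which is one way a change could mask a race without fixing it.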

I'm running Java 11 as the JRE on a quad-core, single-CPU system.

TDB1 and TDB2 are quite different in approach to transactions, including the 
B+Trees. TDB2 - and the DBOE transaction system - is more principled.  The 
B+Trees are MVCC and are persistent data structures ("persistent" here means 
immutable, not referring to being on-disk or not).
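
As a rough illustration of "persistent" in that sense, here is a toy 
path-copying binary search tree. It is an assumption-laden sketch, not the 
TDB2 B+Tree: all names are hypothetical. An update returns a new root and 
copies only the nodes on the path to the change, so a reader holding the old 
root keeps seeing the old version, which is the property MVCC relies on.

```java
public class PersistentTreeDemo {
    /** Immutable node: all fields are final, so a published version never changes. */
    static final class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the root of a new version; untouched subtrees are shared. */
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key, null, null);
        }
        if (key < root.key) {
            return new Node(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new Node(root.key, root.left, insert(root.right, key));
        }
        return root;  // key already present: same version
    }

    static boolean contains(Node root, int key) {
        while (root != null) {
            if (key == root.key) {
                return true;
            }
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5), 3), 8);
        Node v2 = insert(v1, 4);  // a "writer" creates a new version

        System.out.println(contains(v1, 4));       // prints "false": old version unchanged
        System.out.println(contains(v2, 4));       // prints "true"
        System.out.println(v1.right == v2.right);  // prints "true": subtree shared
    }
}
```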

Using TDB2Factory.createDataset() (the testing in-memory form) also goes wrong 
in the RAM disk layer. The in-memory ("TIM") implementation of dataset works 
fine, as does TDB1 in-memory - so it is not the tests, it is the TDB2 code and 
unlikely to be the in-memory block layer (TDB1 and TDB2 are virtually the 
same).

It may be a single concurrency bug.



 

> TDB2 Concurrency: NPE in TransactionalComponentLifecycle
> --------------------------------------------------------
>
>                 Key: JENA-1868
>                 URL: https://issues.apache.org/jira/browse/JENA-1868
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB2
>    Affects Versions: Jena 3.14.0
>            Reporter: Bernhard Stiftner
>            Priority: Major
>         Attachments: TDB2StressTest.java
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We're evaluating moving from TDB1 to TDB2 and are hitting various 
> concurrency/thread-safety issues that apparently didn't exist with TDB1.
> Our setting is as follows: one JVM, ~20 independent TDB1/TDB2 instances, 
> highly concurrent workload involving every TDB1/TDB2 instance.
> A common issue we're hitting with TDB2 is this NullPointerException in 
> TransactionalComponentLifecycle:
> {noformat}
> java.lang.NullPointerException
>     at org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.getReadWriteMode(TransactionalComponentLifecycle.java:324)
>     at org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.complete(TransactionalComponentLifecycle.java:143)
>     at org.apache.jena.dboe.transaction.txn.SysTrans.complete(SysTrans.java:47)
>     at org.apache.jena.dboe.transaction.txn.Transaction.lambda$endInternal$16(Transaction.java:220)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
>     at org.apache.jena.dboe.transaction.txn.Transaction.endInternal(Transaction.java:220)
>     at org.apache.jena.dboe.transaction.txn.Transaction.end(Transaction.java:209)
>     at org.apache.jena.dboe.transaction.txn.TransactionalBase._end(TransactionalBase.java:262)
>     at org.apache.jena.dboe.transaction.txn.TransactionalBase.abort(TransactionalBase.java:159)
>     at org.apache.jena.dboe.storage.system.DatasetGraphStorage.abort(DatasetGraphStorage.java:63)
>     at org.apache.jena.sparql.core.DatasetGraphWrapper.abort(DatasetGraphWrapper.java:253)
>     at org.apache.jena.sparql.core.DatasetImpl.abort(DatasetImpl.java:158)
>     at TDB2StressTest.randomRead(TDB2StressTest.java:87)
>     at TDB2StressTest.runStressTestWorker(TDB2StressTest.java:64)
>     at TDB2StressTest.lambda$runStressTest$0(TDB2StressTest.java:43)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
> The attached "test case" manages to reproduce this issue most of the time on 
> my machine (YMMV of course, since the test is based on quite some concurrency 
> voodoo).
> The same test is working flawlessly when run against a TDB1 backend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
