afs commented on issue #646: JENA-1785: A newly created node can remain 
invisible after commit
URL: https://github.com/apache/jena/pull/646#issuecomment-562860832
 
 
   Firstly - thank you very much for working on this and the discussion. The 
more eyes on concurrency and transaction issues the better.
   
   This approach looks like a good one.
   I think it can be simplified, removing some of the lock use. 
   
   1/
   The principle that when there is a writer, only the writer can update the 
not-present cache is good. When there isn't a writer, any reader can update the 
not-present cache and there is no need to track the data version that I can 
see. This is because, when there is no writer, the readers can see the whole 
node table. If they find a node/nodeid but it isn't in the RDF data, all that 
happens is that its use in a triple pattern fails to match. The not-present 
cache just speeds that up; that cache doesn't have to be very large either.
   
   So the rule is: either it's the active writer, or it's any reader with no 
writer active.
   
   ```
       private boolean inTopLevelTxn() {
           Transaction t = txn.get();
           if ( t == null )
               return false;
           // Either t is the write transaction, or it is a reader and no W txn is active.
           if ( t.isWriteTxn() )
               return true;
           return !hasActiveWriteTransaction.get();
       }
   ```
   
   2/
   
   That means `hasActiveWriteTransaction` can be an `AtomicBoolean` that is 
managed in `updateStart`/`updateCommit`/`updateAbort`. This removes the need for 
`synchronized` because only a single value needs to be tracked.
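   
   As a rough, self-contained sketch of points 1/ and 2/ together (this is not the code on the scratch branch; the class name `WriterFlagSketch` and the helper `mayUpdateNotPresentCache` are made up for illustration, and the real class has more state than this):
   
   ```java
   import java.util.concurrent.atomic.AtomicBoolean;

   // Hypothetical stand-in for the cache wrapper discussed above.
   // Only the flag handling is shown; names other than
   // hasActiveWriteTransaction/updateStart/updateCommit/updateAbort are assumptions.
   class WriterFlagSketch {
       // True only between updateStart and updateCommit/updateAbort.
       private final AtomicBoolean hasActiveWriteTransaction = new AtomicBoolean(false);

       void updateStart()  { hasActiveWriteTransaction.set(true); }
       void updateCommit() { hasActiveWriteTransaction.set(false); }
       void updateAbort()  { hasActiveWriteTransaction.set(false); }

       // The rule: the active writer may update the not-present cache;
       // a reader may do so only while no writer is active.
       boolean mayUpdateNotPresentCache(boolean isWriteTxn) {
           if ( isWriteTxn )
               return true;
           return !hasActiveWriteTransaction.get();
       }
   }
   ```
   
   Since the single flag is the only shared state here, an atomic read is enough for readers and no lock is taken on the lookup path.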
   
   I've made these changes for discussion on a scratch branch:
   
   https://github.com/apache/jena/compare/master...afs:jena1785_tdb2_misscache
   
   Two commits: the first is PR646, the second is the suggestions above 
(commit message "AFS-PR646"), currently commit 
https://github.com/apache/jena/commit/d7063a3ad5aa2751a57b8e3356b2fee122f45595
   
   3/
   There are changes in the test suite to build a test dataset with smaller 
cache sizes. Then the caches can be fully cycled in a reasonable time.
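   
   To illustrate why a small cache helps the tests (a generic sketch only; `SmallLruCache` is a hypothetical class built on `java.util.LinkedHashMap`, not the cache implementation TDB2 uses): with a capacity of 10, a few dozen insertions are enough to evict every original entry, so eviction-related behaviour is exercised quickly.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical bounded LRU cache, for illustration only.
   class SmallLruCache<K, V> extends LinkedHashMap<K, V> {
       private final int maxSize;

       SmallLruCache(int maxSize) {
           // accessOrder=true: the least recently used entry is the eldest.
           super(16, 0.75f, true);
           this.maxSize = maxSize;
       }

       // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
       @Override
       protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
           return size() > maxSize;
       }
   }
   ```
   
   For example, putting keys 0 to 19 into a `SmallLruCache` of size 10 leaves only keys 10 to 19, so the cache has been fully cycled after 20 insertions.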
