arne-bdt commented on issue #1279:
URL: https://github.com/apache/jena/issues/1279#issuecomment-1566270071

   Thanks, that's the information I needed to proceed.
   
   Adding and removing the array with the hashes is straightforward. It might 
also be quite simple to have two variants of the graph store, so that users 
can choose between the fast variant and the low-memory one.
   
   Speaking of which:
   There are two fields tracking changes for potential concurrent modifications:
   1. `org.apache.jena.mem.ArrayBunch#changes`
   2. `org.apache.jena.mem.HashCommon#changes`
   
   The one in `ArrayBunch` is volatile, but the one in `HashCommon` is not. 
`ArrayBunch` is totally fine with integer overflow, since it checks for 
equality when verifying: 
   `if (changes != initialChanges) throw new ConcurrentModificationException();`
   
   `HashCommon`, however, wouldn't handle an integer overflow correctly: 
   `if (changes > initialChanges) throw new ConcurrentModificationException();`
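   To illustrate the difference, here is a minimal self-contained sketch (not Jena code): once the counter wraps around from `Integer.MAX_VALUE`, the new value is smaller than the snapshot, so the greater-than check stays silent while the equality check still fires.

```java
public class OverflowCheckDemo {
    public static void main(String[] args) {
        int initialChanges = Integer.MAX_VALUE; // snapshot taken just before the counter wraps
        int changes = initialChanges + 1;       // one modification; wraps to Integer.MIN_VALUE

        // Equality check (ArrayBunch style): still detects the modification.
        System.out.println("!= detects change: " + (changes != initialChanges)); // true

        // Greater-than check (HashCommon style): misses it after overflow.
        System.out.println(">  detects change: " + (changes > initialChanges)); // false
    }
}
```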
   
   The class Javadoc of `java.util.HashMap` states:
   ` * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
    * as it is, generally speaking, impossible to make any hard guarantees in 
the
    * presence of unsynchronized concurrent modification.  Fail-fast iterators
    * throw {@code ConcurrentModificationException} on a best-effort basis.
    * Therefore, it would be wrong to write a program that depended on this
    * exception for its correctness: <i>the fail-fast behavior of iterators
    * should be used only to detect bugs.</i>`
   
   Tracking the changes here is probably the safest approach. However, 
`HashCommon` should then also make `changes` volatile and compare for equality 
rather than using a greater-than comparison.
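   A sketch of what that fix could look like, under the assumption above. `FailFastBag` and its method names are illustrative, not the actual `HashCommon` code; the point is the volatile counter combined with an equality check in the iterator.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical container with a fail-fast iterator, sketching the proposed
// fix: a volatile change counter compared for equality, not with '>'.
class FailFastBag<T> implements Iterable<T> {
    private final List<T> elements = new ArrayList<>();
    // volatile, as proposed for HashCommon; the increment itself is not
    // atomic, but fail-fast detection is best-effort anyway (see the
    // HashMap Javadoc quoted above).
    private volatile int changes = 0;

    public void add(T t) { elements.add(t); changes++; }

    @Override
    public Iterator<T> iterator() {
        final int initialChanges = changes; // snapshot when iteration starts
        final Iterator<T> it = elements.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { check(); return it.hasNext(); }
            @Override public T next()          { check(); return it.next(); }
            private void check() {
                // equality instead of '>': still fires after the counter wraps
                if (changes != initialChanges)
                    throw new ConcurrentModificationException();
            }
        };
    }
}

public class FailFastDemo {
    public static void main(String[] args) {
        FailFastBag<String> bag = new FailFastBag<>();
        bag.add("a");
        Iterator<String> it = bag.iterator();
        bag.add("b"); // modification after the iterator was created
        try {
            it.next();
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("modification detected"); // this branch is taken
        }
    }
}
```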
   
   On the other hand, if we removed both fields and checked 
`org.apache.jena.mem.HashCommon#size` and `org.apache.jena.mem.ArrayBunch#size` 
instead, we would save memory (since there are many instances of both classes) 
and avoid incrementing a second variable on `#add` and `#remove` operations. 
The iterators should still help detect concurrent modifications in almost any 
scenario. Only if there are an equal number of `#add` and `#remove` calls, so 
that the total size of the graph is not affected, could there be undetected 
concurrent modifications while executing an iterator. This scenario seems 
unlikely enough to me that it would rarely, if ever, prevent anyone from 
detecting a concurrency bug.
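   The blind spot of the size-based approach can be shown with a small sketch (`SizeCheckedBag` is a hypothetical stand-in, not Jena code): a balanced add/remove pair leaves the size unchanged, so a size-only comparison cannot see it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container where iterators would snapshot size() instead of
// a dedicated change counter.
class SizeCheckedBag<T> {
    private final List<T> elements = new ArrayList<>();
    void add(T t)    { elements.add(t); }
    void remove(T t) { elements.remove(t); }
    int size()       { return elements.size(); }
}

public class SizeCheckDemo {
    public static void main(String[] args) {
        SizeCheckedBag<String> bag = new SizeCheckedBag<>();
        bag.add("a");
        bag.add("b");

        int initialSize = bag.size(); // snapshot taken when iteration starts

        // A balanced add/remove pair leaves the size unchanged...
        bag.add("c");
        bag.remove("a");

        // ...so a size-only comparison cannot detect the modification.
        System.out.println("detected: " + (bag.size() != initialSize)); // detected: false
    }
}
```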
   What is your opinion on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

