arne-bdt opened a new issue, #1912:
URL: https://github.com/apache/jena/issues/1912

   ### Version
   
   4.9.0.SNAPSHOT
   
   ### Feature
   
   New in-memory, general-purpose, non-transactional graphs as successors of 
GraphMem:
   All variants strictly use term-equality and do not support `Iterator#remove`.
   (GraphMem uses value-equality for object nodes)
   
   ### GraphMem2Legacy
   - Purpose: _Use this graph implementation if you want to maintain the 'old' 
behavior of GraphMem or if your memory constraints prevent you from utilizing 
more memory-intensive solutions._
   - Slightly improved performance compared to GraphMem
   - Simplified implementation, primarily due to lack of support for 
`Iterator#remove`
   - Inherits from GraphMem: 
      - Same basic structure
      - Same memory consumption
      - Also based on `HashCommon`
   
   ### GraphMem2Fast
   - Purpose: _GraphMem2Fast is a strong candidate for becoming the new default 
in-memory graph in the upcoming Jena 5, thanks to its improved performance and 
relatively minor increase in memory usage._
   - Faster than GraphMem2Legacy (specially Graph#add, Graph#find and 
Graph#stream)
   - Memory consumption is about 6-35% higher than GraphMem2Legacy
   - Maps and sets are not based on HashCommon, but use a faster custom 
alternative (only `#remove` is a bit slower)
   - Benefits from multiple small optimizations
   - Inherits from GraphMem
      - Also uses 3 hash-maps indexed by subjects, predicates, and objects
      - Values of the maps also switch from arrays to hash sets for the triples
   
   ### GraphMem2Roaring
   - Purpose: _GraphMem2Roaring is ideal for handling extremely large graphs. 
If you frequently work with such massive data structures, this implementation 
could be your top choice._ 
   - Graph#contains is faster than GraphMem2Fast
   - Better performance than GraphMem2Fast for operations with triple matches 
for the pattern S_O, SP_, and _PO on large graphs,
     due to bit-operations to find intersecting triples
   - Memory consumption is about 7-99% higher than GraphMem2Legacy
   - Suitable for really large graphs like bsbm-5m.nt.gz, bsbm-25m.nt.gz, and 
possibly even larger
   - Simple and straightforward implementation
   - No inheritance from GraphMem
   - Internal structure:
     - One indexed hash set (same as GraphMem2Fast uses) that holds all triples
     - Three hash maps indexed by subjects, predicates, and objects with 
RoaringBitmaps as values
     - The bitmaps contain the indices of the triples in the central hash set
   -  _Thanks @afs for pointing me to RoaringBitmaps!_
   
   
   #### Memory consumption
   I measured the memory consumption before and after adding a previously 
loaded list of triples to the graph. Therefore, the measurement should only 
account for the additional memory used by the graph's indexing structures, not 
the space occupied by the triples themselves:
   
![image](https://github.com/apache/jena/assets/53625519/1e34e1c3-0f16-4ef4-804c-bfcf7b3b34f7)
   
   #### Graph#add
   
![image](https://github.com/apache/jena/assets/53625519/99732e02-a31b-4bbc-8533-810a21300e16)
   
   #### Graph#find ANY ANY ANY
   (Note: This is not a fair comparison with other 'find' patterns because 
GraphMem2Roaring uses a single set to return the results. Additionally, while 
GraphMem2Legacy has faster iterators, it does not offer faster streaming than 
GraphMem.)
   
![image](https://github.com/apache/jena/assets/53625519/0169346a-78f0-4d6a-bbc8-0c2e7c6c8d57)
   
   **Would these three graph implementations be appreciated by the Jena 
community?**
   (If so, I will still have to write some documentation and unit tests.)
   
   ### Are you interested in contributing a solution yourself?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to