arne-bdt opened a new issue, #1912:
URL: https://github.com/apache/jena/issues/1912
### Version
4.9.0.SNAPSHOT
### Feature
New in-memory, general-purpose, non-transactional graphs as successors of
GraphMem:
All variants strictly use term-equality and do not support `Iterator#remove`.
(GraphMem uses value-equality for object nodes)
### GraphMem2Legacy
- Purpose: _Use this graph implementation if you want to maintain the 'old'
behavior of GraphMem or if your memory constraints prevent you from utilizing
more memory-intensive solutions._
- Slightly improved performance compared to GraphMem
- Simplified implementation, primarily due to lack of support for
`Iterator#remove`
- Inherits from GraphMem:
- Same basic structure
- Same memory consumption
- Also based on `HashCommon`
### GraphMem2Fast
- Purpose: _GraphMem2Fast is a strong candidate for becoming the new default
in-memory graph in the upcoming Jena 5, thanks to its improved performance and
relatively minor increase in memory usage._
- Faster than GraphMem2Legacy (specially Graph#add, Graph#find and
Graph#stream)
- Memory consumption is about 6-35% higher than GraphMem2Legacy
- Maps and sets are not based on HashCommon, but use a faster custom
alternative (only `#remove` is a bit slower)
- Benefits from multiple small optimizations
- Inherits from GraphMem
- Also uses 3 hash-maps indexed by subjects, predicates, and objects
- Values of the maps also switch from arrays to hash sets for the triples
### GraphMem2Roaring
- Purpose: _GraphMem2Roaring is ideal for handling extremely large graphs.
If you frequently work with such massive data structures, this implementation
could be your top choice._
- Graph#contains is faster than GraphMem2Fast
- Better performance than GraphMem2Fast for operations with triple matches
for the pattern S_O, SP_, and _PO on large graphs,
due to bit-operations to find intersecting triples
- Memory consumption is about 7-99% higher than GraphMem2Legacy
- Suitable for really large graphs like bsbm-5m.nt.gz, bsbm-25m.nt.gz, and
possibly even larger
- Simple and straightforward implementation
- No inheritance from GraphMem
- Internal structure:
- One indexed hash set (same as GraphMem2Fast uses) that holds all triples
- Three hash maps indexed by subjects, predicates, and objects with
RoaringBitmaps as values
- The bitmaps contain the indices of the triples in the central hash set
- _Thanks @afs for pointing me to RoaringBitmaps!_
#### Memory consumption
I measured the memory consumption before and after adding a previously
loaded list of triples to the graph. Therefore, the measurement should only
account for the additional memory used by the graph's indexing structures, not
the space occupied by the triples themselves:

#### Graph#add

#### Graph#find ANY ANY ANY
(Note: This is not a fair comparison with other 'find' patterns because
GraphMem2Roaring uses a single set to return the results. Additionally, while
GraphMem2Legacy has faster iterators, it does not offer faster streaming than
GraphMem.)

**Would these three graph implementations be appreciated by the Jena
community?**
(If so, I will still have to write some documentation and unit tests.)
### Are you interested in contributing a solution yourself?
Yes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]