I've been looking at graph databases recently (Neo4j, OrientDB, InfiniteGraph)
as a faster alternative to relational stores. I notice they either embed Lucene
for indexing node properties or (in the case of OrientDB) are talking about
doing so.

I think their fundamental performance advantage over relational stores is that
they don't have to dereference foreign keys through a b-tree index to get from a
source node to a target node. Instead they use internally generated IDs that act
like pointers, giving more-or-less direct references between nodes/vertices. As
a result they can follow links very quickly. This got me thinking: could Lucene
adopt the idea of creating links between documents, using Lucene doc IDs, that
are equally fast to follow?

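To illustrate the difference in per-hop cost, here is a toy model of the two
approaches. It is not how Neo4j or the others actually lay out storage: a
TreeMap stands in for the b-tree index, and the names (EdgeStores,
targetsViaIndex, targetsViaId) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model: the same directed edges stored two ways.
class EdgeStores {
    // Relational-style: every hop is a search in a sorted index keyed on the
    // foreign key (TreeMap stands in for a b-tree here).
    final TreeMap<Integer, List<Integer>> fkIndex = new TreeMap<>();

    // Graph-style: node IDs are dense internal integers, so a node's targets
    // sit in an array slot addressed directly by its ID (pointer-like access).
    final int[][] adjacency;

    EdgeStores(int numNodes, int[][] edges) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) lists.add(new ArrayList<>());
        for (int[] e : edges) {
            fkIndex.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            lists.get(e[0]).add(e[1]);
        }
        adjacency = new int[numNodes][];
        for (int i = 0; i < numNodes; i++) {
            adjacency[i] = lists.get(i).stream().mapToInt(Integer::intValue).toArray();
        }
    }

    // One O(log n) index search per hop.
    List<Integer> targetsViaIndex(int source) {
        return fkIndex.getOrDefault(source, Collections.emptyList());
    }

    // One O(1) array access per hop.
    int[] targetsViaId(int source) {
        return adjacency[source];
    }
}
```

Both return the same targets; the difference is that the index path pays a
logarithmic search for every link followed, which adds up over multi-hop
traversals.
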
Maybe the user API would look something like this:

    indexWriter.addLink(fromDocId, toDocId);
    DocIdSet inbound = reader.getInboundLinks(docId);
    DocIdSet outbound = reader.getOutboundLinks(docId);

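To make the proposed calls a bit more concrete, here is a minimal in-memory
sketch of what could sit behind them. Everything here is hypothetical: LinkStore
is an invented class, and java.util.BitSet stands in for Lucene's DocIdSet so
the example runs on the JDK alone.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store behind addLink / getInboundLinks / getOutboundLinks.
// java.util.BitSet is used in place of Lucene's DocIdSet.
class LinkStore {
    private final Map<Integer, BitSet> outbound = new HashMap<>();
    private final Map<Integer, BitSet> inbound = new HashMap<>();

    // Record a directed link; both directions are indexed so either side
    // can be enumerated without scanning every link.
    void addLink(int fromDocId, int toDocId) {
        outbound.computeIfAbsent(fromDocId, k -> new BitSet()).set(toDocId);
        inbound.computeIfAbsent(toDocId, k -> new BitSet()).set(fromDocId);
    }

    BitSet getOutboundLinks(int docId) {
        return copyOf(outbound.get(docId));
    }

    BitSet getInboundLinks(int docId) {
        return copyOf(inbound.get(docId));
    }

    // Return a copy (or an empty set) so callers cannot mutate the store.
    private static BitSet copyOf(BitSet bits) {
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }
}
```

Indexing both directions doubles the link storage but makes inbound lookups as
cheap as outbound ones; a real implementation would presumably persist these
per segment rather than in hash maps.
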
Internally, a new index file structure would be needed to record link info. Any
recorded links that connect documents in different segments would need careful
adjustment of the doc IDs they reference when segments merge and Lucene doc IDs
are shuffled.

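The merge-time bookkeeping might look roughly like this. The sketch assumes,
hypothetically, that links are stored as (segment, local doc ID) pairs and that
a merge renumbers surviving docs densely in segment order, dropping deleted
ones. LinkRemapper and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical remapping of link endpoints when segments merge.
class LinkRemapper {
    // A link endpoint before the merge: (segment index, local doc ID).
    static final class Endpoint {
        final int segment, docId;
        Endpoint(int segment, int docId) { this.segment = segment; this.docId = docId; }
    }

    static final class Link {
        final Endpoint from, to;
        Link(Endpoint from, Endpoint to) { this.from = from; this.to = to; }
    }

    // docMaps[seg][localId] is the doc ID in the merged segment,
    // or -1 if that doc was deleted and is dropped by the merge.
    static int[][] buildDocMaps(int[] maxDocs, boolean[][] deleted) {
        int[][] docMaps = new int[maxDocs.length][];
        int next = 0;
        for (int seg = 0; seg < maxDocs.length; seg++) {
            docMaps[seg] = new int[maxDocs[seg]];
            for (int doc = 0; doc < maxDocs[seg]; doc++) {
                docMaps[seg][doc] = deleted[seg][doc] ? -1 : next++;
            }
        }
        return docMaps;
    }

    // Rewrite each link as {fromMergedId, toMergedId}; links that touch a
    // deleted doc are dropped.
    static List<int[]> remap(List<Link> links, int[][] docMaps) {
        List<int[]> merged = new ArrayList<>();
        for (Link link : links) {
            int from = docMaps[link.from.segment][link.from.docId];
            int to = docMaps[link.to.segment][link.to.docId];
            if (from >= 0 && to >= 0) merged.add(new int[] {from, to});
        }
        return merged;
    }
}
```

Under this model, links inside a single segment also shift once deletions are
compacted away, so the remapping applies to every stored link, not only the
ones that cross segments.
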
As well as handling typical graphs (social networks, web data), this could
potentially support tagging operations, where apps create "tag" documents and
link them to existing documents without having to update the tagged documents.
There are probably a ton of applications for this stuff.

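A minimal sketch of the tagging idea, assuming each tag is an ordinary document
and tagging records a link from the tag doc to the target doc. TagLinks is a
hypothetical class, and java.util.BitSet again stands in for DocIdSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tagging via links: tag doc -> target doc.
// Target documents are never rewritten; only links are added.
class TagLinks {
    private final Map<Integer, BitSet> outbound = new HashMap<>(); // tag doc -> tagged docs
    private final Map<Integer, BitSet> inbound = new HashMap<>();  // doc -> tag docs linking to it

    void tag(int tagDocId, int targetDocId) {
        outbound.computeIfAbsent(tagDocId, k -> new BitSet()).set(targetDocId);
        inbound.computeIfAbsent(targetDocId, k -> new BitSet()).set(tagDocId);
    }

    // Docs carrying a tag are the tag doc's outbound links.
    BitSet taggedWith(int tagDocId) {
        BitSet bits = outbound.get(tagDocId);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    // Tags on a doc are the doc's inbound links.
    BitSet tagsOf(int docId) {
        BitSet bits = inbound.get(docId);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    // Docs carrying every given tag: intersect the per-tag sets.
    BitSet taggedWithAll(int... tagDocIds) {
        BitSet result = null;
        for (int tag : tagDocIds) {
            BitSet bits = taggedWith(tag);
            if (result == null) result = bits; else result.and(bits);
        }
        return result == null ? new BitSet() : result;
    }
}
```

Because the results are plain doc ID sets, a query like "tagged with both X and
Y" reduces to a bitset intersection.
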
I see the graph DBs busy recreating transactional support, indexes, segment
merging, etc., and it seems to me that Lucene has a pretty good head start
here.

Anyone else think this might be an area worth exploring?
Cheers
Mark