Cool!

On Sat, Mar 24, 2012 at 4:17 PM, Mark Harwood <[email protected]> wrote:
> OK I have some code and benchmarks for this solution up on a Google Code
> project here: http://code.google.com/p/graphdb-load-tester/
>
> The project exists to address the performance challenges I have encountered
> when dealing with large graphs. It uses all of the Wikipedia links as a test
> dataset and a choice of graph databases (most of which use Lucene BTW).
> The test data is essentially 130 million edges representing links between
> pages e.g. Communism->Russia.
> To load the data all of the graph databases have to translate user-defined
> keys like "Russia" into an internally-generated node ID using a service that
> looks like this:
>     interface KeyService
>     {
>         //Returns existing nodeid or -1 if it is not already in store
>         public long getGraphNodeId(String udk);
>
>         //Adds a new record - assumption is client has checked user
>         //defined key (udk) is not stored already using getGraphNodeId
>         public void put(String udk, long graphNodeId);
>     }
>
> This is a challenge on a dataset of this size. I tried using a Lucene-based
> implementation for this service with the following optimisations:
> 1) a Bloomfilter to quickly "know what we don't know"
> 2) an LRUCache to hold on to commonly referenced vertices e.g. the Wikipedia
> article for "United States"
> 3) a hashmap representing the unflushed state of Lucene's IndexWriter to
> avoid the need for excessive flushing with NRT reader etc
>
> The search/write performance showed the familiar saw-toothing as the Lucene
> index grew in size and merge operations kicked in.
>
> The KVStore implementation I wrote attempts to tackle this problem using a
> fundamentally different form of index. The results from the KVStore runs show
> it was twice as fast as this Lucene solution and maintains constant
> performance without the saw-toothing effect.
>
> Benchmark figures are here: http://goo.gl/VQ027
> The KVStore source code is here: http://goo.gl/ovkop and the Lucene
> implementation I compare against is also in the project.
>
> Cheers
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
--
Lance Norskog
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
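
For readers following the thread, here is a minimal sketch of the layered lookup Mark describes: a Bloom filter in front of an LRU cache in front of a slower store. It is not the code from the graphdb-load-tester project. The class name LayeredKeyService, the two-hash Bloom filter, the storeLookups counter and the plain HashMap standing in for the Lucene index are all assumptions made for illustration. The third optimisation (tracking the unflushed IndexWriter state) is not modelled separately here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and sizes are assumptions.
class LayeredKeyService {
    private final BitSet bloom;          // layer 1: "know what we don't know"
    private final int bloomBits;
    private final Map<String, Long> lru; // layer 2: commonly referenced keys
    // Stand-in for the slow backing index (Lucene in the original post).
    private final Map<String, Long> backingStore = new HashMap<>();
    int storeLookups = 0;                // counts how often the slow layer is hit

    LayeredKeyService(int bloomBits, final int lruCapacity) {
        this.bloomBits = bloomBits;
        this.bloom = new BitSet(bloomBits);
        // accessOrder=true makes LinkedHashMap evict the least recently used entry.
        this.lru = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > lruCapacity;
            }
        };
    }

    // Derives a bit position from the key; two seeds give two hash functions.
    private int slot(String udk, int seed) {
        int h = udk.hashCode() * 31 + seed * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, bloomBits);
    }

    // Returns the existing node id, or -1 if the key is not stored.
    public long getGraphNodeId(String udk) {
        // If either bit is unset the key was definitely never added.
        if (!bloom.get(slot(udk, 1)) || !bloom.get(slot(udk, 2))) {
            return -1;
        }
        Long cached = lru.get(udk);
        if (cached != null) {
            return cached;
        }
        storeLookups++;
        Long stored = backingStore.get(udk);
        if (stored == null) {
            return -1; // Bloom filter false positive
        }
        lru.put(udk, stored);
        return stored;
    }

    // Caller is assumed to have checked getGraphNodeId(udk) == -1 first.
    public void put(String udk, long graphNodeId) {
        bloom.set(slot(udk, 1));
        bloom.set(slot(udk, 2));
        backingStore.put(udk, graphNodeId);
        lru.put(udk, graphNodeId);
    }
}
```

With this layering, a lookup for a key that was never added usually returns -1 from the Bloom filter without touching the cache or the store, and a hot key such as "United States" is served from the LRU cache. Only cache misses and Bloom false positives reach the slow layer.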
