Cool!

On Sat, Mar 24, 2012 at 4:17 PM, Mark Harwood <[email protected]> wrote:
> OK I have some code and benchmarks for this solution up on a Google Code 
> project here: http://code.google.com/p/graphdb-load-tester/
>
> The project exists to address the performance challenges I have encountered 
> when dealing with large graphs. It  uses all of the Wikipedia links as a test 
> dataset and a choice of graph databases (most of which use Lucene BTW).
> The test data is essentially 130 million edges representing links between 
> pages e.g.  Communism->Russia.
> To load the data all of the graph databases have to translate user-defined 
> keys like "Russia" into an internally-generated node ID using a service that 
> looks like this:
>        interface KeyService
>        {
>                //Returns existing nodeid or -1 if is not already in store
>                public long getGraphNodeId(String udk);
>
>                //Adds a new record - assumption is client has checked user 
> defined key (udk) is not stored already using getGraphNodeId
>                public void put(String udk, long graphNodeId);
>        }
>
> This is a challenge on a dataset of this size. I tried using a Lucene-based 
> implementation for this service with the following optimisations:
> 1) a Bloomfilter to quickly "know what we don't know"
> 2) an LRUCache to hold on to commonly referenced vertices e.g the Wikipdedia 
> article for "United States"
> 3) a hashmap representing the unflushed state of Lucene's IndexWriter to 
> avoid the need for excessive flushing with NRT reader etc
>
> The search/write performance showed the familiar saw-toothing as the Lucene 
> index grew in size and merge operations kicked in.
>
> The KVStore implementation I wrote attempts to tackle this problem using a 
> fundamentally different form of index. The results from the KVStore runs show 
> it was twice as fast as this  Lucene solution and maintains constant 
> performance without the saw toothing effect.
>
> Benchmark figures are here: http://goo.gl/VQ027
> The KVStore source code is here: http://goo.gl/ovkop and the Lucene 
> implementation I compare against is also in the project.
>
> Cheers
> Mark
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>



-- 
Lance Norskog
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to