Average lookup time slightly lower than a HashSet's? Interesting. Is the
code for these benchmarks available somewhere?

Dawid

On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>
>>>> using Lucene that don't fit under the core premise of full text search
>>
>> I've had several use cases over the years that use features peculiar to 
>> Lucene but here's a very simple one I came across today that illustrates its 
>> raw index lookup capability:
>>
>> I needed a fast, scalable and persistent "Set" implementation to maintain a 
>> large cold-list (millions of string-based keys).
>> I benchmarked various implementations using a set of ~6 million keys with 
>> 10,000 random key lookups.
>> When it comes to RAM use, retrieval times and start-up costs, Lucene stands 
>> up very well against equivalent embedded databases for this task:
>>
>> * Benchmarks for times to initially open the set when stored on disk:  
>> http://goo.gl/dJL3g
>> * Benchmarks for Avg key lookup time once opened: http://goo.gl/SG79N
>> * Stats for RAM use after 10,000 lookups: http://goo.gl/MyJDn
>
> Those charts are beautiful.  I have Lucene/Solr down as an excellent 
> key-value store (I've seen this done many times) and these charts further 
> cement it.
>
>>
>> I don't doubt all of these implementations could be tweaked (e.g. optimizing 
>> the Lucene index, various DB-specific settings), but I tried to use sensible 
>> defaults to make the tests fair, e.g. prepared statements, indexes and 
>> minimal data retrieved.
>> Speeds varied with each run of the random lookup test due to OS-level 
>> caching effects, so the best times were recorded in each case.
>> The HashSet tests are loaded entirely from file (hence the long start-up 
>> time) and are not a scalable solution because of RAM costs.
>> MySQL requires an inter-process call because it was not embedded, but even 
>> with a remote Lucene call I get significantly better performance (avg 0.5 ms 
>> lookup vs. 10 ms for MySQL).
>>
>>
>> Cheers
>> Mark
>>
>>
>>
>> ----- Original Message -----
>> From: Grant Ingersoll <gsing...@apache.org>
>> To: java-user@lucene.apache.org
>> Cc:
>> Sent: Saturday, 22 October 2011, 10:11
>> Subject: Bet you didn't know Lucene can...
>>
>> Hi All,
>>
>> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
>> (http://na11.apachecon.com/talks/18396).  It's based on my observation that, 
>> over the years, a number of us in the community have done some pretty cool 
>> things using Lucene that don't fit under the core premise of full text 
>> search.  I've got a fair number of ideas for the talk (easily enough for 1 
>> hour), but I wanted to reach out to hear your stories of ways you've 
>> (ab)used Lucene and Solr to see if we couldn't extend the conversation to a 
>> bit more than the conference and also see if I can't inject more ideas 
>> beyond the ones I have.  I don't need deep technical details, but just high 
>> level use case and the basic insight that led you to believe Lucene could 
>> solve the problem.
>>
>> Thanks in advance,
>> Grant
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
>
>
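
The measurement Mark describes (random key lookups against a set, with the best time kept across runs to reduce OS-level caching noise) could be sketched roughly as below. This is not Mark's benchmark code, which the thread does not include. The class and method names are hypothetical, the key count is scaled down from ~6 million so that it runs quickly, and a plain HashSet stands in for the store under test. Any backend (Lucene, an embedded database) could be plugged in as the `Predicate<String>`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical harness; names are illustrative, not from the original benchmark.
public class LookupBenchmark {

    /** Generates n distinct string keys; the "key-" prefix is arbitrary. */
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    /** Picks count keys at random (with replacement), using a fixed seed. */
    static List<String> sampleKeys(List<String> keys, int count, long seed) {
        Random random = new Random(seed);
        List<String> sample = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            sample.add(keys.get(random.nextInt(keys.size())));
        }
        return sample;
    }

    /**
     * Runs all probes against the store several times and returns the best
     * (lowest) average time per lookup, in nanoseconds.
     * Assumes every probe key is present in the store.
     */
    static double bestAvgLookupNanos(Predicate<String> contains,
                                     List<String> probes, int runs) {
        if (probes.isEmpty() || runs < 1) {
            throw new IllegalArgumentException("need at least one probe and one run");
        }
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            int hits = 0;
            long start = System.nanoTime();
            for (String key : probes) {
                if (contains.test(key)) {
                    hits++;
                }
            }
            long elapsed = System.nanoTime() - start;
            // Using the hit count also keeps the lookups from being optimized away.
            if (hits != probes.size()) {
                throw new IllegalStateException("store is missing a probed key");
            }
            best = Math.min(best, (double) elapsed / probes.size());
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> keys = makeKeys(100_000);               // scaled down from ~6 million
        Set<String> store = new HashSet<>(keys);             // stand-in for the store under test
        List<String> probes = sampleKeys(keys, 10_000, 42L); // 10,000 random lookups
        double best = bestAvgLookupNanos(store::contains, probes, 5);
        System.out.printf("best avg lookup: %.1f ns%n", best);
    }
}
```

Passing the lookup as a `Predicate<String>` keeps the timing loop identical for every store, so differences in the reported figure come from the store itself. Start-up cost and RAM use, the other two metrics in the thread, would need separate measurement.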

