Re: Bet you didn't know Lucene can...

mark harwood Tue, 25 Oct 2011 08:27:12 -0700

>>using Lucene that don't fit under the core premise of full text search


I've had several use cases over the years that use features peculiar to Lucene,
but here's a very simple one I came across today that illustrates its raw index
lookup capability:

I needed a fast, scalable and persistent "Set" implementation to maintain a 
large cold-list (millions of string-based keys).
I benchmarked various implementations using a set of ~6 million keys with 
10,000 random key lookups.
When it comes to RAM use, retrieval times and start-up costs, Lucene stands up
very well against equivalent embedded databases for this task:

* Benchmarks for time to initially open the set when stored on disk: http://goo.gl/dJL3g
* Benchmarks for avg key lookup time once opened: http://goo.gl/SG79N
* Stats for RAM use after 10,000 lookups: http://goo.gl/MyJDn

I don't doubt all of these implementations could be tweaked (e.g. optimizing
the Lucene index, various DB-specific settings), but I tried to use sensible
defaults to make the tests fair, e.g. use of prepared statements, indexes,
minimal data retrieved.
Speeds varied with each run of the random lookup test due to OS-level caching
effects, so the best times were recorded in each case.
The HashSet tests are loaded entirely from file (hence the long start-up time) 
and are not a scalable solution because of RAM costs.
MySQL requires an inter-process call as it was not embedded, but even using a
remote Lucene call I get significantly better performance (avg 0.5ms lookup vs
MySQL 10ms).
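
For anyone wanting to repeat this kind of test, here is a minimal sketch of a
lookup-timing harness along the lines described above: N random key lookups
against a set, repeated several times, keeping the best average. It is not the
code behind the linked charts. The class and method names, the synthetic
"key-N" keys, the fixed seed and the run count are all assumptions made for
illustration. Only the HashSet baseline is shown; a Lucene- or database-backed
lookup would plug in as another Predicate<String>.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical harness: names and parameters are illustrative only.
class LookupBenchmark {

    // Written after each pass so the JIT cannot discard the lookups as dead code.
    static volatile int sink;

    /** Generates n distinct synthetic keys (stand-ins for a real key list). */
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    /** Picks count keys at random (with replacement); fixed seed for repeatable runs. */
    static List<String> sampleKeys(List<String> keys, int count, long seed) {
        Random rnd = new Random(seed);
        List<String> sample = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            sample.add(keys.get(rnd.nextInt(keys.size())));
        }
        return sample;
    }

    /** Runs every probe once and returns the average time per lookup in milliseconds. */
    static double avgLookupMillis(Predicate<String> contains, List<String> probes) {
        int hits = 0;
        long start = System.nanoTime();
        for (String key : probes) {
            if (contains.test(key)) {
                hits++;
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        sink = hits;
        return elapsedNanos / 1e6 / probes.size();
    }

    /** Repeats the lookup pass and keeps the best (lowest) average, as in the post. */
    static double bestAvgLookupMillis(Predicate<String> contains, List<String> probes, int runs) {
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            best = Math.min(best, avgLookupMillis(contains, probes));
        }
        return best;
    }

    public static void main(String[] args) {
        // Far smaller than the ~6 million keys in the post, to keep the demo quick.
        List<String> keys = makeKeys(100_000);
        Set<String> hashSet = new HashSet<>(keys);
        List<String> probes = sampleKeys(keys, 10_000, 42L);
        double best = bestAvgLookupMillis(hashSet::contains, probes, 5);
        System.out.printf("HashSet best avg lookup: %.6f ms%n", best);
    }
}
```

Taking the minimum over several passes is one simple way to damp the OS-level
caching noise mentioned above; it does not measure cold-start behaviour, which
would need a separate timing of the open step.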
 

Cheers
Mark



----- Original Message -----
From: Grant Ingersoll <[email protected]>
To: [email protected]
Cc: 
Sent: Saturday, 22 October 2011, 10:11
Subject: Bet you didn't know Lucene can...

Hi All,

I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
(http://na11.apachecon.com/talks/18396).  It's based on my observation that,
over the years, a number of us in the community have done some pretty cool
things using Lucene that don't fit under the core premise of full text search.  
I've got a fair number of ideas for the talk (easily enough for 1 hour), but I 
wanted to reach out to hear your stories of ways you've (ab)used Lucene and 
Solr to see if we couldn't extend the conversation to a bit more than the 
conference and also see if I can't inject more ideas beyond the ones I have.  I 
don't need deep technical details, but just high level use case and the basic 
insight that led you to believe Lucene could solve the problem.

Thanks in advance,
Grant

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com

