Avg lookup time slightly less than a HashSet? Interesting. Is the code to these benchmarks available somewhere?
Dawid

On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>
>>>> using Lucene that don't fit under the core premise of full text search
>>
>> I've had several use cases over the years that use features peculiar to
>> Lucene but here's a very simple one I came across today that illustrates
>> its raw index lookup capability:
>>
>> I needed a fast, scalable and persistent "Set" implementation to maintain
>> a large cold-list (millions of string-based keys).
>> I benchmarked various implementations using a set of ~6 million keys with
>> 10,000 random key lookups.
>> When it comes to RAM use, retrieval times and start-up costs Lucene stands
>> up very well against equivalent embedded databases for this task:
>>
>> * Benchmarks for times to initially open the set when stored on disk:
>>   http://goo.gl/dJL3g
>> * Benchmarks for Avg key lookup time once opened: http://goo.gl/SG79N
>> * Stats for RAM use after 10,000 lookups: http://goo.gl/MyJDn
>
> Those charts are beautiful. I have Lucene/Solr down as an excellent
> key-value store (I've seen this done many times) and these charts further
> cement it.
>
>>
>> I don't doubt all of these implementations could be tweaked (e.g.
>> optimizing the Lucene index, various DB-specific settings) but I tried to
>> use sensible defaults to make the tests fair e.g. use of prepared
>> statements, indexes, minimal data retrieved.
>> Speeds varied with each run of the random lookup test due to OS-level
>> caching effects so the best times were recorded in each case.
>> The HashSet tests are loaded entirely from file (hence the long start-up
>> time) and are not a scalable solution because of RAM costs.
>> MySQL requires an inter-process call as it was not embedded but even using
>> a remoted Lucene call I get significantly better performance (avg 0.5ms
>> lookup vs MySQL 10ms)
>>
>> Cheers
>> Mark
>>
>> ----- Original Message -----
>> From: Grant Ingersoll <gsing...@apache.org>
>> To: java-user@lucene.apache.org
>> Cc:
>> Sent: Saturday, 22 October 2011, 10:11
>> Subject: Bet you didn't know Lucene can...
>>
>> Hi All,
>>
>> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
>> (http://na11.apachecon.com/talks/18396). It's based on my observation,
>> that over the years, a number of us in the community have done some pretty
>> cool things using Lucene that don't fit under the core premise of full
>> text search. I've got a fair number of ideas for the talk (easily enough
>> for 1 hour), but I wanted to reach out to hear your stories of ways you've
>> (ab)used Lucene and Solr to see if we couldn't extend the conversation to
>> a bit more than the conference and also see if I can't inject more ideas
>> beyond the ones I have. I don't need deep technical details, but just high
>> level use case and the basic insight that led you to believe Lucene could
>> solve the problem.
>>
>> Thanks in advance,
>> Grant
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
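
Since the benchmark code itself is not linked in this thread, here is a rough sketch of how the in-memory HashSet baseline described above could be measured: build a set of string keys, run 10,000 random lookups, repeat the run a few times, and keep the best average. Everything in it is an assumption for illustration only (the class name `LookupBenchmarkSketch`, the synthetic "key-N" keys, the 1 million key count chosen to keep RAM use modest, the fixed seed); it is not Mark's actual harness and it does not cover the Lucene or database cases.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical harness: names and sizes are illustrative, not from the thread.
class LookupBenchmarkSketch {

    // Written to after each run so the JIT cannot discard the lookups.
    static volatile int sink;

    /** Builds n distinct synthetic string keys: "key-0", "key-1", ... */
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    /** Picks `count` existing keys at random (with replacement), reproducibly. */
    static List<String> sampleProbes(List<String> keys, int count, long seed) {
        Random rnd = new Random(seed);
        List<String> probes = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            probes.add(keys.get(rnd.nextInt(keys.size())));
        }
        return probes;
    }

    /** Looks up every probe and returns how many were found. */
    static int countHits(Set<String> set, List<String> probes) {
        int hits = 0;
        for (String probe : probes) {
            if (set.contains(probe)) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Runs the whole probe list `runs` times and returns the best (lowest)
     * average time per lookup in nanoseconds, mirroring the "best times
     * were recorded" approach described in the quoted message.
     */
    static double bestAvgNanosPerLookup(Set<String> set, List<String> probes, int runs) {
        double best = Double.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            int hits = countHits(set, probes);
            long elapsed = System.nanoTime() - start;
            sink = hits;
            best = Math.min(best, (double) elapsed / probes.size());
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> keys = makeKeys(1_000_000);   // assumed size, smaller than ~6 million
        Set<String> set = new HashSet<>(keys);
        List<String> probes = sampleProbes(keys, 10_000, 42L);
        double best = bestAvgNanosPerLookup(set, probes, 5);
        System.out.printf("best avg lookup: %.1f ns over %d probes%n", best, probes.size());
    }
}
```

A persistent implementation could be timed the same way by swapping the `set.contains(probe)` call for the store's own key lookup, though disk-backed stores will be far more sensitive to the OS-level caching effects mentioned above than this in-memory baseline is.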