> Avg lookup time slightly less than a HashSet? Interesting.

Yep, the HashSet comparison surprised me too. I threw it in as a data point 
for what I expected to be the fastest option on the example dataset, though it 
is clearly not a long-term answer to my problem because it costs so much RAM. 
Lucene started out at an average of 3ms per lookup, but subsequent runs brought 
that down dramatically thanks to OS file caching. The all-in-memory HashSet 
implementation showed no comparable speed-up between runs.

> Is the code
> to these benchmarks available somewhere?


I can make the code available, but I can't share the data.
The English Wikipedia page titles are probably of a similar size and shape, so 
I could try to package something up around those as a benchmarking tool for 
others to play with. 
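
In the meantime, here is a minimal sketch of what the HashSet side of such a 
lookup test could look like. It is not the code behind the charts: the class 
and method names (LookupBenchmark, makeKeys, sampleKeys, bestAverageNanos) and 
the synthetic "key-N" data are assumptions made up for illustration, and the 
key count is reduced from ~6 million to keep the example light. It times 
10,000 random lookups per run and keeps the best run, as described in the 
earlier mail.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

class LookupBenchmark {

    // Builds n distinct synthetic keys (hypothetical stand-ins for the real data).
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    // Picks `count` keys at random (with replacement). The fixed seed means
    // every run and every implementation is probed with the same keys.
    static List<String> sampleKeys(List<String> keys, int count, long seed) {
        Random random = new Random(seed);
        List<String> sample = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            sample.add(keys.get(random.nextInt(keys.size())));
        }
        return sample;
    }

    // Times one pass over the probes and returns the average nanoseconds per lookup.
    // Every probe is expected to be present; a miss indicates a broken setup.
    static double averageLookupNanos(Predicate<String> contains, List<String> probes) {
        if (probes.isEmpty()) {
            throw new IllegalArgumentException("no probes supplied");
        }
        int hits = 0;
        long start = System.nanoTime();
        for (String probe : probes) {
            if (contains.test(probe)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != probes.size()) {
            throw new IllegalStateException("expected every probe to be found");
        }
        return (double) elapsed / probes.size();
    }

    // Repeats the pass `runs` times and keeps the best (lowest) average,
    // since caching effects make individual runs vary.
    static double bestAverageNanos(Predicate<String> contains, List<String> probes, int runs) {
        double best = Double.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, averageLookupNanos(contains, probes));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> keys = makeKeys(1_000_000);   // reduced from ~6 million
        Set<String> set = new HashSet<>(keys);     // all-in-memory implementation
        List<String> probes = sampleKeys(keys, 10_000, 42L);
        double best = bestAverageNanos(set::contains, probes, 5);
        System.out.printf("HashSet best avg lookup: %.1f ns%n", best);
    }
}
```

Because the lookup is passed in as a Predicate<String>, a Lucene- or 
database-backed "contains" could in principle be dropped into the same 
harness and probed with the same keys.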

Cheers
Mark

On 25 Oct 2011, at 22:47, Dawid Weiss wrote:

> Avg lookup time slightly less than a HashSet? Interesting. Is the code
> to these benchmarks available somewhere?
> 
> Dawid
> 
> On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> 
>> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>> 
>>>>> using Lucene that don't fit under the core premise of full text search
>>> 
>>>  I've had several use cases over the years that use features peculiar to 
>>> Lucene but here's a very simple one I came across today that illustrates 
>>> its raw index lookup capability:
>>> 
>>> I needed a fast, scalable and persistent "Set" implementation to maintain a 
>>> large cold-list (millions of string-based keys).
>>> I benchmarked various implementations using a set of ~6 million keys with 
>>> 10,000 random key lookups.
>>> When it comes to RAM use, retrieval times and start-up costs Lucene stands 
>>> up very well against equivalent embedded databases for this task:
>>> 
>>> * Benchmarks for times to initially open the set when stored on disk:  
>>> http://goo.gl/dJL3g
>>> * Benchmarks for Avg key lookup time once opened: http://goo.gl/SG79N
>>> * Stats for RAM use after 10,000 lookups: http://goo.gl/MyJDn
>> 
>> Those charts are beautiful.  I have Lucene/Solr down as an excellent 
>> key-value store (I've seen this done many times) and these charts further 
>> cement it.
>> 
>>> 
>>> I don't doubt all of these implementations could be tweaked (e.g. 
>>> optimizing the Lucene index, various DB-specific settings) but I tried to 
>>> use sensible defaults to make the tests fair e.g. use of prepared 
>>> statements, indexes, minimal data retrieved.
>>> Speeds varied with each run of the random lookup test due to OS-level 
>>> caching effects so the best times were recorded in each case.
>>> The HashSet tests are loaded entirely from file (hence the long start-up 
>>> time) and are not a scalable solution because of RAM costs.
>>> MySQL requires an inter-process call as it was not embedded, but even using 
>>> a remote Lucene call I get significantly better performance (avg 0.5ms 
>>> lookup vs MySQL 10ms).
>>> 
>>> 
>>> Cheers
>>> Mark
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Grant Ingersoll <gsing...@apache.org>
>>> To: java-user@lucene.apache.org
>>> Cc:
>>> Sent: Saturday, 22 October 2011, 10:11
>>> Subject: Bet you didn't know Lucene can...
>>> 
>>> Hi All,
>>> 
>>> I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
>>> (http://na11.apachecon.com/talks/18396).  It's based on my observation, 
>>> that over the years, a number of us in the community have done some pretty 
>>> cool things using Lucene that don't fit under the core premise of full text 
>>> search.  I've got a fair number of ideas for the talk (easily enough for 1 
>>> hour), but I wanted to reach out to hear your stories of ways you've 
>>> (ab)used Lucene and Solr to see if we couldn't extend the conversation to a 
>>> bit more than the conference and also see if I can't inject more ideas 
>>> beyond the ones I have.  I don't need deep technical details, but just high 
>>> level use case and the basic insight that led you to believe Lucene could 
>>> solve the problem.
>>> 
>>> Thanks in advance,
>>> Grant
>>> 
>>> --------------------------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> 
>> 
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 
>> 
>> 
> 
> 

