On Tue, Aug 11, 2009 at 6:50 AM, Robert Muir<rcm...@gmail.com> wrote:
> On Tue, Aug 11, 2009 at 4:28 AM, Michael Busch<busch...@gmail.com> wrote:
>> There was a performance test in Solr that apparently ran much slower
>> after upgrading to the new Lucene jar. This test is testing a rather
>> uncommon scenario: very very short documents.
>
> Actually, its more uncommon than that: its very very short documents,
> without implementing reusableTokenStream()
> this makes it basically a benchmark of ctor cost... doesn't really
> benchmark the token api in my opinion.

You would be surprised... there are quite a few Solr users that have
relatively short documents... or even if they are sizeable documents,
they have up to hundreds of short metadata-type fields (generally a
token or two).

Reusing TokenStreams has become a must in Solr IMO, since construction
costs (hashmap lookups, etc.) and GC costs (larger objects) have been
growing. I'm focused on that now... Robert's taking a crack at fixing
things up so users can actually create reusable analyzers out of our
filters:
https://issues.apache.org/jira/browse/LUCENE-1794

-Yonik
http://www.lucidimagination.com
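
For illustration, here is a minimal sketch of the per-thread reuse idea
discussed above: build the tokenizer once per thread and reset it for each
new field value, so that hundreds of one- or two-token fields do not each
pay the constructor and GC cost. The class names SimpleTokenizer and
ReusingAnalyzer are hypothetical stand-ins written for this example; they
are not Lucene's or Solr's API, and the whitespace splitting is only a
placeholder for a real analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer; NOT Lucene's TokenStream API.
final class SimpleTokenizer {
    // Counts constructor calls, for illustration only (not thread-safe).
    static int constructed = 0;

    private String text = "";
    private int pos = 0;

    SimpleTokenizer() {
        constructed++;
    }

    // Point the existing instance at new input instead of allocating a new one.
    void reset(String newText) {
        text = newText;
        pos = 0;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }
}

// Hypothetical analyzer that keeps one tokenizer per thread and reuses it.
final class ReusingAnalyzer {
    // Built lazily, once per thread, on the first call to analyze().
    private static final ThreadLocal<SimpleTokenizer> CACHED =
        ThreadLocal.withInitial(SimpleTokenizer::new);

    static List<String> analyze(String fieldValue) {
        SimpleTokenizer tokenizer = CACHED.get();
        tokenizer.reset(fieldValue); // reuse: no new tokenizer object here
        List<String> tokens = new ArrayList<>();
        for (String tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            tokens.add(tok);
        }
        return tokens;
    }
}
```

With this shape, analyzing many short fields on one thread constructs the
tokenizer at most once; without the cache, every field would construct a
fresh one, which is the cost the benchmark above ends up measuring.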