btw my Lucene 2.4 numbers for this corpus (running it many times) average around 41s versus 44s, so it's still a small hit even for reasonably large docs, using simple analyzers with reuse and all that.
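(For context, "reuse" here means going through reusableTokenStream rather than tokenStream. Below is a minimal sketch of the 2.4-era reuse pattern; ReusingAnalyzer and SavedStreams are illustrative names, but getPreviousTokenStream()/setPreviousTokenStream() and Tokenizer.reset(Reader) are the actual hooks the built-in analyzers use.)

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class ReusingAnalyzer extends Analyzer {

  // Per-thread holder so the chain is built only once per thread.
  private static final class SavedStreams {
    Tokenizer source;
    TokenStream result;
  }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Non-reuse path: allocates a fresh chain for every document.
    return new LowerCaseFilter(new WhitespaceTokenizer(reader));
  }

  public TokenStream reusableTokenStream(String fieldName, Reader reader)
      throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      // First call on this thread: build the chain once and stash it.
      streams = new SavedStreams();
      streams.source = new WhitespaceTokenizer(reader);
      streams.result = new LowerCaseFilter(streams.source);
      setPreviousTokenStream(streams);
    } else {
      // Later calls: just point the saved tokenizer at the new Reader.
      streams.source.reset(reader);
    }
    return streams.result;
  }
}

The point is that only reset(Reader) runs per document; chain construction (and whatever per-instance init the streams do) happens once per thread.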
so reusableTokenStream takes care of a lot of it, but not all of it.

On Mon, Aug 10, 2009 at 10:48 AM, Mark Miller<markrmil...@gmail.com> wrote:
> Robert Muir wrote:
>>
>> This is real and not just for very short docs.
>
> Yes, you still pay the cost for longer docs, but it just becomes less
> important the longer the docs get, as it plays a smaller role. Load a ton
> of one-term docs and it might be 50-60% slower; add a bunch of articles
> and it might be closer to 15-20% (I don't know the exact numbers, but the
> longer I made the docs, the smaller the % slowdown, obviously). Still a
> good hit, but a short-doc test magnifies the problem.
>
> It affects things no matter what, but when you don't do much tokenizing
> or normalizing, the cost of the reflection/tokenstream init dominates.
>
> - Mark

--
Robert Muir
rcm...@gmail.com
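PS: to make the short-doc effect concrete, a toy driver like this (hypothetical, using the 2.4-style next(Token) API) lets you see how much of the total time goes to per-document stream setup when docs are one term long, which is why short-doc tests magnify the hit:

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class ShortDocDriver {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new ReusingAnalyzer(); // sketch from above
    Token reusableToken = new Token();
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
      // One-term docs: tokenizing is trivial, so per-doc stream init
      // (or its avoidance via reuse) dominates the total time.
      TokenStream ts =
          analyzer.reusableTokenStream("f", new StringReader("term"));
      while (ts.next(reusableToken) != null) {
        // consume tokens the way IndexWriter would
      }
    }
    System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
  }
}

Swap reusableTokenStream for tokenStream in the loop to see the non-reuse cost; the absolute numbers will obviously depend on the JVM and hardware.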