btw my lucene 2.4 numbers for this corpus (running many times) average
around 41s versus 44s,
so it's still a small hit (~7%) even for reasonably large docs, using simple
analyzers with reuse and all that.
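
(fwiw, the kind of loop I'm timing looks roughly like this - the corpus
loader, field name, and doc strings here are hypothetical stand-ins, not
my actual test:)

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class AnalyzeBench {
  public static void main(String[] args) throws IOException {
    SimpleAnalyzer analyzer = new SimpleAnalyzer();
    String[] docs = loadCorpus(); // hypothetical: one string per document
    final Token reusable = new Token();
    long start = System.nanoTime();
    for (String doc : docs) {
      // reusableTokenStream hands back one cached Tokenizer per thread
      // instead of constructing a fresh TokenStream for every document
      TokenStream ts = analyzer.reusableTokenStream("body", new StringReader(doc));
      while (ts.next(reusable) != null) {
        // consume tokens; a real run would feed these through IndexWriter
      }
    }
    System.out.println("analysis took " +
        (System.nanoTime() - start) / 1000000 + " ms");
  }

  // hypothetical stand-in for loading whatever corpus gets indexed
  private static String[] loadCorpus() {
    return new String[] { "a reasonably large document ...", "another one ..." };
  }
}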

so reusableTokenStream takes care of a lot of it, but not all of it.
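
for reference, the reuse pattern is basically what the core analyzers
already do in 2.4: override reusableTokenStream to reset a per-thread
Tokenizer instead of building a new chain for every doc. a sketch along
the lines of SimpleAnalyzer:

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

public class ReusingAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseTokenizer(reader);
  }

  public TokenStream reusableTokenStream(String fieldName, Reader reader)
      throws IOException {
    // getPreviousTokenStream/setPreviousTokenStream stash the stream in a
    // ThreadLocal, so each thread pays construction once and then just
    // resets the Reader on the cached Tokenizer
    Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
    if (tokenizer == null) {
      tokenizer = new LowerCaseTokenizer(reader);
      setPreviousTokenStream(tokenizer);
    } else {
      tokenizer.reset(reader);
    }
    return tokenizer;
  }
}

that saves the object construction, but whatever still happens per-stream
(attribute setup and the like) gets paid on every document, which is the
part reuse can't hide.
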
On Mon, Aug 10, 2009 at 10:48 AM, Mark Miller<markrmil...@gmail.com> wrote:
> Robert Muir wrote:
>>
>> This is real and not just for very short docs.
>
> Yes, you still pay the cost for longer docs, but it just becomes less
> important the longer the docs get, since it plays a smaller role. Load a
> ton of one-term docs and it might be 50-60% slower - add a bunch of
> articles and it might be closer to 15-20% (I don't know the exact numbers,
> but the longer I made the docs, the smaller the % slowdown, obviously).
> Still a good hit, but a short-doc test magnifies the problem.
>
> It affects things no matter what, but when you don't do much tokenizing or
> normalizing, the cost of the reflection/tokenstream init dominates.
>
> - Mark



-- 
Robert Muir
rcm...@gmail.com

