This is real, and not just for very short docs. I think the reflection
overhead is pretty expensive.
Here are some stats from the Hamshahri corpus (I have been TREC-testing
Persian just to make sure everything is OK):

SimpleAnalyzer (has reusableTokenStream):
Total time: 47816 ms
Unique tokens: 441660

PersianAnalyzer (no reuse):
Total time: 53928 ms
Unique tokens: 438286

PersianAnalyzer (with reusableTokenStream):
Total time: 47704 ms
Unique tokens: 438286
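
To put the two PersianAnalyzer runs side by side, here is a rough sketch of
the arithmetic. It only uses the timings quoted above; the class and method
names are made up for illustration and are not part of Lucene. By this
calculation the no-reuse run takes 6224 ms longer, roughly 13% more than the
run with reusableTokenStream.

```java
public class ReuseOverhead {
    // Percentage by which slowMs exceeds fastMs (hypothetical helper).
    static double slowdownPercent(long slowMs, long fastMs) {
        return 100.0 * (slowMs - fastMs) / fastMs;
    }

    public static void main(String[] args) {
        long noReuseMs = 53928;   // PersianAnalyzer (no reuse), from the stats above
        long withReuseMs = 47704; // PersianAnalyzer (with reusableTokenStream)

        long extraMs = noReuseMs - withReuseMs;
        double percent = slowdownPercent(noReuseMs, withReuseMs);

        System.out.println("extra time without reuse: " + extraMs + " ms");
        System.out.println("relative slowdown (percent): " + percent);
    }
}
```

This is a single run on one corpus, so treat the percentage as indicative
rather than a careful benchmark.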

On Mon, Aug 10, 2009 at 10:35 AM, Mark Miller<[email protected]> wrote:
> Discussion on speed of new TokenStream API in Solr.
>
> see:
> http://search.lucidimagination.com/search/document/d0040ebe6addad4b/indexing_slowdown_with_latest_lucene_udpate
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>



-- 
Robert Muir
[email protected]
