[ https://issues.apache.org/jira/browse/LUCENE-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507094 ]
Mark Miller commented on LUCENE-937:
------------------------------------
> So it may be safe to say that if you can estimate the list size (avoiding
> array growth), AL is preferable as long as all adds/removes happen at the end.
In the CachingTokenFilter case I don't believe it is even necessary to
estimate the list size. Many of the documents I used had far more than 30
tokens, but giving the ArrayList a larger initial capacity brought no further
benefit. I believe this is because the ArrayList grows geometrically, by a
constant factor of its current capacity (roughly 1.5x in Sun's implementation;
the factor is an implementation detail, not guaranteed), so a small increase
in initial capacity can cut the number of resizes needed even when the List
must grow *much* bigger than the initial size. An initial capacity of 10 just
doesn't cut it, but 30 works great. A LinkedList (traversed with an iterator
or with get()) seems to perform no better than an ArrayList(10).
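
To make the resize argument concrete, here is a minimal sketch that counts how
often a backing array would be reallocated for a given initial capacity. It
assumes the growth rule used by OpenJDK 7 and later (new capacity = old
capacity + old capacity / 2); other JDKs may use a different rule. The class
name GrowthSketch, the method countResizes, and the 1000-token document size
are hypothetical, chosen only for illustration. It models capacity only and
measures nothing about actual running time.

```java
class GrowthSketch {
    // Counts simulated reallocations needed to reach finalSize elements,
    // ASSUMING newCapacity = oldCapacity + oldCapacity / 2 (OpenJDK 7+ rule).
    // initialCapacity is expected to be positive.
    static int countResizes(int initialCapacity, int finalSize) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < finalSize) {
            // Grow by half; the max() guarantees progress for tiny capacities.
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int tokens = 1000; // hypothetical number of tokens in one document
        for (int init : new int[] {10, 30, 100}) {
            System.out.println("initial=" + init
                    + " resizes=" + countResizes(init, tokens));
        }
    }
}
```

Under that assumed rule, a 1000-token document needs 12 reallocations starting
from 10, 9 starting from 30, and 6 starting from 100, so the count shrinks only
logarithmically as the initial capacity increases.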
- Mark
> Make CachingTokenFilter faster
> ------------------------------
>
> Key: LUCENE-937
> URL: https://issues.apache.org/jira/browse/LUCENE-937
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Mark Miller
> Priority: Minor
> Attachments: CachingTokenFilter.patch
>
>
> The wrong data structure was used for the CachingTokenFilter. It should be an
> ArrayList rather than a LinkedList. There is a noticeable difference in speed.