[
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645
]
Uwe Schindler commented on LUCENE-5634:
---------------------------------------
bq. but it's trickier since the precStep is final (maybe we can un-final it and
add a setter?)
Please donÄt do this. It is maybe better to do it like in Elasticsearch: Have a
pool of NTS for each precision step.
bq. this optimization has proven to help a lot in the context of ES, but we can
use a static thread local since we are fully in control of the threading model.
With Lucene itself, where it can be used in many different environment, then
this can cause some unexpected behavior. For example, this might cause Tomcat
to warn on leaking resources when unloading a war.
Thanks Shay: This is really the reason why we always refused to use static (!)
ThreadLocals in Lucene, especially for those heavy used components.
Maybe we can do a similar thing like with StringField in Mike's patch. Its a
bit crazy to move out the TokenStreams from the field, but we can do this for
performance here. Just have a lazy init pool of NumericTokenStreams for each
precisionStep in each per thread DocumentsWriter (DefaultIndexingChain).
-1 to add thread locals in Lucene here!
Another idea how to manage the pools: Maybe add a protected method to Field
that can get the DocumentsWriter instance and add some caching functionality
for arbitrary TokenStreams (not just NumericTS or StringTS): Maybe some method
on the per thread DocumentsWriter to set aTokenStream for reuse per field. The
field (also custom ones) then could use
setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor
from inside the Field.
> Reuse TokenStream instances in Field
> ------------------------------------
>
> Key: LUCENE-5634
> URL: https://issues.apache.org/jira/browse/LUCENE-5634
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]