[ 
https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987645#comment-13987645
 ] 

Uwe Schindler commented on LUCENE-5634:
---------------------------------------

bq. but it's trickier since the precStep is final (maybe we can un-final it and 
add a setter?)

Please don't do this. It is maybe better to do it like in Elasticsearch: have a 
pool of NTS for each precision step.

bq. this optimization has proven to help a lot in the context of ES, but we can 
use a static thread local since we are fully in control of the threading model. 
With Lucene itself, where it can be used in many different environments, this 
can cause some unexpected behavior. For example, this might cause Tomcat 
to warn on leaking resources when unloading a war.

Thanks Shay: This is really the reason why we always refused to use static (!) 
ThreadLocals in Lucene, especially for those heavily used components.

Maybe we can do something similar to what Mike's patch does for StringField. 
It's a bit crazy to move the TokenStreams out of the field, but we can do this 
for performance here. Just have a lazily initialized pool of 
NumericTokenStreams, one per precisionStep, in each per-thread DocumentsWriter 
(DefaultIndexingChain).
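
A rough sketch of what I mean by a lazily initialized pool per precisionStep. 
This is not from the patch: PooledNumericStream is a hypothetical stand-in for 
NumericTokenStream, and NumericStreamPool is a made-up name for something the 
per-thread indexing chain would own. Since only one thread touches it, it needs 
neither a ThreadLocal nor synchronization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NumericTokenStream: the precision step is fixed
// at construction time, only the value changes between documents.
final class PooledNumericStream {
    final int precisionStep;
    long value;

    PooledNumericStream(int precisionStep) {
        this.precisionStep = precisionStep;
    }

    PooledNumericStream setLongValue(long value) {
        this.value = value;
        return this;
    }
}

// Hypothetical pool owned by a single indexing thread's chain.
final class NumericStreamPool {
    private final Map<Integer, PooledNumericStream> byPrecisionStep = new HashMap<>();

    // Creates the stream on first request for a precision step, reuses it afterwards.
    PooledNumericStream get(int precisionStep) {
        PooledNumericStream stream = byPrecisionStep.get(precisionStep);
        if (stream == null) {
            stream = new PooledNumericStream(precisionStep);
            byPrecisionStep.put(precisionStep, stream);
        }
        return stream;
    }
}
```

Because the precision step stays final, a field with a different precisionStep 
simply gets a different pooled instance, so no setter is needed.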

-1 to add thread locals in Lucene here!

Another idea for managing the pools: Maybe add a protected method to Field 
that can get the DocumentsWriter instance and add some caching functionality 
for arbitrary TokenStreams (not just NumericTS or StringTS): maybe some method 
on the per-thread DocumentsWriter to set a TokenStream for reuse per field. The 
field (also custom ones) could then use 
setCachedTokenStream/getCachedTokenStream through the DocumentsWriter accessor 
from inside the Field.
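
To make the second idea a bit more concrete, here is a minimal sketch under 
some assumptions. Only the names setCachedTokenStream/getCachedTokenStream come 
from the proposal above; ReusableStream, PerThreadStreamCache, KeywordStream 
and SketchField are hypothetical stand-ins, and for brevity the cache is passed 
as a parameter instead of being reached through a protected accessor on Field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker type standing in for TokenStream.
interface ReusableStream {}

// Hypothetical per-thread cache, standing in for the per-thread DocumentsWriter.
final class PerThreadStreamCache {
    private final Map<String, ReusableStream> cached = new HashMap<>();

    ReusableStream getCachedTokenStream(String fieldName) {
        return cached.get(fieldName);
    }

    void setCachedTokenStream(String fieldName, ReusableStream stream) {
        cached.put(fieldName, stream);
    }
}

// Hypothetical single-token stream, comparable in spirit to StringTokenStream.
final class KeywordStream implements ReusableStream {
    String value;
}

// Hypothetical field that asks the cache before allocating a new stream.
final class SketchField {
    final String name;
    final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    ReusableStream tokenStream(PerThreadStreamCache cache) {
        ReusableStream previous = cache.getCachedTokenStream(name);
        KeywordStream stream;
        if (previous instanceof KeywordStream) {
            stream = (KeywordStream) previous;    // reuse, no new garbage
        } else {
            stream = new KeywordStream();         // first use for this field name
            cache.setCachedTokenStream(name, stream);
        }
        stream.value = value;
        return stream;
    }
}
```

With this shape, an app that creates a new field instance per document still 
ends up with one stream per field name per indexing thread.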

> Reuse TokenStream instances in Field
> ------------------------------------
>
>                 Key: LUCENE-5634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5634
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert: I
> suspect few apps do) then there's a lot of garbage created to index each
> StringField because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to re-use these instances via a static
> ThreadLocal...



