[ https://issues.apache.org/jira/browse/LUCENE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5634:
---------------------------------------

    Attachment: LUCENE-5634.patch

Here's an alternate approach, pulled from early iterations on LUCENE-5611,
that specializes indexing of just a single string ... there are still
nocommits, and it needs to be made generic to any non-tokenized field, etc.
It's sort of silly to build up an entire TokenStream when really you just
need to index the one token ...

This patch indexes geonames in ~38.5 sec, or ~31% faster than trunk.
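
A rough sketch of the single-token fast path idea, in plain Java with no Lucene dependency. SimpleField and termsFor are hypothetical names, and the whitespace split merely stands in for a real Analyzer; this is not the patch's code, only an illustration of skipping the TokenStream machinery when the value is not tokenized:

```java
import java.util.ArrayList;
import java.util.List;

public class SingleTokenFastPath {
    // Hypothetical stand-in for a field: name, string value, tokenized flag.
    static final class SimpleField {
        final String name;
        final String value;
        final boolean tokenized;

        SimpleField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    // Returns the terms that would be indexed for the field.
    static List<String> termsFor(SimpleField f) {
        List<String> terms = new ArrayList<>();
        if (!f.tokenized) {
            // Fast path: the whole value is the one term, so no token
            // stream or per-token attribute objects are allocated.
            terms.add(f.value);
            return terms;
        }
        // Slow path: whitespace splitting stands in for a real Analyzer.
        for (String tok : f.value.split("\\s+")) {
            if (!tok.isEmpty()) {
                terms.add(tok);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [New York City]
        System.out.println(termsFor(new SimpleField("id", "New York City", false)));
        // prints [New, York, City]
        System.out.println(termsFor(new SimpleField("body", "New York City", true)));
    }
}
```

In Lucene the tokenized branch is where an Analyzer's TokenStream would be consumed; the sketch only shows that the non-tokenized branch never needs one.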

> Reuse TokenStream instances in Field
> ------------------------------------
>
>                 Key: LUCENE-5634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5634
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5634.patch, LUCENE-5634.patch
>
>
> If you don't reuse your Doc/Field instances (which is very expert; I
> suspect few apps do), then a lot of garbage is created to index each
> StringField, because we make a new StringTokenStream or
> NumericTokenStream (and their Attributes).
> We should be able to reuse these instances via a static
> ThreadLocal...
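
The static ThreadLocal idea could look roughly like the following self-contained sketch (no Lucene dependency). ReusableStringTokenStream, streamFor and CACHE are hypothetical names; the stream only loosely mirrors the reset()/incrementToken() contract of Lucene's TokenStream, and the StringBuilder stands in for a term attribute that would otherwise be allocated per field:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalStreamReuse {
    // Hypothetical single-token stream, loosely modeled on the
    // TokenStream contract: reset(), then incrementToken() until false.
    static final class ReusableStringTokenStream {
        private String value;
        private boolean used = true; // nothing to emit until reset()
        // Stand-in for an attribute, allocated once per stream instance.
        private final StringBuilder termAttribute = new StringBuilder();

        ReusableStringTokenStream setValue(String value) {
            this.value = value;
            return this;
        }

        void reset() {
            used = false;
        }

        boolean incrementToken() {
            if (used) {
                return false;
            }
            termAttribute.setLength(0);
            termAttribute.append(value);
            used = true;
            return true;
        }

        String term() {
            return termAttribute.toString();
        }
    }

    // One cached stream per thread: never shared across threads, so no
    // locking is needed, and no new stream is created per field.
    private static final ThreadLocal<ReusableStringTokenStream> CACHE =
        ThreadLocal.withInitial(ReusableStringTokenStream::new);

    static ReusableStringTokenStream streamFor(String value) {
        return CACHE.get().setValue(value);
    }

    public static void main(String[] args) throws InterruptedException {
        ReusableStringTokenStream a = streamFor("doc-1");
        a.reset();
        while (a.incrementToken()) {
            System.out.println(a.term()); // prints doc-1
        }
        ReusableStringTokenStream b = streamFor("doc-2");
        System.out.println(a == b); // same thread -> same instance: true

        AtomicReference<ReusableStringTokenStream> other = new AtomicReference<>();
        Thread t = new Thread(() -> other.set(streamFor("doc-3")));
        t.start();
        t.join();
        System.out.println(a == other.get()); // other thread -> false
    }
}
```

The trade-off is that a thread always gets back the same instance, so a caller must finish consuming one stream before requesting the next on that thread, and a static ThreadLocal keeps one instance alive for every thread that has ever indexed.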



--
This message was sent by Atlassian JIRA
(v6.2#6252)
