Uwe Schindler created LUCENE-4931:
-------------------------------------
Summary: Make oal.document.Field reuse its internal StringTokenStream
Key: LUCENE-4931
URL: https://issues.apache.org/jira/browse/LUCENE-4931
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.2, 4.1, 4.0, 4.2.1
Reporter: Uwe Schindler
Follow-up from LUCENE-4930:
Field.java has a private StringTokenStream, which is used as the TokenStream
implementation for StringField (a single String value indexed as one token).
Unfortunately, this TokenStream is created anew for every document/field while
indexing, so creating the TS accounts for a significant share of indexing time.
With very old Java versions this also involves a lock in ReferenceQueue.poll()
when called from addAttribute().
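A minimal, Lucene-free sketch of the difference between allocating a one-token stream per field value and re-arming a single reusable instance. `SingleTokenStream`, `setValue()`, `next()` and the allocation counter are hypothetical stand-ins, not Lucene's TokenStream API (which uses incrementToken() and attributes); the counter only makes the number of constructions visible.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
  // Hypothetical stand-in for Field's private StringTokenStream: yields exactly one token.
  static final class SingleTokenStream {
    static int instancesCreated = 0; // illustration only: counts constructor calls
    private String value;
    private boolean used = true;

    SingleTokenStream() {
      // Stands in for the construction cost (e.g. addAttribute() calls) the issue describes.
      instancesCreated++;
    }

    // Re-arm the stream with a new value instead of building a new object.
    SingleTokenStream setValue(String value) {
      this.value = value;
      this.used = false;
      return this;
    }

    // Returns the single token once, then null to signal end of stream.
    String next() {
      if (used) return null;
      used = true;
      return value;
    }
  }

  // Allocating variant: a new stream for every field value, as the issue describes.
  static List<String> indexAllocating(List<String> values) {
    List<String> tokens = new ArrayList<>();
    for (String v : values) {
      SingleTokenStream ts = new SingleTokenStream().setValue(v);
      for (String t = ts.next(); t != null; t = ts.next()) tokens.add(t);
    }
    return tokens;
  }

  // Reusing variant: one stream for the whole loop, re-armed per field value.
  static List<String> indexReusing(List<String> values) {
    List<String> tokens = new ArrayList<>();
    SingleTokenStream ts = new SingleTokenStream();
    for (String v : values) {
      ts.setValue(v);
      for (String t = ts.next(); t != null; t = ts.next()) tokens.add(t);
    }
    return tokens;
  }

  public static void main(String[] args) {
    List<String> ids = List.of("id-1", "id-2", "id-3");
    SingleTokenStream.instancesCreated = 0;
    // prints "[id-1, id-2, id-3] allocations=3"
    System.out.println(indexAllocating(ids) + " allocations=" + SingleTokenStream.instancesCreated);
    SingleTokenStream.instancesCreated = 0;
    // prints "[id-1, id-2, id-3] allocations=1"
    System.out.println(indexReusing(ids) + " allocations=" + SingleTokenStream.instancesCreated);
  }
}
```

Both variants emit the same tokens; only the number of stream constructions differs.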
In Lucene 3.x, DocInverterPerThread had a private thread-local AttributeSource
for reuse, but because this was factored out into Field.java, we can no longer
use CloseableThreadLocal (Field is not Closeable). We should maybe move the
special one-token TokenStream back to DocInverterPerThread and just let
Field.java delegate there. I know this would move us back to the 3.x approach,
where we had special handling of single-token Fields in the indexer...
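A sketch of the first proposal, assuming a hypothetical `DocInverter` that plays the role of 3.x DocInverterPerThread (one instance per indexing thread) and a hypothetical `Field.tokenStream(DocInverter)` delegation method; neither is Lucene's actual API. Because the inverter is confined to one thread and owns the reusable stream, no ThreadLocal and no close() hook on Field are needed.

```java
import java.util.ArrayList;
import java.util.List;

public class InverterSketch {
  // Hypothetical one-token stream (same idea as Field's private StringTokenStream).
  static final class SingleTokenStream {
    private String value;
    private boolean used = true;

    SingleTokenStream reset(String value) {
      this.value = value;
      this.used = false;
      return this;
    }

    // Returns the single token once, then null.
    String next() {
      if (used) return null;
      used = true;
      return value;
    }
  }

  // Hypothetical per-thread inverter, loosely modeled on 3.x DocInverterPerThread:
  // each indexing thread owns one instance, so the stream it holds is never shared.
  static final class DocInverter {
    private final SingleTokenStream singleToken = new SingleTokenStream();

    SingleTokenStream singleTokenStream(String value) {
      return singleToken.reset(value); // re-arm the one cached instance
    }

    // Consumes whatever stream the field hands back and collects its tokens.
    List<String> invert(Field field) {
      List<String> tokens = new ArrayList<>();
      SingleTokenStream ts = field.tokenStream(this);
      for (String t = ts.next(); t != null; t = ts.next()) tokens.add(t);
      return tokens;
    }
  }

  // Hypothetical untokenized field: it owns no stream and just delegates to the inverter.
  static final class Field {
    final String value;

    Field(String value) {
      this.value = value;
    }

    SingleTokenStream tokenStream(DocInverter inverter) {
      return inverter.singleTokenStream(value);
    }
  }

  public static void main(String[] args) {
    DocInverter inverter = new DocInverter();
    Field a = new Field("id-1");
    Field b = new Field("id-2");
    System.out.println(inverter.invert(a) + " " + inverter.invert(b)); // prints "[id-1] [id-2]"
    System.out.println(a.tokenStream(inverter) == b.tokenStream(inverter)); // prints "true"
  }
}
```

The cost of this design is the one the text concedes: the indexer again has to know about single-token fields.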
Another approach would be to make Field.java use a static KeywordAnalyzer
(which would then need to be moved to core), or to add a ThreadLocal to
Field.java (which may be expensive). Unfortunately, this makes it hard to
maintain, as the thread-local state would also need to be bound to the
IndexWriter instance, because you could have two IndexWriters open at the same
time and add documents to both of them from one thread... This brings us back
to my previous solution.
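A sketch of why thread-local reuse would have to be scoped to the writer: a static ThreadLocal inside Field hands the same instance to every writer used from a given thread, while a ThreadLocal owned by each (hypothetical) `Writer` keeps their instances apart. All class names here are illustrative assumptions; the code only shows which instance each writer would see, not Lucene's indexing code.

```java
public class ThreadLocalSketch {
  // Hypothetical placeholder for the reusable one-token stream.
  static final class SingleTokenStream { }

  // Option A (hypothetical): one static ThreadLocal in Field, shared by all writers.
  static final ThreadLocal<SingleTokenStream> FIELD_STATIC =
      ThreadLocal.withInitial(SingleTokenStream::new);

  // Option B (hypothetical): each writer owns its own ThreadLocal,
  // so reuse is scoped per writer and per thread.
  static final class Writer {
    private final ThreadLocal<SingleTokenStream> perWriter =
        ThreadLocal.withInitial(SingleTokenStream::new);

    SingleTokenStream staticStream() { return FIELD_STATIC.get(); }

    SingleTokenStream ownStream() { return perWriter.get(); }
  }

  public static void main(String[] args) throws InterruptedException {
    Writer w1 = new Writer();
    Writer w2 = new Writer();
    // Same thread, two writers open at once:
    System.out.println(w1.staticStream() == w2.staticStream()); // prints "true": shared
    System.out.println(w1.ownStream() == w2.ownStream());       // prints "false": per writer

    // A second thread gets its own instance from the same writer's ThreadLocal.
    SingleTokenStream[] fromOtherThread = new SingleTokenStream[1];
    Thread t = new Thread(() -> fromOtherThread[0] = w1.ownStream());
    t.start();
    t.join(); // join() makes the other thread's write visible here
    System.out.println(fromOtherThread[0] == w1.ownStream());   // prints "false"
  }
}
```

Note that ThreadLocal.remove() only clears the calling thread's value; releasing every thread's state when the owner is closed is the kind of problem Lucene's CloseableThreadLocal addresses.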