This implementation also suffers from thread visibility problems: changes
to the array's values aren't guaranteed to be visible across threads. In
addition, hash collisions invalidate cache entries, which could greatly
degrade performance in several common use cases. For example, suppose we
had a nested loop iterating over docs and each doc's field names, interning
the names as we went. If two fields (F1, F2) both hashed to the same array
index, the cache would never be hit, since we'd be alternating between
interning F1 and F2. Without benchmarking/testing it's hard to know how big
a problem that would be in practice, but the thread visibility problem
seems potentially serious.
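
To make the collision scenario concrete, here is a minimal single-threaded
sketch. It is not the code from either attached patch: the class names, the
one-string-per-slot layout and the table size of 1024 are assumptions made
for illustration. It relies only on the fact that "Aa" and "BB" have the
same String.hashCode(), so they always land on the same slot.

```java
// Hypothetical demo, not taken from intern.patch or LUCENE-1607.patch.
class DirectMappedInternDemo {

    // A cache that keeps exactly one string per array slot (assumed layout).
    static final class SingleSlotCache {
        private final String[] slots;
        int hits;
        int misses;

        SingleSlotCache(int sizePowerOfTwo) {
            slots = new String[sizePowerOfTwo];
        }

        String intern(String s) {
            int idx = s.hashCode() & (slots.length - 1);
            String cached = slots[idx];
            if (cached != null && cached.equals(s)) {
                hits++;
                return cached;
            }
            // Miss: fall back to the JVM pool and overwrite the slot,
            // evicting whatever string was stored there before.
            misses++;
            String interned = s.intern();
            slots[idx] = interned;
            return interned;
        }
    }

    // Simulates the nested loop: for each doc, intern both field names.
    // Returns {hits, misses}.
    static int[] run(String f1, String f2, int docs) {
        SingleSlotCache cache = new SingleSlotCache(1024);
        for (int d = 0; d < docs; d++) {
            cache.intern(f1);
            cache.intern(f2);
        }
        return new int[] { cache.hits, cache.misses };
    }

    public static void main(String[] args) {
        int[] colliding = run("Aa", "BB", 1000); // same hashCode, same slot
        int[] distinct = run("a", "b", 1000);    // adjacent slots, no eviction
        System.out.println("colliding: hits=" + colliding[0] + " misses=" + colliding[1]);
        System.out.println("distinct:  hits=" + distinct[0] + " misses=" + distinct[1]);
        // colliding: hits=0 misses=2000
        // distinct:  hits=1998 misses=2
    }
}
```

With colliding names every lookup evicts the other name, so each call pays
for the cache probe plus the String.intern() call. How often real field
names collide depends on the table size and the names in use, which this
sketch does not measure.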
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated LUCENE-1607:
---------------------------------
Attachment: LUCENE-1607.patch
Here's a completely lockless and memory barrier free intern() cache.
This default would be more backward compatible, since programs may rely on
String instances being interned via String.intern().
It does not yet include corresponding Lucene code changes to use the
StringInterner.
Thoughts?
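
For readers without the attachment, one way a lock-free, barrier-free
intern cache can be structured is sketched below. This is an assumption
about the general technique, not the contents of LUCENE-1607.patch; the
class name RacyInternCache and its constructor parameter are hypothetical.
The idea is that slots are read and written with plain (racy) array
accesses, and every miss falls back to String.intern(), so a stale or
missing entry should only cost an extra intern() call rather than return
a wrong instance.

```java
// Hypothetical sketch of a racy, lock-free intern cache.
// Not the attached patch; names and layout are illustrative assumptions.
class RacyInternCache {
    private final String[] slots; // plain array: no volatile, no locks
    private final int mask;

    RacyInternCache(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new String[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    String intern(String s) {
        int idx = s.hashCode() & mask;
        // Racy read: another thread's write may not be visible yet, in
        // which case we simply take the slow path below.
        String cached = slots[idx];
        if (cached != null && (cached == s || cached.equals(s))) {
            return cached;
        }
        // Slow path: the JVM pool stays the source of truth, so the
        // returned instance is always the one String.intern() would give.
        String canonical = s.intern();
        slots[idx] = canonical; // racy write, last writer wins
        return canonical;
    }

    public static void main(String[] args) {
        RacyInternCache cache = new RacyInternCache(1024);
        String copy = new String("contents");
        System.out.println(cache.intern(copy) == "contents".intern());
    }
}
```

Because only values returned by String.intern() are ever stored, callers
that compare interned strings with == keep working, which is the backward
compatibility concern raised above. Whether this beats String.intern() on
a given JVM would still need to be benchmarked.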
String.intern() faster alternative
----------------------------------
Key: LUCENE-1607
URL: https://issues.apache.org/jira/browse/LUCENE-1607
Project: Lucene - Java
Issue Type: Improvement
Reporter: Earwin Burrfoot
Fix For: 2.9
Attachments: intern.patch, LUCENE-1607.patch
By using our own interned string pool on top of the default one,
String.intern() can be greatly optimized.
On my setup (Java 6) this alternative runs ~15.8x faster for already
interned strings, and ~2.2x faster for 'new String(interned)'.
For Java 5 and 4 the speedup is lower, but still considerable.