This implementation suffers from thread visibility problems too: changes to the array's values aren't guaranteed to be visible across threads. There's also a problem with hash collisions evicting cache entries, which could greatly degrade performance in several common use cases. For example, suppose we had a nested loop iterating over docs and each doc's field names, interning the names as we went. If two fields (F1, F2) both hashed to the same array index, the cache would never be hit for either of them, since we'd be alternating between interning F1 and F2 and each call would overwrite the other's entry. Without benchmarking/testing it's hard to know how big a problem that would be in practice, but the thread visibility problem seems potentially serious.
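
To make the collision scenario concrete, here is a minimal single-threaded sketch. It is not the code from either patch: DirectMappedInternCache and CollisionDemo are hypothetical names, and the one-entry-per-slot layout is an assumption about the design being discussed. "Aa" and "BB" stand in for F1 and F2 because they have the same String.hashCode(), so they land in the same slot at any table size.

```java
// Hypothetical direct-mapped cache with one entry per slot (illustration only,
// not the code from the attached patches).
class DirectMappedInternCache {
    private final String[] slots;
    int hits;
    int misses;

    DirectMappedInternCache(int sizePowerOfTwo) {
        slots = new String[sizePowerOfTwo];
    }

    String intern(String s) {
        int idx = s.hashCode() & (slots.length - 1);
        String cached = slots[idx];
        if (cached != null && cached.equals(s)) {
            hits++;
            return cached;
        }
        misses++;
        String interned = s.intern();
        slots[idx] = interned; // evicts whatever occupied this slot
        return interned;
    }
}

class CollisionDemo {
    // Returns {hits, misses} after interning two colliding field names per doc.
    static int[] run(int docs) {
        DirectMappedInternCache cache = new DirectMappedInternCache(1024);
        // "Aa" and "BB" have identical hashCode() values, so they share a slot.
        String[] fieldNames = { "Aa", "BB" };
        for (int doc = 0; doc < docs; doc++) {
            for (String name : fieldNames) {
                cache.intern(name);
            }
        }
        return new int[] { cache.hits, cache.misses };
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        System.out.println("hits=" + r[0] + " misses=" + r[1]);
    }
}
```

In this sketch every call evicts the entry the next call needs, so each lookup falls through to String.intern(). It says nothing about how often real field names collide; that still needs measuring.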

[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1607:
---------------------------------
Attachment: LUCENE-1607.patch

Here's a completely lockless and memory barrier free intern() cache.

This default would be more back compatible since programs may rely on
String instances being interned via String.intern().

It does not yet include corresponding Lucene code changes to use the
StringInterner.

Thoughts?
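
For readers without the attachment, the following is a rough sketch of how a lockless, barrier-free intern() cache can be built. It is an assumption about the general technique, not the contents of LUCENE-1607.patch: the class name LocklessInternCache, the Entry layout and the maxChain parameter are all hypothetical. The idea is that entries are immutable (all fields final), the table is read and written with plain array accesses, and every miss falls back to String.intern(), so a stale or lost write costs only a cache miss.

```java
// Hypothetical sketch of a lockless intern cache (not the attached patch).
class LocklessInternCache {
    // Immutable entry: a thread that sees the reference also sees the final fields
    // as they were set in the constructor.
    private static final class Entry {
        final String str;
        final int hash;
        final Entry next;

        Entry(String str, int hash, Entry next) {
            this.str = str;
            this.hash = hash;
            this.next = next;
        }
    }

    private final Entry[] table;
    private final int mask;
    private final int maxChain;

    LocklessInternCache(int tableSizePowerOfTwo, int maxChain) {
        if (Integer.bitCount(tableSizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("table size must be a power of two");
        }
        this.table = new Entry[tableSizePowerOfTwo];
        this.mask = tableSizePowerOfTwo - 1;
        this.maxChain = maxChain;
    }

    String intern(String s) {
        int h = s.hashCode();
        int slot = h & mask;
        Entry first = table[slot]; // plain read; may be stale in another thread
        int depth = 0;
        for (Entry e = first; e != null; e = e.next) {
            if (e.hash == h && (e.str == s || e.str.equals(s))) {
                return e.str;
            }
            depth++;
        }
        // Miss: the JVM's own pool stays the source of truth.
        String canonical = s.intern();
        // Plain write; a racing writer may overwrite it, which only loses an entry.
        // Keep the old chain while it is short, otherwise start a fresh one.
        table[slot] = new Entry(canonical, h, depth < maxChain ? first : null);
        return canonical;
    }
}
```

Because the returned instance always comes from String.intern(), code that compares interned strings with == keeps working. The short chain per slot is one way to keep two colliding names from evicting each other on every call.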

String.intern() faster alternative
----------------------------------
Key: LUCENE-1607
URL: https://issues.apache.org/jira/browse/LUCENE-1607
Project: Lucene - Java
Issue Type: Improvement
Reporter: Earwin Burrfoot
Fix For: 2.9
Attachments: intern.patch, LUCENE-1607.patch

By using our own interned string pool on top of the default one,
String.intern() can be greatly optimized.

On my setup (Java 6) this alternative runs ~15.8x faster for already
interned strings, and ~2.2x faster for 'new String(interned)'.

For Java 5 and 4 the speedup is lower, but still considerable.
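
As an illustration of what "our own pool on top of the default" can mean, here is a minimal thread-safe sketch. It is not intern.patch: PooledInterner is a hypothetical name and the use of ConcurrentHashMap is an assumption made for simplicity. The pool is consulted first, and String.intern() is called only on a miss, so the instance returned is still the JVM's canonical one.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a private pool layered over String.intern()
// (not the code from intern.patch).
class PooledInterner {
    private final ConcurrentHashMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    String intern(String s) {
        String cached = pool.get(s);
        if (cached != null) {
            return cached; // fast path: no call into the JVM's string table
        }
        String canonical = s.intern();
        // Store the canonical instance as both key and value.
        pool.putIfAbsent(canonical, canonical);
        return canonical;
    }
}
```

Usage would look like `String name = interner.intern(fieldName);`, after which `name == otherInternedName` comparisons behave as with plain String.intern(). Note that this sketch never evicts, so it only suits a bounded set of strings such as field names.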




