Okay, we'd like to have equality-by-reference for field names, yielding überfast comparisons in all our tight inner loops. But we dislike the default String.intern() for its Java<->native transitions and general sluggishness. There's a perfect solution. I was too dumb to come up with it myself, but fortunately I spotted it on the google-collections mailing list.
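To illustrate what equality-by-reference buys, here is a small sketch that uses only the standard String.intern(). The class FieldNames, its methods and the field name "title" are hypothetical. Two equal strings built separately are distinct objects. After intern(), both resolve to the same canonical instance, so a field-name check can use == instead of equals().

```java
// Hypothetical helper: field names compared by reference after interning.
class FieldNames {
    // String literals are interned by the JVM, so TITLE is the canonical instance.
    static final String TITLE = "title";

    // Reference comparison: correct only if 'name' was interned beforehand.
    static boolean isTitle(String name) {
        return name == TITLE;
    }

    // Canonicalizes a name built at runtime (e.g. parsed from remote input).
    static String canonical(String name) {
        return name.intern();
    }
}
```

Here `FieldNames.isTitle(new String("title"))` is false, while `FieldNames.isTitle(FieldNames.canonical(new String("title")))` is true. The price is the intern() call itself, which is exactly the cost complained about above.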
> Internally, we have a thing called an Interner. It has an intern()
> method that works just like String.intern() except doesn't use permgen
> and works for any type. It can use strong, weak or soft references.

Yay! With the exception of the weak/soft reference variety, that's exactly what we need. Unfortunately, it is not yet public and we can't grab the code right away:

> It won't come out too soon because there are some changes happening to
> CustomConcurrentHashMap which it rests on; and because we're just putting
> all our effort into stabilizing right now, not features.

'Custom' there stands for supporting soft/weak keys/values, so we could roll our own using the stock ConcurrentHashMap. Ah, too sad, we're not on Java 5 yet. But! Nobody prevents us from taking good ol' HashMap, making a static instance and using copy-on-write semantics, with synchronization happening only on addition to the map. It will actually be faster when you're trying to intern strings that are already in the pool, and slower when you try to intern something new. Well, if you constantly try to intern new strings, you should really worry about something other than performance, like hitting an OOM.

I did benchmarks for the proper use case, when you're interning strings that are already in the pool:

Running benchmark with 10000000 rounds
SunT = 1545ms, MyT = 98ms
SunNoninternT = 1731ms, MyNoninternT = 792ms

The first run (SunT vs. MyT) interns a string that is already interned, e.g. a constant. That is most probably what happens when field names come from inside the VM.

The second run (SunNoninternT vs. MyNoninternT) interns a string created via new String(constant). That is most probably what happens when field names come from outside the VM, as with remote invocations, or when you're generating field names on the fly. Sic! In contrast to the previous case, String.hashCode() is not cached and is calculated for each invocation, plus String.equals() doesn't short-circuit on reference equality.
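A minimal sketch of the copy-on-write HashMap approach described above, assuming a single JVM-wide pool. CowStringInterner is a hypothetical name, and this is not the patch being offered. Lookups read a volatile reference without locking. A miss takes a lock, copies the map, adds the string and publishes the new map. The code is written without generics so it stays Java 1.4-compatible. Safe publication through the volatile field is only guaranteed by the Java 5 (JSR-133) memory model, though.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write string interner (sketch, not the proposed patch).
// Raw types on purpose, so it also compiles on Java 1.4.
final class CowStringInterner {
    // The published map is never mutated after assignment; writers replace it wholesale.
    private static volatile Map pool = new HashMap();
    private static final Object lock = new Object();

    private CowStringInterner() {}

    static String intern(String s) {
        // Fast path: one volatile read, then an unsynchronized lookup in an immutable snapshot.
        String cached = (String) pool.get(s);
        if (cached != null) {
            return cached;
        }
        synchronized (lock) {
            // Re-check: another thread may have added s while we waited for the lock.
            Map current = pool;
            cached = (String) current.get(s);
            if (cached != null) {
                return cached;
            }
            // Design choice of this sketch: canonicalize via String.intern() on a miss
            // (slow path only), so results stay == to string literals.
            String canonical = s.intern();
            Map copy = new HashMap(current);
            copy.put(canonical, canonical);
            // Publish the new snapshot via the volatile write.
            pool = copy;
            return canonical;
        }
    }
}
```

Each miss copies the whole map, so building a pool of n names costs O(n²) copying in total. That is only acceptable under the assumption made above that new names are rare and lookups of existing ones dominate.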
Also, the second run allocates a bajillion strings on the heap and does array copies. I alleviated the first problem with a heap big enough to avoid GC.

Should I make a patch?

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785