I’m working on the initial benchmarking, and so far this arrangement (with synchronization and binary search for lookup, lots of barriers and linear cost insertion) has not yet been any slower.
I am nonetheless tempted by the 2-tables solution, because I think the simpler JVM-side interface that it allows is desirable. David On 2014-11-04, at 11:48 AM, Peter Levart <peter.lev...@gmail.com> wrote: > On 11/04/2014 04:19 PM, David Chase wrote: >> On 2014-11-04, at 5:07 AM, Peter Levart <peter.lev...@gmail.com> wrote: >>> Are you thinking of an IdentityHashMap type of hash table (no linked-list >>> of elements for same bucket, just search for 1st free slot on insert)? The >>> problem would be how to pre-size the array. Count declared members? >> It can’t be an identityHashMap, because we are interning member names. > > I know it can't be IdentityHashMap - I just wondered if you were thinking of > an IdentityHashMap-like data structure in contrast to standard HashMap-like. > Not in terms of equality/hashCode used, but in terms of internal data > structure. IdentityHashMap is just an array of elements (well pairs of them - > key, value are placed in two consecutive array slots). Lookup searches for > element linearly in the array starting from hashCode based index to the > element if found or 1st empty array slot. It's very easy to implement if the > only operations are get() and put() and could be used for interning and as a > shared structure for VM to scan, but array has to be sized to at least 3/2 > the number of elements for performance to not degrade. > >> In spite of my grumbling about benchmarking, I’m inclined to do that and try >> a couple of experiments. >> One possibility would be to use two data structures, one for interning, the >> other for communication with the VM. >> Because there’s no lookup in the VM data stucture it can just be an array >> that gets elements appended, >> and the synchronization dance is much simpler. >> >> For interning, maybe I use a ConcurrentHashMap, and I try the following >> idiom: >> >> mn = resolve(args) >> // deal with any errors >> mn’ = chm.get(mn) >> if (mn’ != null) return mn’ // hoped-for-common-case >> >> synchronized (something) { >> mn’ = chm.get(mn) >> if (mn’ != null) return mn’ >> txn_class = mn.getDeclaringClass() >> >> while (true) { >> redef_count = txn_class.redefCount() >> mn = resolve(args) >> >> shared_array.add(mn); >> // barrier, because we are a paranoid >> if (redef_count = redef_count.redefCount()) { >> chm.add(mn); // safe to publish to other Java threads. >> return mn; >> } >> shared_array.drop_last(); // Try again >> } >> } >> >> (Idiom gets slightly revised for the one or two other intern use cases, but >> this is the basic idea). > > Yes, that's similar to what I suggested by using a linked-list of > MemberName(s) instead of the "shared_array" (easier to reason about ordering > of writes) and a sorted array of MemberName(s) instead of the "chm" in your > scheme above. ConcurrentHashMap would certainly be the most performant > solution in terms of lookup/insertion-time and concurrent throughput, but it > will use more heap than a simple packed array of MemberNames. CHM is much > better now in JDK8 though regarding heap use. > > A combination of the two approaches is also possible: > > - instead of maintaining a "shared_array" of MemberName(s), have them form a > linked-list (you trade a slot in array for 'next' pointer in MemberName) > - use ConcurrentHashMap for interning. > > Regards, Peter > >> >> David >> >>>> And another way to view this is that we’re now quibbling about >>>> performance, when we still >>>> have an existing correctness problem that this patch solves, so maybe we >>>> should just get this >>>> done and then file an RFE. >>> Perhaps, yes. But note that questions about JMM and ordering of writes to >>> array elements are about correctness, not performance. >>> >>> Regards, Peter >>> >>>> David