Re: RFR: 8007806: Need a Throwables performance counter

Peter Levart Sun, 24 Feb 2013 12:19:27 -0800

Hi Alan, David, Nils,

I just want to clear up something regarding the PerfCounter implementation.

Access to the 64bit value in native memory is through a direct buffer which uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on processors that don't support atomic 64bit stores/loads, each access results in two separate 32bit load/store accesses, right?

The PerfCounter methods that access the 64bit value are synchronized, using the PerfCounter instance as a lock. But how is this 64bit value accessed, for example, in the jstat utility? Is it possible that jstat can see one half of the 64bit value before the update and the other half after the update?

If this is true and it is not that important, then instead of a synchronized update of the 64bit counter, a 32bit CAS could be used, optionally (rarely) followed by a second 32bit CAS, like for example:


http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html

I tried this on ARM v6 and it works much better than synchronized access, but I don't know if it's acceptable. It guarantees eventual correctness of the summed value if the only operation performed is add() (no set() intermingled) and has the same possibility of incorrect half-half reads by observers as the current PerfCounter has for unsynchronized observers.
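To make the idea concrete, here is a minimal on-heap sketch of a 64bit add built from two 32bit atomic updates. It is not the code from the webrev: the class name SplitCounter is made up, and two AtomicInteger fields stand in for the two halves of the native-memory slot. The low half is updated with a CAS loop, and the high half is touched only when there is a carry or when the added value has non-zero high bits.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: two AtomicInteger fields stand in for
// the low and high 32bit halves of a 64bit counter slot.
class SplitCounter {
    private final AtomicInteger lo = new AtomicInteger(); // low 32 bits
    private final AtomicInteger hi = new AtomicInteger(); // high 32 bits

    void add(long value) {
        long addLo = value & 0xFFFFFFFFL;
        int oldLo;
        long sum;
        // First 32bit CAS: update the low half, remembering the carry.
        do {
            oldLo = lo.get();
            sum = (oldLo & 0xFFFFFFFFL) + addLo;
        } while (!lo.compareAndSet(oldLo, (int) sum));
        // Second (rare) 32bit atomic update: high bits of value plus carry.
        int hiDelta = (int) (value >>> 32) + (int) (sum >>> 32);
        if (hiDelta != 0) {
            hi.getAndAdd(hiDelta);
        }
    }

    // Not atomic across the two halves: an observer may see a torn value
    // while an add() is between its two updates.
    long get() {
        return ((long) hi.get() << 32) | (lo.get() & 0xFFFFFFFFL);
    }

    public static void main(String[] args) throws InterruptedException {
        SplitCounter c = new SplitCounter();
        c.add(0xFFFFFFF0L); // start just below the 32bit boundary
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) c.add(1);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long expected = 0xFFFFFFF0L + 4L * 100_000;
        System.out.println("matches expected sum: " + (c.get() == expected));
    }
}
```

Because each carry is decided atomically together with the low-half update and the high-half additions commute, the sum is correct once all adds have completed. A set() that wrote both halves separately could interleave with a pending carry, which is why the guarantee only holds for add-only use.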

Here's the comparison of the unpatched/patched PerfCounter.increment() micro-benchmark on a single-core ARM v6 (Raspberry Pi):


*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
           1 threads, Tavg =    269.34 ns/op (σ =     0.00 ns/op) [269.34]
           2 threads, Tavg =  7,170.48 ns/op (σ =   410.77 ns/op) [6,783.73, 7,603.95]
           3 threads, Tavg = 12,034.82 ns/op (σ =   418.99 ns/op) [11,792.33, 11,714.67, 12,639.26]
           4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16]



*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
           1 threads, Tavg =    166.21 ns/op (σ =   0.00 ns/op) [166.21]
           2 threads, Tavg =    332.58 ns/op (σ =   0.12 ns/op) [332.45, 332.70]
           3 threads, Tavg =    500.30 ns/op (σ =   0.22 ns/op) [500.04, 500.29, 500.58]
           4 threads, Tavg =    667.95 ns/op (σ =   2.11 ns/op) [665.22, 667.18, 668.40, 671.04]



Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the "portability"
issues in-depth. The javadoc for Unsafe says:

/"This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode."/

That is the doc for getXXX but not for getAndAddXXX or compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x))

  UnsafeWrapper("Unsafe_CompareAndSwapLong");
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
    return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
    jboolean success = false;
    ObjectLocker ol(p, THREAD);
    if (*addr == e) { *addr = x; success = true; }
    return success;
  }
UNSAFE_END

David
-----

Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianness of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass "null" as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===================================================================
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
      private static final Perf perf =
          AccessController.doPrivileged(new Perf.GetPerfAction());
+    private static final Unsafe unsafe =
+        Unsafe.getUnsafe();

      // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp
      private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

      private final String name;
      private final LongBuffer lb;
+    private final DirectBuffer db;

      private PerfCounter(String name, int type) {
          this.name = name;
          ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
          bb.order(ByteOrder.nativeOrder());
          this.lb = bb.asLongBuffer();
+        this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
      }

      static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
      /**
       * Returns the current value of the perf counter.
       */
-    public synchronized long get() {
+    public long get() {
+        if (db != null) {
+            return unsafe.getLongVolatile(null, db.address());
+        }
+        else {
+            synchronized (this) {
-        return lb.get(0);
-    }
+                return lb.get(0);
+            }
+        }
+    }

      /**
       * Sets the value of the perf counter to the given newValue.
       */
-    public synchronized void set(long newValue) {
+    public void set(long newValue) {
+        if (db != null) {
+            unsafe.putOrderedLong(null, db.address(), newValue);
+        }
+        else {
+            synchronized (this) {
-        lb.put(0, newValue);
-    }
+                lb.put(0, newValue);
+            }
+        }
+    }

      /**
       * Adds the given value to the perf counter.
       */
-    public synchronized void add(long value) {
-        long res = get() + value;
+    public void add(long value) {
+        if (db != null) {
+            unsafe.getAndAddLong(null, db.address(), value);
+        }
+        else {
+            synchronized (this) {
+                long res = lb.get(0) + value;
-        lb.put(0, res);
+                lb.put(0, res);
+            }
+        }
      }

      /**

Testing the PerfCounter.increment() method in a loop on multiple threads
sharing the same PerfCounter instance on, for example, a 4-core Intel i7
machine produces the following results:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
            1 threads, Tavg =     19.02 ns/op (σ =   0.00 ns/op)
            2 threads, Tavg =    109.93 ns/op (σ =   6.17 ns/op)
            3 threads, Tavg =    136.64 ns/op (σ =   2.99 ns/op)
            4 threads, Tavg =    293.26 ns/op (σ =   5.30 ns/op)
            5 threads, Tavg =    316.94 ns/op (σ =   6.28 ns/op)
            6 threads, Tavg =    686.96 ns/op (σ =   7.09 ns/op)
            7 threads, Tavg =    793.28 ns/op (σ =  10.57 ns/op)
            8 threads, Tavg =    898.15 ns/op (σ =  14.63 ns/op)


With the presented patch, the results are a little better:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
# Measure:
            1 threads, Tavg =      5.22 ns/op (σ =   0.00 ns/op)
            2 threads, Tavg =     34.51 ns/op (σ =   0.60 ns/op)
            3 threads, Tavg =     54.85 ns/op (σ =   1.42 ns/op)
            4 threads, Tavg =     74.67 ns/op (σ =   1.71 ns/op)
            5 threads, Tavg =     94.71 ns/op (σ =  41.68 ns/op)
            6 threads, Tavg =    114.80 ns/op (σ =  32.10 ns/op)
            7 threads, Tavg =    136.70 ns/op (σ =  26.80 ns/op)
            8 threads, Tavg =    158.48 ns/op (σ =   9.93 ns/op)
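
For readers who want to try a comparison of this kind, the following is a rough sketch of a measurement loop, not the harness that produced the numbers above. All names (IncrementBench, SyncCounter, AtomicCounter, measure) are made up for illustration, the counters live on the Java heap rather than in perf-data memory, and AtomicLong stands in for the Unsafe-based atomic update. It runs a fixed number of increments per thread instead of a fixed duration and does no JIT warm-up, so treat the output as indicative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical micro-benchmark sketch: synchronized vs. atomic increment.
class IncrementBench {
    interface Counter { void increment(); long get(); }

    static final class SyncCounter implements Counter {
        private long value;
        public synchronized void increment() { value++; }
        public synchronized long get() { return value; }
    }

    static final class AtomicCounter implements Counter {
        private final AtomicLong value = new AtomicLong();
        public void increment() { value.getAndIncrement(); }
        public long get() { return value.get(); }
    }

    // Runs 'threads' workers, each doing 'opsPerThread' increments on the
    // shared counter, and returns the mean ns/op averaged over the workers.
    static double measure(Counter c, int threads, int opsPerThread) {
        CountDownLatch start = new CountDownLatch(1);
        long[] elapsed = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                long t0 = System.nanoTime();
                for (int i = 0; i < opsPerThread; i++) c.increment();
                elapsed[id] = System.nanoTime() - t0;
            });
            workers[t].start();
        }
        start.countDown(); // release all workers at once
        double sum = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join();
                sum += (double) elapsed[t] / opsPerThread;
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum / threads;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%d threads: synchronized %.2f ns/op, atomic %.2f ns/op%n",
                    n, measure(new SyncCounter(), n, 1_000_000),
                    measure(new AtomicCounter(), n, 1_000_000));
        }
    }
}
```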


The scalability is not much better, but the raw speed is, so it might
present less contention when used in real-world code. If you wanted even
better scalability, there is a new class in JDK8, the
java.util.concurrent.atomic.LongAdder. But that doesn't buy atomic "set()" -
only "add()". And it can't update native-memory variables, so it could
only be used for add-only counters and in conjunction with a background
thread that would periodically flush the sum to the native memory....
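
A minimal sketch of that LongAdder arrangement might look like the following. The class name FlushedCounter and the 10 ms flush period are made up, and a plain direct ByteBuffer stands in for the perf-data memory that Perf.createLong() would return. The flusher writes the absolute sum rather than a delta, so it only suits add-only counters with a single writer to the slot.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration: scalable add() via LongAdder, with the sum
// periodically copied into a native-memory slot by one flusher thread.
class FlushedCounter {
    private final LongAdder adder = new LongAdder();
    private final LongBuffer slot; // stand-in for the perf-data slot

    FlushedCounter(LongBuffer slot) {
        this.slot = slot;
    }

    // Hot path: no lock, low contention between threads.
    void add(long value) {
        adder.add(value);
    }

    // Publishes the current sum; intended to be called by a single thread.
    void flush() {
        slot.put(0, adder.sum());
    }

    public static void main(String[] args) throws InterruptedException {
        ByteBuffer bb = ByteBuffer.allocateDirect(Long.BYTES)
                                  .order(ByteOrder.nativeOrder());
        FlushedCounter counter = new FlushedCounter(bb.asLongBuffer());

        ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleAtFixedRate(counter::flush, 10, 10, TimeUnit.MILLISECONDS);

        for (int i = 0; i < 1000; i++) {
            counter.add(1);
        }

        flusher.shutdown();
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        counter.flush(); // final flush after the background thread has stopped
        System.out.println(bb.getLong(0));
    }
}
```

The trade-off is that an external observer such as jstat sees a value that can lag behind the true count by up to one flush period.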

Regards, Peter


On 02/08/2013 06:10 PM, Nils Loodin wrote:

It would be interesting to know the number of thrown throwables in the
JVM, to be able to do some high level application diagnostics /
statistics. A good way to put this number would be a performance
counter, since it is accessible both from Java and from the VM.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

Regards,
Nils Loodin
