Hi Nathan, The old fashion way of measuring the performance of such a small piece of code was to add up the clock cycles needed to execute each instruction. I’d agree that this is much more difficult for Java than in order run times but the problem here is that the measure is so very close to the advertised resolution of the clock that it’s likely dominated by noise. I saw advertised in that if the timer being used here doesn’t have nano second resolution, the measure is only noise.
Kind regards, Kirk > On Sep 23, 2017, at 4:29 AM, Nathan Fisher <[email protected]> wrote: > > Hi All, > > I'm doing some work to profile Clojure's start-up performance so I can figure > out where the most bang for buck improvements could be made. > > I wanted to verify that what I've done to instrument the call cost is > correct. I know benchmarking and am happy to take guidance from others. > > In one of the classes I've created the following static atomic fields: > > public static final AtomicLongArray latency = new AtomicLongArray(10000000); > public static final AtomicInteger index = new AtomicInteger(0); > > Where latency is a preallocated array and the index is used to get the next > available slot in that array. > > I'm instrumenting the call sites as follows: > > public static Var intern(Namespace ns, Symbol sym, Object root){ > long start = System.nanoTime(); > > // interesting work > Var v = intern(ns, sym, root, true); > > long finish = System.nanoTime(); > int i = RT.index.getAndIncrement(); // get next slot id > RT.latency.set(i, finish - start); // set the work time in the slot > > return v; > } > > Once the app shuts down I calculate the results as follows: > > int count = RT.index.get(); > System.out.println("Total Var.intern calls: " + count); > ArrayList<Long> al = new ArrayList<>(); > > for (int i = 0; i < count; i++) { > al.add(i, RT.latency.get(i)); > } > > List<Long> l = al.stream().sorted().collect(Collectors.toList()); > Long total = al.stream().reduce((a, b) -> a + b).get() / 1_000_000; > > double pctl99 = 0.99 * count; > double pctl50 = 0.5 * count; > > System.out.println( > " Min: " + l.get(0) + "ns," + > " Median: " + l.get((int)pctl50) + "ns," + > " 99PCTL: " + l.get((int)pctl99) + "ns," + > " Max: " + l.get(count - 1) / 1_000_000 + "ms," + > " Total: " + total + "ms"); > > An example run yields the following: > > Min: 162ns, Median: 907ns, 99PCTL: 30724ns, Max: 10ms, Total: 27ms > > Is this an adequate way to capture the cost of calling a specific function? > Is there a way to improve the measurements (unfortunately JMH wouldn't be > straightforward to use at this point)? > Is there any obvious pitfalls I'm overlooking with this approach (e.g. > out-of-order execution, etc)? > > Thanks in advance for any guidance. > > Nathan > > -- > You received this message because you are subscribed to the Google Groups > "mechanical-sympathy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected] > <mailto:[email protected]>. > For more options, visit https://groups.google.com/d/optout > <https://groups.google.com/d/optout>. -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
