Also, you may want to have a look at JMH [1] and a presentation on it by Aleksey Shipilev [2].
Shripad

[1]: http://openjdk.java.net/projects/code-tools/jmh/
[2]: https://www.youtube.com/watch?v=VaWgOCDBxYw

On Monday, September 25, 2017 at 1:18:40 AM UTC+5:30, Nathan Fisher wrote:
>
> Hi Peter,
>
> Apologies, everyone, if I'm polluting the mailing list; it's not the typical
> latency question.
>
> Thanks Peter, answers to your questions are inline below.
>
> On Sun, 24 Sep 2017 at 17:45, Peter Booth <[email protected]> wrote:
>
>> Nathan,
>>
>> You mentioned that it was clojure startup time that you want to improve.
>> Is it a general "all clojure apps" issue or "our clojure apps?"
>
> NF> Not just me, all Clojure apps. It takes about 700ms to load
> clojure.core from a fat jar and execute "(+ 1 2 3)" (i.e. 1 + 2 + 3). (see
> https://dev.clojure.org/display/design/Improving+Clojure+Start+Time)
>
>> What are typical times for the entire startup that you observe? What do
>> the clojure apps actually do?
>
> NF> Typically 10s of seconds except for new/small projects which are
> single-digit in seconds.
>
>> Some points:
>>
>> *Precision/noise:*
>>
>> As Kirk described, calling System.nanoTime() costs about 28 nanos on a
>> one year old Haswell CPU. It just doesn't work to use it to measure
>> operations that themselves take tens or hundreds of nanos.
>
> NF> Thanks for the clarification. I wasn't actually sure how long the
> methods were taking, but it did give me insight to look elsewhere. Naively,
> can I assume the approach is a usable, albeit crude, technique that could be
> applied where the latency is much larger (e.g. > 100us)? I was considering
> using a dynamic proxy with that kind of instrumentation to collect data, but
> static methods prevent that. I also looked at AOP but the site was down.
>
>> *Skewing*
>>
>> Martin Thompson alluded to how measurement can skew behavior of the
>> underlying system. JMH can’t avoid the Heisenberg effect.
>> Perf-map reduces Heisenberg cost because you are tracing from outside the
>> process (but still on the host). Taking measurements out-of-band is the
>> only way I know to avoid Heisenberg.
>
> NF> Yes, I figured this would be an issue. I was instrumenting one method
> at a time so that only affected the caller and not the callee I was
> measuring. It was enough to identify that the method I was measuring at the
> time of the original e-mail might not yield a huge benefit. The attached
> Flame Graph generated with perf-map is what I was able to generate for the
> (+ 1 2 3) example. My possible misinterpretation of the Flame Graph is
> that a significant amount of time is being spent in loading the classes and
> interpreting the byte-code (e.g. "Interpreter" is both wide and deep on the
> call stacks). When started there are around 2000 classes loaded. So I've
> started looking into what about the class loading is slow. Some
> thoughts so far are:
>
> - zip compression level (0 appears to save 40-80ms, which is similar
>   savings as when loaded from disk).
> - class load ordering (e.g. would loading based on a dependency graph
>   help? would automatically loading a class from the jar as it's streamed
>   past help? etc).
> - static field/execution blocks (Clojure employs heavy use of static
>   initialisation and fields that could be deferred to after start-up in a
>   dev scenario).
> - custom class loader (less inclined for this as it introduces another
>   dependency to "get started").
>
>> *Host issues*
>>
>> When you said "spin up a linux box" did you mean a physical box, not a VM
>> or container?
>
> NF> VM. It's not something where I'm aiming to achieve microsecond-level
> performance and a smooth latency curve; rather, I just want to scratch an
> itch and see if I can make some improvements.
>
>> I've had a bunch of consulting projects that were different variations on
>> “performance issues that only occur in environment X or on hardware Y”.
>> It is common for people to assume “performance is relative. If this is a
>> hotspot here it will be a hotspot there.”
>>
>> All of the points described here require that you have root access to
>> physical hosts that are representative of your target hardware. In larger
>> (and some small) shops this isn’t always easy to get.
>>
>> On Saturday, September 23, 2017 at 10:51:52 AM UTC-4, Nathan Fisher wrote:
>>
>>> Thank you. I ran across an article by Brendan Gregg and was just starting
>>> to dig into honest-profiler. Looks like I'll spin up a linux box instead to
>>> use perf-map-agent.
>>>
>>> http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html
>>>
>>> On Sat, 23 Sep 2017 at 14:58 Martin Thompson <[email protected]> wrote:
>>>
>>>> This approach to measurement is likely to skew the results. I'd start
>>>> with perf record via perf-map-agent and then use flame graphs.
>>>>
>>>> http://psy-lob-saw.blogspot.co.uk/2017/02/flamegraphs-intro-fire-for-everyone.html
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "mechanical-sympathy" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> - sent from my mobile
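
To put the System.nanoTime() point from the thread in concrete terms, here is a minimal sketch (not from the thread) that estimates the per-call cost of System.nanoTime() and the smallest tick it can report on whatever machine runs it. The class and method names are made up for illustration, and the printed numbers will vary with CPU, OS and JVM; a harness such as JMH would likely give more trustworthy figures.

```java
public class NanoTimeCost {

    /** Rough average cost, in nanoseconds, of one System.nanoTime() call. */
    public static double averageCallCostNanos(int calls) {
        long sink = 0; // accumulate results so the JIT cannot drop the calls
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println("unlikely"); // keeps 'sink' observably live
        }
        return (double) elapsed / calls;
    }

    /** Smallest non-zero gap seen between two consecutive readings. */
    public static long smallestObservableTickNanos(int samples) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long a = System.nanoTime();
            long b = System.nanoTime();
            while (b == a) {
                b = System.nanoTime(); // spin until the clock advances
            }
            min = Math.min(min, b - a);
        }
        return min;
    }

    public static void main(String[] args) {
        averageCallCostNanos(1_000_000); // warm-up pass so the loop gets compiled
        System.out.printf("approx. cost per nanoTime() call: %.1f ns%n",
                averageCallCostNanos(1_000_000));
        System.out.printf("smallest observable tick: %d ns%n",
                smallestObservableTickNanos(10_000));
    }
}
```

If the reported cost is in the tens of nanoseconds, bracketing an operation with two nanoTime() calls is only a reasonable (if crude) measurement when the operation is orders of magnitude longer, such as the > 100us range asked about in the thread.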

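
The dynamic proxy idea mentioned in the thread could look roughly like the sketch below. It is hypothetical: the `TimingProxy` class, the `Greeter` interface and the `totalNanos` map are invented for illustration and are not taken from Clojure or from anything posted in the thread. It wraps an object behind an interface and accumulates elapsed time per method name. `java.lang.reflect.Proxy` only intercepts calls made through an interface, which is why static methods are out of reach for this technique, and the reflection overhead itself makes it suitable only for coarse-grained calls.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class TimingProxy {

    /** Hypothetical interface standing in for whatever is being measured. */
    public interface Greeter {
        String greet(String name);
    }

    /** Total elapsed nanoseconds per method name, across all proxied calls. */
    public static final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Wraps 'target' in a proxy that times every call made through 'iface'. */
    public static <T> T timed(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                totalNanos.computeIfAbsent(method.getName(), k -> new LongAdder())
                          .add(System.nanoTime() - start);
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        Greeter real = name -> "hello " + name;
        Greeter measured = timed(real, Greeter.class);
        for (int i = 0; i < 1_000; i++) {
            measured.greet("world");
        }
        totalNanos.forEach((method, nanos) ->
                System.out.println(method + ": " + nanos.sum() + " ns total"));
    }
}
```

The recorded time includes the cost of the proxy dispatch and the two nanoTime() calls, so the totals are an upper bound on the time spent in the wrapped method rather than an exact figure.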