Looks like a simple fix. Unfortunately I don't know enough Clojure to fix it.
Narrowed down the performance issue to this Clojure code in executor.clj:

(defn mk-custom-grouper [^CustomStreamGrouping grouping ^WorkerTopologyContext context
                         ^String component-id ^String stream-id target-tasks]
  (.prepare grouping context (GlobalStreamId. component-id stream-id) target-tasks)
  (if (instance? LoadAwareCustomStreamGrouping grouping)
    (fn [task-id ^List values load]
      (.chooseTasks grouping task-id values load))    ; <<< problematic invocation
    (fn [task-id ^List values load]
      (.chooseTasks grouping task-id values))))

'grouping' is statically typed to the base type CustomStreamGrouping. In this run, its actual type is the derived type LoadAwareCustomStreamGrouping. The base type does not have a chooseTasks() method with 3 args; only the derived type has that method. Consequently Clojure falls back to reflection, dynamically iterating over the methods of the 'grouping' object to locate the right method and then invoke it. This falls in the critical path of SpoutOutputCollector.emit(), where roughly 20% of the time is spent just finding the right method.

I tried a few things, but was unable to force a cast to LoadAwareCustomStreamGrouping there or enable more efficient dispatching. If anyone knows how to fix it, I can try it and rerun the numbers; a sketch of one possible type-hinted rewrite is appended after the quoted thread below. Since it appears to be an easy fix, we can do this without waiting for CLJ-1784 or a replacement of the Clojure subsystem.

-roshan

On 2/3/16, 12:03 AM, "Abhishek Agarwal" <abhishc...@gmail.com> wrote:

>Thanks for sharing. This is very helpful.
>Regarding the reflection cost, it seems there is already a ticket open in
>Clojure:
>http://dev.clojure.org/jira/browse/CLJ-1784
>
>In the discussion thread, it's been suggested to use *warn-on-reflection*
><https://clojuredocs.org/clojure.core/*warn-on-reflection*> and to add
>type hints. I am new to Clojure, so I can't say exactly how it will work
>out.
>
>The second observation could be an indicator of another problem. The
>function you have cited is called in the consumer path. It means messages
>are not flowing fast enough compared to the consumers. This behavior is
>coupled with the load pattern and topology parameters such as queue size.
>At what rate are you generating the load, and what is the size of the
>disruptor queue? Also, if your spout is slower compared to the bolts,
>this behavior is very much expected, isn't it?
>
>On Wed, Feb 3, 2016 at 11:54 AM, Roshan Naik <ros...@hortonworks.com>
>wrote:
>
>> Looks like the attachments were stripped off. So resending with links
>> to the profiler screenshots.
>>
>> Call tree:
>> https://github.com/roshannaik/storm-benchmark-sol/blob/master/profiler/storm%20core%20-%20sol%20-%200%20acker/storm%20core%20-%20call%20tree.png
>>
>> Method stats:
>> https://github.com/roshannaik/storm-benchmark-sol/blob/master/profiler/storm%20core%20-%20sol%20-%200%20acker/storm%20core%20-%20method%20stats.png
>>
>> -roshan
>>
>>
>> From: Roshan Naik <ros...@hortonworks.com>
>> Reply-To: "dev@storm.apache.org" <dev@storm.apache.org>
>> Date: Monday, February 1, 2016 at 6:38 PM
>> To: "dev@storm.apache.org" <dev@storm.apache.org>
>> Subject: Performance Profiling - Storm core
>>
>> Attached are screenshots of the performance profile for Storm core
>> using a Speed of Light topology.
>>
>> Topology info:
>> - 1 bolt instance, 1 spout instance, 1 worker.
>> - ACKer count = 0
>> - Spout precomputes a random list of tuples, then keeps emitting them
>> endlessly
>> - Bolt just re-emits the same tuple and acks it
>> - localOrShuffleGrouping
>> - Topology code:
>> https://github.com/roshannaik/storm-benchmark-sol/blob/master/src/main/java/storm/benchmark/benchmarks/SOL.java
>>
>> Observations:
>>
>> * Call tree info shows that a big part of the nextTuple() invocation
>> is consumed in the Collector.emit() call. A major part of that goes
>> into reflection by the Clojure code.
>> * Method stats view shows that a lot of time is spent blocking on the
>> disruptor queue.
>
>--
>Regards,
>Abhishek Agarwal
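
PS: One possible (untested) way to avoid the reflective call, sketched against the mk-custom-grouper snippet above, is to rebind 'grouping' to a local that is type-hinted as LoadAwareCustomStreamGrouping once the instance? check has passed, so the compiler can emit a direct interface call instead of reflecting. This assumes the same imports/context as executor.clj; the local name load-aware-grouping is just illustrative:

  (if (instance? LoadAwareCustomStreamGrouping grouping)
    (let [^LoadAwareCustomStreamGrouping load-aware-grouping grouping]
      (fn [task-id ^List values load]
        ;; direct, non-reflective call through the type-hinted local
        (.chooseTasks load-aware-grouping task-id values load)))
    (fn [task-id ^List values load]
      ;; base-type call path is unchanged
      (.chooseTasks grouping task-id values)))

Compiling executor.clj with *warn-on-reflection* set to true should confirm whether the reflective call actually goes away.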