First off, if you don't know Clojure you are in luck. On the master branch all 
of the core code, except for the UI, shell submission, and a few classes needed 
to support them, is in Java.  There are still several tests that also need to 
move over to Java, but it should not be too big an issue for you.

q-length is fairly straightforward to collect. CPU utilization is harder: Java 
does expose it through JMX on a per-thread basis, so you might be able to get 
it for the executor thread of a given bolt/spout. Memory utilization is on a 
per-worker basis, but is fairly simple to get through JMX. Service time vs. 
idle time are things you will probably need to write yourself, but they are 
probably not too difficult to do.  Be careful though: this is on the data path 
and can impact the performance of all topologies.
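
To make that concrete, here is a minimal sketch of sampling both through the 
standard java.lang.management MXBeans. The class and method names are just 
illustrative, not existing Storm code:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import java.lang.management.ThreadMXBean;

    // Hypothetical sampler for one worker JVM; not existing Storm code.
    public class ExecutorMetricsSampler {
        private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // CPU time (ns) consumed so far by one executor thread; -1 if unsupported.
        public long cpuTimeNanos(long executorThreadId) {
            return threads.isThreadCpuTimeSupported()
                    ? threads.getThreadCpuTime(executorThreadId)
                    : -1L;
        }

        // Fraction of the worker JVM heap currently in use.
        public double heapUtilization() {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            return heap.getMax() > 0 ? (double) heap.getUsed() / heap.getMax()
                                     : Double.NaN;
        }
    }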


"on-need basis" is the hard part.  This is because the downstream components 
need to be able to know that an upstream component needs specific metrics.  I 
think the best way would be to broadcast it at a low frequency, but have 
thresholds where it would send it again if something changed drastically.
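
As a rough sketch of that idea (all of the names below are hypothetical, not 
existing Storm APIs), a downstream executor could keep a timer plus a change 
threshold and decide on each tick whether to report:

    // Hypothetical reporter: broadcast on a slow timer, resend early on big change.
    public class ThresholdedLoadReporter {
        private static final long PERIOD_MS = 5_000;          // low-frequency interval
        private static final double CHANGE_THRESHOLD = 0.25;  // resend if q-length moves >25%

        private long lastSentMs = 0;
        private double lastSentQLength = -1;

        public void maybeReport(double currentQLength, long nowMs) {
            boolean periodElapsed = nowMs - lastSentMs >= PERIOD_MS;
            boolean drasticChange = lastSentQLength >= 0
                    && Math.abs(currentQLength - lastSentQLength)
                       > CHANGE_THRESHOLD * Math.max(lastSentQLength, 1.0);
            if (periodElapsed || drasticChange) {
                sendUpstream(currentQLength);  // placeholder hook back to the spout
                lastSentMs = nowMs;
                lastSentQLength = currentQLength;
            }
        }

        private void sendUpstream(double qLength) {
            // In a real implementation this would ride on Storm's existing load
            // reporting (see STORM-162) or a custom metrics stream back upstream.
        }
    }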


- Bobby

On Monday, February 13, 2017, 4:25:50 PM CST, Anis Nasir <[email protected]> 
wrote:
Dear Bobby,

In this case, how can we enable such configuration?

I am not very familiar with Clojure. However, I would like the downstream
operators to report various parameters to the upstream operators on a
need basis, like service time, queue length, CPU utilization, memory
utilization, idle time, etc.

Regards,
Anis



On Tue, Feb 14, 2017 at 12:36 AM, Bobby Evans <[email protected]>
wrote:

> Yes, makes perfect sense.
>
>
> - Bobby
>
> On Friday, February 10, 2017, 4:36:22 PM CST, Anis Nasir <
> [email protected]> wrote:
> Dear Bobby,
>
> Thank you very much for your reply.
>
> In real deployments, it is often the case that executors are heterogeneous
> and execution time per tuple is non-uniform (as discussed in the JIRA). In
> such cases, the workload and capacity (of executors) distributions are
> often unknown at the upstream operator and it is required to infer the
> capacity of each worker and the assigned workload.
>
> For such scenarios, I would like to design a grouping scheme that allows
> upstream operators to change the assignments by knowing both the workload
> and the capacities of the machine.
>
> Also, I would prefer that each downstream operator can send this message
> on a need basis, rather than broadcasting it across the whole set of
> operators.
>
> Does it make sense?
>
> Regards,
> Anis
>
> On Fri, Feb 10, 2017 at 11:54 PM, Bobby Evans <[email protected]>
> wrote:
>
> > Anis,
> > We already have the q-length being reported upstream.
> > https://issues.apache.org/jira/browse/STORM-162
> > It works well, except when a topology gets really big the amount of
> > metrics being collected can negatively impact the performance of the
> > topology.  By really big I mean several thousand workers.
> > There has also been a push to redo the metrics system in Storm so it is
> > more scalable and so that Nimbus can query it.  That is what I personally
> > think would be a good long term solution for features like elasticity. But
> > I am not really sure what you mean by load aware scheduling.
> >
> > - Bobby
> >
> > On Thursday, February 9, 2017, 10:34:29 PM CST, Anis Nasir <
> > [email protected]> wrote:
> > Dear All,
> >
> > I have been trying to implement load aware scheduling for Apache Storm.
> >
> > For this purpose, I need to send periodic statistics from downstream
> > operators to upstream operators.
> >
> > Is there a standard way of sending such statistics to an upstream operator,
> > e.g., a bolt periodically reporting its local queue length to the upstream
> > spout?
> >
> > Thanking you in advance.
> >
> > Regards,
> > Anis
> >
>
