Implementing the resource aware scheduler would be decidedly non-trivial. Every
topology would need additional configuration to tune for things like memory
sizes, and that is not going to buy you much. So, at the level of micro-tuning
individual parsers, this doesn’t make a lot of sense.

However, it may be worth considering separate tuning for parsers in general
vs. the core enrichment and indexing topologies (and potentially for separate
indexing topologies when those come in), and the resource aware scheduler
could provide a theoretical benefit there.
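
To make that concrete, the coarse split is already achievable today without
the RAS, just with per-topology worker heap settings (a minimal sketch against
the plain Storm Java API; the heap figures are invented for illustration):

    import org.apache.storm.Config;

    // Parsers are typically lightweight: give their workers a small heap.
    Config parserConf = new Config();
    parserConf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx512m");

    // Enrichment and indexing do the heavy lifting: give them more room.
    Config enrichmentConf = new Config();
    enrichmentConf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2048m");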

Specifying resource requirements per parser topology might sound like a good
idea, but if your parsers are working the way they should, they should use
only a small amount of memory at their default size and achieve additional
resource use by multiplying workers and executors (to get higher usage per
slot), balancing the load that way. To be honest, the only difference you’re
going to get from the RAS is a bunch of extra tuning parameters which allow a
slightly different granularity of units for things like memory.
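
For reference, the multiply-workers-and-executors approach looks roughly like
this (a sketch against the plain Storm API; MyParserSpout, MyParserBolt and
the counts are hypothetical placeholders, not Metron classes):

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafkaSpout", new MyParserSpout(), 4);  // 4 executors
    builder.setBolt("parserBolt", new MyParserBolt(), 8)     // 8 executors
           .shuffleGrouping("kafkaSpout");

    Config conf = new Config();
    conf.setNumWorkers(2); // spread those executors across 2 worker slots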

The other RAS feature which might be a good add is prioritisation of different
parser topologies, but again, this is probably not something you want to push
hard on unless you are severely limited in resources (in which case, why not
just add another node? It will be cheaper than spending all that time
micro-tuning the resource requirements for each data feed).
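
For completeness, the priority knob itself is a one-liner, and the per-user
pools it interacts with live in storm.yaml (a hedged sketch; the numbers are
illustrative, and none of it has any effect unless the RAS is actually the
active scheduler):

    import org.apache.storm.Config;

    Config conf = new Config();
    conf.setTopologyPriority(10); // lower number = higher priority under RAS

    // storm.yaml counterpart: per-user resource guarantees, e.g.
    // resource.aware.scheduler.user.pools:
    //     metron:
    //         cpu: 1000
    //         memory: 8192.0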

Right now we do allow a lot of micro-tuning of parallelism around things like
the count of executor threads, which achieves roughly the equivalent of the
CPU-based limits in the RAS.
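
If we did go down the RAS road, the per-component estimates it wants would be
declared roughly like this (a sketch of the Storm 1.x RAS hooks; the figures
and MyParserBolt are invented):

    import org.apache.storm.topology.BoltDeclarer;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    BoltDeclarer parser = builder.setBolt("parserBolt", new MyParserBolt(), 4);
    parser.setCPULoad(25.0);           // ~25% of one core, per executor
    parser.setMemoryLoad(256.0, 64.0); // on-heap / off-heap MB per executor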

TL;DR: 

If you’re not using resource pools for different users, and not relying on
prioritisation (which can lead to arbitrary kills), all you’re getting is a
slightly different way of tuning knobs that already exist, just at a slightly
different granularity. Also, we would have to rewrite all the topology code to
add the config endpoints for CPU and memory estimates.

Simon

> On 24 Nov 2017, at 07:56, Ali Nazemian <alinazem...@gmail.com> wrote:
> 
> Any help regarding this question would be appreciated.
> 
> 
> On Thu, Nov 23, 2017 at 8:57 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
> 
>> The 30-minute average of CPU load, checked via Ambari.
>> 
>> On 23 Nov. 2017 00:51, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>> 
>> How are you measuring the utilization?
>> 
>> 
>> On November 22, 2017 at 08:12:51, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>> 
>> Hi all,
>> 
>> 
>> One of the issues that we are dealing with is the fact that not all of
>> the Metron feeds have the same type of resource requirements. For example,
>> we have some feeds for which even a single Storm slot is way more than
>> they need. We thought we could improve overall utilisation by at least
>> limiting the amount of heap space available to the parser topology worker
>> per feed. However, since the Storm scheduler relies on available slots, it
>> is very hard, almost impossible, to utilise the cluster well when lots of
>> different topologies with different requirements are running at the same
>> time. Therefore, on a daily basis, we can see that, for example, one of
>> the Storm hosts is 120% utilised and another is 20% utilised! I was
>> wondering whether we can address this situation by using the Storm
>> Resource Aware scheduler or not.
>> 
>> P.S.: it would be very nice to have the functionality to tune Storm
>> topology-related parameters per feed in the GUI (for example in the
>> Management UI).
>> 
>> 
>> Regards,
>> Ali
>> 
>> 
>> 
> 
> 
> -- 
> A.Nazemian
