[ 
https://issues.apache.org/jira/browse/STORM-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060861#comment-15060861
 ] 

ASF GitHub Bot commented on STORM-898:
--------------------------------------

Github user knusbaum commented on a diff in the pull request:

    https://github.com/apache/storm/pull/921#discussion_r47833630
  
    --- Diff: storm-core/src/jvm/backtype/storm/scheduler/TopologyDetails.java ---
    @@ -396,34 +410,50 @@ public void addResourcesForExec(ExecutorDetails exec, Map<String, Double> resour
                 LOG.warn("Executor {} already exists...ResourceList: {}", exec, getTaskResourceReqList(exec));
                 return;
             }
    -        _resourceList.put(exec, resourceList);
    +        this.resourceList.put(exec, resourceList);
         }
     
         /**
          * Add default resource requirements for a executor
          */
         public void addDefaultResforExec(ExecutorDetails exec) {
    +        Double topologyComponentCpuPcorePercent = Utils.getDouble(this.topologyConf.get(Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT), null);
    +        if (topologyComponentCpuPcorePercent == null) {
    +            LOG.warn("default value for " + Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT + " needs to be set!");
    +        }
    --- End diff --
    
    This will **only** occur if a developer has deleted an existing value in 
defaults.yaml; that is, they are working on Storm itself, not building a 
topology. The topology config validation will take care of user-supplied 
values: there is no way a user can submit a topology that will cause this on a 
properly compiled Storm cluster. 
    
    Whether the default exists is determined at compile time, and a missing 
one will be caught by unit tests. Having it caught in unit tests is preferable, 
since that raises an alert on any pull request that breaks it, whereas a check 
here won't raise an alert until a cluster is up and someone tries to launch 
something.
    
    I can still see the usefulness of these checks IF we throw 
RuntimeExceptions here, for two main reasons:
    
    1. It might blow up later, and when I look at the logs and see a stack 
trace, I don't go back too far to look for other stuff. So if we don't throw 
here, the users are going to be dealing with the same NPE in a weird spot that 
you were trying to avoid.
    
    2. It might *not* blow up later, and instead just exhibit some weird 
behavior. This will be worse because there won't be a stack trace for the user 
to find, and they won't find any errors in the logs, since these are 
`LOG.warn()` calls.
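
A minimal sketch of the fail-fast variant argued for above, written against a plain `Map` so that it runs without Storm on the classpath. The class `FailFastDefaults` and the helper `requireDouble` are hypothetical names for illustration only, and the key string is assumed to match the value of `Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT`. It is not the code in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

public class FailFastDefaults {
    // Assumed key string, standing in for Config.TOPOLOGY_COMPONENT_CPU_PCORE_PERCENT.
    public static final String CPU_PCORE_PERCENT = "topology.component.cpu.pcore.percent";

    /**
     * Hypothetical helper: returns the configured value as a double, or throws
     * immediately so the stack trace points at the missing default rather than
     * at a later NullPointerException somewhere else.
     */
    public static double requireDouble(Map<String, Object> conf, String key) {
        Object value = conf.get(key);
        if (value == null) {
            throw new RuntimeException("default value for " + key + " needs to be set!");
        }
        if (!(value instanceof Number)) {
            throw new RuntimeException(key + " must be a number, got "
                    + value.getClass().getName());
        }
        return ((Number) value).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(CPU_PCORE_PERCENT, 10.0);
        System.out.println(requireDouble(conf, CPU_PCORE_PERCENT));

        // Simulate a developer deleting the default from defaults.yaml.
        conf.remove(CPU_PCORE_PERCENT);
        try {
            requireDouble(conf, CPU_PCORE_PERCENT);
        } catch (RuntimeException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With a warning only, the second call would return null and the failure (if any) would surface later; with the throw, the first stack trace in the logs names the missing key.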


> Add priorities and per user resource guarantees to Resource Aware Scheduler
> ---------------------------------------------------------------------------
>
>                 Key: STORM-898
>                 URL: https://issues.apache.org/jira/browse/STORM-898
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Robert Joseph Evans
>            Assignee: Boyang Jerry Peng
>         Attachments: Resource Aware Scheduler for Storm.pdf
>
>
> In a multi-tenant environment we would like to be able to give individual 
> users a guarantee of how much CPU/Memory/Network they will be able to use in 
> a cluster.  We would also like to know which topologies a user feels are the 
> most important to keep running if there are not enough resources to run all 
> of their topologies.
> Each user should be able to specify if their topology is production, staging, 
> or development. Within each of those categories a user should be able to give 
> a topology a priority, 0 to 10 with 10 being the highest priority (or 
> something like this).
> If there are not enough resources on a cluster to run a topology, assume this 
> topology is running and using resources, and find the user that is most over 
> their guaranteed resources.  Shoot the lowest priority topology for that 
> user, and repeat until this topology is able to run, or this topology would 
> be the one shot.  Ideally we don't actually shoot anything until we know that 
> we would have made enough room.
> If the cluster is over-subscribed, everyone is under their guarantee, and 
> this topology would not put the user over their guarantee, shoot the lowest 
> priority topology in this worker's resource pool until there is enough room 
> to run the topology or this topology is the one that would be shot.  We might 
> also want to think about what to do if we are going to shoot a production 
> topology in an oversubscribed case, and perhaps we can shoot a non-production 
> topology instead even if the other user is not over their guarantee.
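
The first eviction rule in the description above can be sketched as a dry-run planner. This is only an illustration under simplifying assumptions: a single resource dimension (CPU), no production/staging/development categories, and the over-subscribed-but-under-guarantee case left unhandled. The names `EvictionSketch`, `Topo`, and `planEvictions` are hypothetical and do not come from Storm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvictionSketch {
    /** Hypothetical topology record: owner, priority (10 = highest), CPU demand. */
    public static final class Topo {
        final String id; final String user; final int priority; final double cpu;
        public Topo(String id, String user, int priority, double cpu) {
            this.id = id; this.user = user; this.priority = priority; this.cpu = cpu;
        }
    }

    /**
     * Plans evictions without shooting anything. Returns the ids to evict so the
     * candidate fits, or null if the candidate itself would be the one shot (or
     * if nobody is over their guarantee, a case this sketch does not handle).
     */
    public static List<String> planEvictions(List<Topo> running, Topo candidate,
                                             Map<String, Double> guarantees, double capacity) {
        List<Topo> live = new ArrayList<>(running);
        live.add(candidate); // assume the candidate is already using resources
        List<String> evicted = new ArrayList<>();
        while (totalCpu(live) > capacity) {
            String worst = mostOverGuarantee(live, guarantees);
            if (worst == null) return null;
            Topo victim = lowestPriority(live, worst);
            if (victim == candidate) return null;
            live.remove(victim);
            evicted.add(victim.id);
        }
        return evicted;
    }

    static double totalCpu(List<Topo> topos) {
        double sum = 0;
        for (Topo t : topos) sum += t.cpu;
        return sum;
    }

    /** User with the largest positive (usage - guarantee), or null if none. */
    static String mostOverGuarantee(List<Topo> topos, Map<String, Double> guarantees) {
        Map<String, Double> usage = new LinkedHashMap<>();
        for (Topo t : topos) usage.merge(t.user, t.cpu, Double::sum);
        String worst = null;
        double worstOver = 0;
        for (Map.Entry<String, Double> e : usage.entrySet()) {
            double over = e.getValue() - guarantees.getOrDefault(e.getKey(), 0.0);
            if (over > worstOver) { worstOver = over; worst = e.getKey(); }
        }
        return worst;
    }

    /** Lowest-priority topology of a user; ties go to the earliest in the list. */
    static Topo lowestPriority(List<Topo> topos, String user) {
        Topo lowest = null;
        for (Topo t : topos) {
            if (t.user.equals(user) && (lowest == null || t.priority < lowest.priority)) lowest = t;
        }
        return lowest;
    }

    public static void main(String[] args) {
        List<Topo> running = new ArrayList<>();
        running.add(new Topo("a1", "alice", 1, 40));
        running.add(new Topo("a2", "alice", 5, 40));
        running.add(new Topo("b1", "bob", 3, 20));
        Map<String, Double> guarantees = new LinkedHashMap<>();
        guarantees.put("alice", 50.0);
        guarantees.put("bob", 50.0);
        Topo candidate = new Topo("b2", "bob", 4, 30);
        System.out.println(planEvictions(running, candidate, guarantees, 100)); // prints [a1]
    }
}
```

Returning a plan instead of evicting inside the loop mirrors the "don't actually shoot anything until we know that we would have made enough room" requirement: the caller acts only on a non-null result.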



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
