[ https://issues.apache.org/jira/browse/AURORA-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rogier Dikkes updated AURORA-1811:
----------------------------------
    Description: 
We recently had to patch hosts. In our setup, a couple of services run only 2 
to 5 instances, with production = True and tier = preferred as shown in the 
default example documentation.

As we understand it, host_drain does not let us configure the minimum job 
instance count; the default is 10. We therefore tried to use aurora_admin 
sla_list_safe_domain to compile a list of the hosts running these services, so 
that we could pass it to host_drain as an unsafe_hosts_file.

When we ran aurora_admin sla_list_safe_domain --min_job_instance_count=2 
devcluster 95 1m, the scheduler returned:
 INFO] Response from scheduler: OK (message: )

No hosts were listed at all. We varied the percentage and duration to see 
whether anything would be returned, but the response never changed.

To rule out the client as the cause, we ran the 0.16.0 client against a 0.14.0 
cluster. That cluster does report hosts that are safe to kill without violating 
job SLAs.

To rule out a faulty cluster setup on our side, we started the Vagrant sandbox 
and launched a task with 3 instances, tier = preferred and production = True.

Commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m

Adding -l, or varying the time and percentage, never changes the outcome.

Setting instance_count to a higher number does not change the output either.

  was:
We recently had to patch hosts. In our setup, a couple of services run only 2 
to 5 instances, with production = True and tier = preferred as shown in the 
default example documentation.

As we understand it, host_drain does not let us configure the minimum job 
instance count; the default is 10. We therefore tried to use aurora_admin 
sla_list_safe_domain to compile a list of the hosts running these services, so 
that we could pass it to host_drain as an unsafe_hosts_file.

When we ran aurora_admin sla_list_safe_domain --min_job_instance_count=2 
devcluster 95 1m, the scheduler returned:
 INFO] Response from scheduler: OK (message: )

No hosts were listed at all. We varied the percentage and duration to see 
whether anything would be returned, but the response never changed.

To rule out the client as the cause, we ran the 0.16.0 client against a 0.14.0 
cluster. That cluster does report hosts that are safe to kill without violating 
job SLAs.

To rule out a faulty cluster setup on our side, we started the Vagrant sandbox 
and launched a task with 3 instances, tier = preferred and production = True.

Commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m

Adding -l, or varying the time and percentage, never changes the outcome.



> sla_list_safe_domain no longer reports SLA usage
> ------------------------------------------------
>
>                 Key: AURORA-1811
>                 URL: https://issues.apache.org/jira/browse/AURORA-1811
>             Project: Aurora
>          Issue Type: Bug
>          Components: Client, Maintenance, SLA
>    Affects Versions: 0.16.0
>         Environment: Vagrant image - Ubuntu, CentOS 7.2
>            Reporter: Rogier Dikkes
>            Priority: Minor
>              Labels: client, features, sla
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
