[ https://issues.apache.org/jira/browse/AURORA-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Khutornenko updated AURORA-208:
-------------------------------------
Description:
sla_list_safe_domain
Usage: sla_list_safe_domain --cluster=cluster --attribute={rack | host}
percentage duration [--override_jobs=filename]
[--exclude_attr=filename][--list_jobs]
Returns a list of racks or hosts where it would be safe to kill tasks without
violating their job SLA: percentage of tasks that stayed up within the last
“duration” secs|mins|hrs|days. The SLA can be specified globally per cluster as
a pair of percentage and duration values, or per job in a file.
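To illustrate the check described above, here is a minimal sketch of one plausible reading of "safe to kill". It assumes that every task on the candidate host or rack is currently counted as up, and that the projected SLA is the share of the job's tasks that would remain up after those tasks are killed. The function names and the formula are hypothetical and are not taken from the Aurora client.

```python
def projected_uptime_pct(total_tasks, up_tasks, tasks_on_domain):
    """Percentage of a job's tasks still up if every task on the domain is killed.

    Assumption: all tasks on the domain are among the tasks counted as up.
    """
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    remaining_up = max(up_tasks - tasks_on_domain, 0)
    return 100.0 * remaining_up / total_tasks


def is_safe_to_kill(total_tasks, up_tasks, tasks_on_domain, required_pct):
    """True if the projected uptime still meets the required percentage."""
    projected = projected_uptime_pct(total_tasks, up_tasks, tasks_on_domain)
    return projected >= required_pct
```

For example, a job with 100 tasks, 98 of them up, and 2 tasks on the candidate host would project to 96% and pass a 95% requirement under these assumptions.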
--cluster:
Aurora cluster name.
--attribute:
Currently supported attributes: “host” or “rack”.
percentage:
Percentage of tasks required to be up within the duration. Applied to all jobs
except those listed in --override_jobs file.
duration:
Time interval (now - value) for the percentage of up tasks. Applied to all jobs
except those listed in --override_jobs file. Format:
<value>{secs|mins|hrs|days}.
--override_jobs:
An optional file to load job specific SLAs that will override cluster-wide
command line percentage and duration values. The file can have multiple lines
in the following format:
role/env/job percentage duration
--exclude_attr:
An optional text file listing attribute values (one per line) to exclude from
the result set if found.
--list_jobs:
Lists all affected job keys with projected new SLAs if their tasks get killed.
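The duration format and the --override_jobs file format described above could be parsed roughly as follows. This is only a sketch: the helper names are hypothetical, and the handling of blank lines and malformed input is an assumption, since the description does not specify it.

```python
import re

# Seconds per unit for the documented <value>{secs|mins|hrs|days} format.
_UNIT_SECONDS = {"secs": 1, "mins": 60, "hrs": 3600, "days": 86400}
_DURATION_RE = re.compile(r"^(\d+)(secs|mins|hrs|days)$")


def parse_duration(text):
    """Convert a string such as '10mins' or '2hrs' into seconds."""
    match = _DURATION_RE.match(text.strip())
    if not match:
        raise ValueError("invalid duration: %r" % text)
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def parse_overrides(lines):
    """Parse 'role/env/job percentage duration' lines into a dict.

    Returns {job_key: (percentage, duration_in_seconds)}.
    Assumption: blank lines are skipped; anything else malformed raises.
    """
    overrides = {}
    for line in lines:
        if not line.strip():
            continue
        parts = line.split()
        if len(parts) != 3 or len(parts[0].split("/")) != 3:
            raise ValueError("invalid override line: %r" % line)
        job_key, percentage, duration = parts
        overrides[job_key] = (float(percentage), parse_duration(duration))
    return overrides
```

Calling parse_overrides on an open file object works as well, since the function only iterates over lines.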
Examples:
sla_list_safe_domain --cluster=smf1 --attribute=host 85 10mins
sla_list_safe_domain --cluster=smf1 --filename=~/rack.txt
Example output (--attribute=rack):
aurora_admin sla_list_safe_domain --cluster=cluster --attribute=rack 95 2hrs
rack1 host1
rack2 host2
rack2 host3
rack3 host1
Example output (--attribute=host):
aurora_admin sla_list_safe_domain --cluster=cluster --attribute=host 95 2hrs
host1
host2
host3
host4
Example output (--list_jobs):
aurora_admin sla_list_safe_domain --cluster=cluster --attribute=rack 95 2hrs --list_jobs
rack1 host1 cluster/role/prod/job1 96.00 2hrs
rack2 host2 cluster/role/prod/job2 97.65 2hrs
rack2 host2 cluster/role/prod/job3 95.05 2hrs
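For scripting against the --list_jobs output, a small parser could look like the sketch below. It assumes the five whitespace-separated columns seen in the --attribute=rack sample (rack, host, job key, projected percentage, duration). The JobSla record and the function name are hypothetical, and the actual output format may differ.

```python
from collections import namedtuple

# One row of --list_jobs output, as it appears in the rack sample.
JobSla = namedtuple("JobSla", ["rack", "host", "job_key", "percentage", "duration"])


def parse_list_jobs_line(line):
    """Split one output line into a JobSla record."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 columns, got %d: %r" % (len(fields), line))
    rack, host, job_key, percentage, duration = fields
    return JobSla(rack, host, job_key, float(percentage), duration)
```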
was:
sla_list_safe_domain
Usage: sla_list_safe_domain --cluster=cluster --attribute={rack | host}
percentage duration [--override_jobs=filename]
[--exclude_attr=filename][--list_jobs]
Returns a list of racks or hosts where it would be safe to kill tasks without
violating their job SLA: percentage of tasks that stayed up within the last
“duration” secs|mins|hrs|days. The SLA can be specified globally per cluster as
pair of percentage and duration values or per job in a file.
--cluster:
Aurora cluster name.
--attribute:
Currently supported attributes “host” or “rack”.
percentage:
Percentage of tasks required to be up within the duration. Applied to all jobs
except those listed in --override file.
duration:
Time interval (now - value) for the percentage of up tasks. Applied to all jobs
except those listed in --override_jobs file. Format:
<value>{secs|mins|hrs|days}.
--override_jobs:
An optional file to load job specific SLAs that will override cluster-wide
command line percentage and duration values. The file can have multiple lines
in the following format:
role/env/job percentage duration
--exclude_attr:
An optional text file listing attribute values (one per line) to exclude from
the result set if found.
--list_jobs:
Lists all affected job keys with projected new SLAs if their tasks get killed.
Examples:
sla_list_safe_domain --cluster=smf1 --attribute=host 85 10mins
sla_list_safe_domain --cluster=smf1 --filename=~/rack.txt
Example (--attribute=rack):
aurora_admin list_safe_sla_domain --cluster=smf1 --attribute=rack 95 2hrs
aau smf1-aau-15-sr3.prod.twitter.com
aau smf1-aau-29-sr2.prod.twitter.com
aau smf1-aau-30-sr3.prod.twitter.com
aev smf1-aev-02-sr2.prod.twitter.com
aev smf1-aev-11-sr2.prod.twitter.com
cnm smf1-cnm-26-sr3.prod.twitter.com
cnm smf1-cnm-27-sr3.prod.twitter.com
cnm smf1-cnm-28-sr3.prod.twitter.com
Example output (--attribute=host):
aurora_admin list_safe_sla_domain --cluster=smf1 --attribute=host 95 2hrs
smf1-ayz-29-sr3.prod.twitter.com
smf1-cgk-11-sr2.prod.twitter.com
smf1-cga-22-sr2.prod.twitter.com
smf1-cnk-03-sr3.prod.twitter.com
Example output (--list_jobs):
aurora_admin list_safe_sla_domain --cluster=smf1 --attribute=rack 95 2hrs
--list_jobs
ayz smf1-ayz-29-sr3.prod.twitter.com mesos/prod/labrat 96.00 2hrs
ayz smf1-ayz-29-sr3.prod.twitter.com mesos/prod/caliper 97.65 2hrs
ayz smf1-ayz-29-sr3.prod.twitter.com mesos/prod/packer 95.05 2hrs
> Add sla_list_safe_domain command into aurora_admin client
> ---------------------------------------------------------
>
> Key: AURORA-208
> URL: https://issues.apache.org/jira/browse/AURORA-208
> Project: Aurora
> Issue Type: Task
> Components: Client
> Reporter: Maxim Khutornenko
> Assignee: Maxim Khutornenko
>