Bhuvan Arumugam created AURORA-1084:
---------------------------------------
Summary: createJob fails to schedule a job if any one of the hosts
defined in the constraint is down
Key: AURORA-1084
URL: https://issues.apache.org/jira/browse/AURORA-1084
Project: Aurora
Issue Type: Bug
Components: Scheduler
Affects Versions: 0.7.0
Reporter: Bhuvan Arumugam
When we define a job with 3 hosts in a constraint and any one of those hosts is
down, Aurora fails to schedule the job on the other hosts. In the example below,
slave {{h3.com}} is down. The other slaves, {{h2.com}} and {{h1.com}}, are up.
The job is created and remains in the PENDING state forever.
The Aurora job is configured with a {{hostname}} constraint, with the hosts
separated by commas.
Expected behavior: the job should be scheduled on one of the hosts that are up.
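The expected behavior can be illustrated with a minimal Python sketch. This is not Aurora's scheduler code: the names {{ValueConstraint}}, {{parse_values}} and {{eligible_hosts}} are hypothetical, and the model only assumes that a non-negated value constraint is satisfied by any host whose name appears in the comma-separated list.

```python
# Hypothetical model of a value constraint; NOT Aurora's actual implementation.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ValueConstraint:
    name: str
    values: FrozenSet[str]
    negated: bool = False

    def matches(self, attribute_value: str) -> bool:
        # A non-negated constraint matches when the value is in the set;
        # a negated constraint matches when it is not.
        in_set = attribute_value in self.values
        return not in_set if self.negated else in_set


def parse_values(raw: str) -> FrozenSet[str]:
    """Split a comma-separated constraint string into a set of values."""
    return frozenset(v.strip() for v in raw.split(",") if v.strip())


def eligible_hosts(constraint: ValueConstraint, up_hosts: Iterable[str]) -> List[str]:
    """Return the hosts that are up and satisfy the constraint, sorted by name."""
    return sorted(h for h in up_hosts if constraint.matches(h))


# Constraint from the report: three hosts, of which h3.com is down.
constraint = ValueConstraint("hostname", parse_values("h3.com,h2.com,h1.com"))
candidates = eligible_hosts(constraint, up_hosts=["h1.com", "h2.com"])
print(candidates)  # ['h1.com', 'h2.com']
```

Under this reading, a host that is down only shrinks the candidate list. The task should stay PENDING only if the list is empty, which is not the case in the report.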
{code}
I0201 05:30:21.121 THREAD52178
org.apache.aurora.scheduler.thrift.aop.LoggingInterceptor.invoke:
createJob(JobConfiguration(key:JobKey(role:tilter, environment:staging25,
name:tilter-multiproc), owner:Identity(role:tilter, user:jenkins),
cronSchedule:null, cronCollisionPolicy:KILL_EXISTING,
taskConfig:TaskConfig(job:JobKey(role:tilter, environment:staging25,
name:tilter-multiproc), owner:Identity(role:tilter, user:jenkins),
environment:staging25, jobName:tilter-multiproc, isService:false, numCpus:1.0,
ramMb:128, diskMb:150, priority:0, maxTaskFailures:1, production:false,
constraints:[Constraint(name:hostname, constraint:<TaskConstraint
value:ValueConstraint(negated:false, values [h3.com, h2.com, h1.com])>)],
requestedPorts:[], taskLinks:{}, executorConfig:ExecutorConfig(name:BLANKED,
data:BLANKED), metadata:[]), instanceCount:1), null,
SessionKey(mechanism:UNAUTHENTICATED, data:50 D0 14 4C 71 0D 4C 80 80 4C 40))
I0201 05:30:21.121 THREAD52178
I0201 05:30:21.122 THREAD52178
org.apache.aurora.scheduler.thrift.SchedulerThriftInterface$2.apply: Launching
1 tasks.
I0201 05:30:21.124 THREAD52178
com.twitter.common.util.StateMachine$Builder$1.execute:
1422768621123-tilter-staging25-tilter-multiproc-0-abc7cb29-dd79-4f78-9e8c-051986aab494
state machine transition INIT -> PENDING
{code}