[jira] [Commented] (AURORA-1084) createJob fail to schedule job, if any one of hosts defined in constraint is down

Bill Farner (JIRA) Fri, 06 Feb 2015 15:54:06 -0800

    [ 
https://issues.apache.org/jira/browse/AURORA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310225#comment-14310225
 ]


Bill Farner commented on AURORA-1084:
-------------------------------------

It does.  I realize now that I gave misinformation about how multi-valued 
constraints work: they are ORed (I previously said they were ANDed).

So, if a task is configured with:
  constraints={'hostname': 'h3.com,h2.com,h1.com'}

It will run on any of those three hosts.  It's not obvious to me how the 
absence of one of those hosts would cause the task to not schedule.  You could 
technically have the below and it should behave identically:
  constraints={'hostname': 'h3.com,h2.com,h1.com,SOMENONEXISTENTHOST'}
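
To make the ORed semantics concrete, here is a minimal Python sketch of how a 
multi-valued hostname constraint could be evaluated.  This is not the 
scheduler's actual code; the function names (parse_values, host_matches, 
eligible_hosts) and the list of live hosts are hypothetical, chosen only to 
illustrate why a missing or nonexistent host should not change the outcome.

```python
# Illustrative sketch only: NOT Aurora's implementation. All names here
# are hypothetical and exist just to demonstrate ORed value matching.

def parse_values(raw):
    """Split a comma-separated constraint string into a set of values."""
    return {v.strip() for v in raw.split(",") if v.strip()}


def host_matches(raw_constraint, hostname, negated=False):
    """A host satisfies the constraint if it equals ANY listed value (OR).

    With negated=True the result is inverted, mirroring the 'negated'
    field visible in the ValueConstraint from the quoted log.
    """
    matched = hostname in parse_values(raw_constraint)
    return matched != negated


def eligible_hosts(raw_constraint, live_hosts):
    """Return the live hosts on which the task would be allowed to run."""
    return [h for h in live_hosts if host_matches(raw_constraint, h)]


# Assumed scenario from the report: h3.com is down, so it offers nothing.
live = ["h1.com", "h2.com"]

three = eligible_hosts("h3.com,h2.com,h1.com", live)
four = eligible_hosts("h3.com,h2.com,h1.com,SOMENONEXISTENTHOST", live)

print(three)          # hosts still eligible with h3.com absent
print(three == four)  # an extra nonexistent value makes no difference
```

Under OR semantics, values that match no live host simply never match; they 
do not veto the hosts that do.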

You can collect some debugging info here by turning up the log level for 
TaskAssignerImpl to FINE (this can be done on the fly in the /logconfig 
endpoint).  The resulting log lines will indicate why specific host/task pairs 
were rejected.

> createJob fail to schedule job, if any one of hosts defined in constraint is 
> down
> ---------------------------------------------------------------------------------
>
>                 Key: AURORA-1084
>                 URL: https://issues.apache.org/jira/browse/AURORA-1084
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 0.7.0
>            Reporter: Bhuvan Arumugam
>
> When we define a job with 3 hosts in constraint and if any one host is down, 
> aurora fail to schedule the job in other hosts. In below example, slave 
> {{h3.com}} is down. The other slaves {{h2.com}} and {{h1.com}} are UP. The 
> job is created and remain in PENDING state forever.
> The aurora job is configured with {{hostname}} constraint, each host 
> separated by comma.
> The job should be scheduled in one of hosts that are UP.
> {code}
> I0201 05:30:21.121 THREAD52178 
> org.apache.aurora.scheduler.thrift.aop.LoggingInterceptor.invoke: 
> createJob(JobConfiguration(key:JobKey(role:tilter, environment:staging25, 
> name:tilter-multiproc), owner:Identity(role:tilter, user:jenkins), 
> cronSchedule:null, cronCollisionPolicy:KILL_EXISTING, 
> taskConfig:TaskConfig(job:JobKey(role:tilter, environment:staging25, 
> name:tilter-multiproc), owner:Identity(role:tilter, user:jenkins), 
> environment:staging25, jobName:tilter-multiproc, isService:false, 
> numCpus:1.0, ramMb:128, diskMb:150, priority:0, maxTaskFailures:1, 
> production:false, constraints:[Constraint(name:hostname, 
> constraint:<TaskConstraint value:ValueConstraint(negated:false, values 
> [h3.com, h2.com, h1.com])>)], requestedPorts:[], taskLinks:{}, 
> executorConfig:ExecutorConfig(name:BLANKED, data:BLANKED), metadata:[]), 
> instanceCount:1), null, SessionKey(mechanism:UNAUTHENTICATED, data:50 D0 14 
> 4C 71 0D 4C 80 80 4C 40))
> I0201 05:30:21.121 THREAD52178 
> I0201 05:30:21.122 THREAD52178 
> org.apache.aurora.scheduler.thrift.SchedulerThriftInterface$2.apply: 
> Launching 1 tasks.
> I0201 05:30:21.124 THREAD52178 
> com.twitter.common.util.StateMachine$Builder$1.execute: 
> 1422768621123-tilter-staging25-tilter-multiproc-0-abc7cb29-dd79-4f78-9e8c-051986aab494
>  state machine transition INIT -> PENDING
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
