[
https://issues.apache.org/jira/browse/MESOS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813203#comment-15813203
]
Zhitao Li commented on MESOS-6596:
----------------------------------
[~mcypark] Thanks for replying.
1) We are currently using an {{allocation_interval}} of 10secs. We recently
increased this from the default of 1secs after we saw the allocator's
libprocess queue back up (which is another issue I need to report, and
possibly seek advice on);
2) We have about 50 frameworks in play that frequently see offers from these
agents and rapidly decline them with a filter timeout of 60-120s.
The distribution is still changing as we gradually upgrade these
frameworks.
I'm interested in your suggestion of a "request leaving room" call. Do you
think this call needs to be 1) aware of the role for which we are reserving
the agent's resources, and 2) associated with some kind of timeout, in case
the {{updateAvailable}} call is not delivered in time?
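To make the question concrete, here is a minimal sketch of what such a call might carry. Everything in it is hypothetical: {{Hold}}, {{offerable_mem}}, and their fields are illustrative names and not part of any existing Mesos API, and the role name is a placeholder. It only models the two properties asked about above: the hold records a role, and it lapses after a deadline so that a lost {{updateAvailable}} cannot withhold resources forever.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hold:
    """Hypothetical 'leave room' request; not an existing Mesos API."""
    agent_id: str
    role: str          # role the resources are meant to be reserved for
    mem_mb: float      # memory to keep out of offers on this agent
    expires_at: float  # absolute deadline in seconds; the hold lapses at this time

    def active(self, now: float) -> bool:
        # A hold only counts while its deadline has not been reached.
        return now < self.expires_at


def offerable_mem(free_mb: float, holds: Iterable[Hold], now: float) -> float:
    """Memory that may still be offered on one agent at time `now`."""
    held = sum(h.mem_mb for h in holds if h.active(now))
    return max(0.0, free_mb - held)


# Assumed numbers: an agent with 128000 MB free and a hold sized like the
# mem(*):120621 request from the issue description.
hold = Hold("agent-1", "example-role", 120621.0, expires_at=100.0)
before_deadline = offerable_mem(128000.0, [hold], now=50.0)
after_deadline = offerable_mem(128000.0, [hold], now=100.0)
```

In this sketch the role is only recorded, not enforced. Whether the allocator should still offer held resources to frameworks in that role is exactly the open question.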
> Dynamic reservation endpoint returns 409s
> -----------------------------------------
>
> Key: MESOS-6596
> URL: https://issues.apache.org/jira/browse/MESOS-6596
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Kunal Thakar
>
> The operation to dynamically reserve a host for a framework fails most of
> the time and succeeds only occasionally.
> We are calling the /reserve endpoint on the master with the same payload and
> it mostly returns 409, with the occasional success. Pasting the output of two
> consecutive /reserve calls:
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> * Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 409 Conflict
> HTTP/1.1 409 Conflict
> < Date: Tue, 15 Nov 2016 23:07:10 GMT
> Date: Tue, 15 Nov 2016 23:07:10 GMT
> < Content-Type: text/plain; charset=utf-8
> Content-Type: text/plain; charset=utf-8
> < Content-Length: 58
> Content-Length: 58
> * HTTP error before end of send, stop sending
> <
> * Closing connection #0
> Invalid RESERVE Operation: does not contain mem(*):120621
> {code}
> {code}
> * About to connect() to computexxx-yyy port 5050 (#0)
> * Trying 10.184.21.3... connected
> * Server auth using Basic with user 'cassandra'
> > POST /master/reserve HTTP/1.1
> > Authorization: Basic blah
> > User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.2j
> > zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> > Host: computexxx-yyy:5050
> > Accept: */*
> > Content-Length: 1046
> > Content-Type: application/x-www-form-urlencoded
> > Expect: 100-continue
> >
> * Done waiting for 100-continue
> < HTTP/1.1 202 Accepted
> HTTP/1.1 202 Accepted
> < Date: Tue, 15 Nov 2016 23:07:16 GMT
> Date: Tue, 15 Nov 2016 23:07:16 GMT
> < Content-Length: 0
> Content-Length: 0
> {code}