[
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964467#comment-14964467
]
Guangya Liu edited comment on MESOS-3747 at 10/20/15 3:46 AM:
--------------------------------------------------------------
The current behavior is that if the user is not specified, or is set to a
nonexistent user, and switch_user is true (the default value), the task
fails. I think the agent log already has enough information to tell the end
user what went wrong in this case, as shown below. [~marco-mesos] comments? Thanks.
{code}
I1020 11:39:10.438297 9100 slave.cpp:1407] Launching task cluster-test for
framework da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000
W1020 11:39:10.519444 9100 paths.cpp:423] Failed to chown executor directory
'/tmp/mesos/slaves/3e0df733-08b3-4883-b3fa-92bdc0c05b2f-S0/frameworks/da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000/executors/cluster-test/runs/ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7':
Failed to get user information for 'abc': Success
I1020 11:39:10.519707 9100 slave.cpp:4994] Launching executor cluster-test of
framework da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000 with resources cpus(*):0.1;
mem(*):32 in work directory
'/tmp/mesos/slaves/3e0df733-08b3-4883-b3fa-92bdc0c05b2f-S0/frameworks/da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000/executors/cluster-test/runs/ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7'
I1020 11:39:10.520228 9096 containerizer.cpp:639] Starting container
'ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7' for executor 'cluster-test' of framework
'da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000'
I1020 11:39:10.520397 9100 slave.cpp:1625] Queuing task 'cluster-test' for
executor cluster-test of framework 'da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000
I1020 11:39:10.520524 9100 slave.cpp:679] Successfully attached file
'/tmp/mesos/slaves/3e0df733-08b3-4883-b3fa-92bdc0c05b2f-S0/frameworks/da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000/executors/cluster-test/runs/ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7'
I1020 11:39:10.526057 9096 linux_launcher.cpp:352] Cloning child process with
flags =
I1020 11:39:10.629271 9100 containerizer.cpp:1278] Executor for container
'ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7' has exited
I1020 11:39:10.629390 9100 containerizer.cpp:1095] Destroying container
'ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7'
{code}
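To illustrate the point that the agent log already carries the cause, here is a minimal sketch that filters a log excerpt down to its warning and error records. It assumes the usual glog line prefix (severity letter, MMDD, time, thread id, file:line) and treats any line without that prefix as a wrapped continuation of the previous record. The names `parse_glog`, `problems`, and `SAMPLE` are made up for this example and are not part of Mesos.

```python
import re

# Assumed glog prefix: severity, MMDD, time, thread id, file:line, "] message".
GLOG_PREFIX = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\s:]+:\d+)\] (?P<message>.*)$"
)


def parse_glog(text):
    """Group wrapped lines into records and return them as a list of dicts."""
    records = []
    for line in text.splitlines():
        match = GLOG_PREFIX.match(line)
        if match:
            records.append(match.groupdict())
        elif records and line.strip():
            # No prefix: treat as the continuation of a wrapped message.
            records[-1]["message"] += " " + line.strip()
    return records


def problems(text):
    """Return only the warning, error, and fatal records."""
    return [r for r in parse_glog(text) if r["severity"] in "WEF"]


# A few lines copied from the agent log above.
SAMPLE = """\
I1020 11:39:10.438297 9100 slave.cpp:1407] Launching task cluster-test for
framework da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000
W1020 11:39:10.519444 9100 paths.cpp:423] Failed to chown executor directory
'/tmp/mesos/slaves/3e0df733-08b3-4883-b3fa-92bdc0c05b2f-S0/frameworks/da7c02ac-ddc3-4e0c-ab87-cfcc52955e95-0000/executors/cluster-test/runs/ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7':
Failed to get user information for 'abc': Success
I1020 11:39:10.629390 9100 containerizer.cpp:1095] Destroying container
'ff0dbf31-20e1-4e9e-9e0a-1b69ac273ec7'
"""

if __name__ == "__main__":
    for record in problems(SAMPLE):
        print(record["source"], record["message"])
```

Run against the excerpt, this leaves only the `paths.cpp:423` warning, whose message ends with the failed user lookup for 'abc'. That is the line an operator would need to find, although it is logged at warning level only.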
> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -------------------------------------------------------------------------
>
> Key: MESOS-3747
> URL: https://issues.apache.org/jira/browse/MESOS-3747
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
> Affects Versions: 0.24.0, 0.24.1, 0.25.0
> Reporter: Ben Whitehead
> Assignee: Liqiang Lin
> Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to
> inherit the user the agent process is running as. Through the HTTP Scheduler
> API, the same setting now results in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework says nothing about the user not existing
> on the agent host; instead, it reports that the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
>   type: SUBSCRIBE
>   subscribe: {
>     frameworkInfo: {
>       user: "",
>       name: "testing"
>     }
>   }
> }
> {code}
> {code:title=TaskInfo}
> call {
>   framework_id { value: "20151015-125949-16777343-5050-20146-0000" },
>   type: ACCEPT,
>   accept {
>     offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
>     operations {
>       type: LAUNCH,
>       launch {
>         task_infos [
>           {
>             name: "task-1",
>             task_id: { value: "task-1" },
>             agent_id: { value: "20151015-125949-16777343-5050-20146-S0" },
>             resources [
>               { name: "cpus", type: SCALAR, scalar: { value: 0.1 }, role: "*" },
>               { name: "mem", type: SCALAR, scalar: { value: 64.0 }, role: "*" },
>               { name: "disk", type: SCALAR, scalar: { value: 0.0 }, role: "*" },
>             ],
>             command: {
>               environment {
>                 variables [
>                   { name: "SLEEP_SECONDS" value: "15" }
>                 ]
>               },
>               value: "env | sort && sleep $SLEEP_SECONDS"
>             }
>           }
>         ]
>       }
>     }
>   }
> }
> {code}
> {code:title=Update Status}
> event: {
>   type: UPDATE,
>   update: {
>     status: {
>       task_id: { value: "task-1" },
>       state: TASK_FAILED,
>       message: "Container destroyed while preparing isolators",
>       agent_id: { value: "20151015-125949-16777343-5050-20146-S0" },
>       timestamp: 1.444939217401241E9,
>       executor_id: { value: "task-1" },
>       source: SOURCE_AGENT,
>       reason: REASON_MEMORY_LIMIT,
>       uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|"
>     }
>   }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
> Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 with resources
> cpus(*):0.1; mem(*):32 in work directory
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for
> executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping
> launch
> I1015 13:15:34.263478 19638 containerizer.cpp:640] Starting container
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework
> 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000'
> E1015 13:15:34.264516 19641 slave.cpp:3342] Container
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework
> 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000' failed to start: Failed to
> prepare isolator: Failed to get user information for '': Success
> I1015 13:15:34.264681 19636 containerizer.cpp:1097] Destroying container
> '3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.265997 19636 slave.cpp:3433] Executor 'task-1' of framework
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 has terminated with unknown status
> I1015 13:15:34.266568 19636 slave.cpp:2717] Handling status update
> TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 from @0.0.0.0:0
> W1015 13:15:34.266695 19636 containerizer.cpp:988] Ignoring update for
> unknown container: 3958ff84-8dd9-4c3c-995d-5aba5250541b
> I1015 13:15:34.266772 19638 status_update_manager.cpp:322] Received status
> update TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task
> task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.266885 19636 slave.cpp:3016] Forwarding the update TASK_FAILED
> (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 to [email protected]:5050
> I1015 13:15:35.255997 19638 status_update_manager.cpp:394] Received status
> update acknowledgement (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task
> task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256165 19640 slave.cpp:3544] Cleaning up executor 'task-1' of
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256273 19641 gc.cpp:56] Scheduling
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> for gc 6.99999703411852days in the future
> I1015 13:15:35.256283 19640 slave.cpp:3633] Cleaning up framework
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256340 19641 gc.cpp:56] Scheduling
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1'
> for gc 6.99999703386667days in the future
> I1015 13:15:35.256350 19634 status_update_manager.cpp:284] Closing status
> update streams for framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256377 19641 gc.cpp:56] Scheduling
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000'
> for gc 6.99999703291556days in the future
> {code}
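The behavior the report asks for can be sketched as a small user-resolution step. This is not Mesos code: `resolve_task_user` and `UnknownUserError` are hypothetical names, the current process stands in for the agent, and the empty-string rule follows the report's description of libmesos. The sketch only shows the two outcomes discussed in this issue: an empty user falls back to the agent's own user, and a nonexistent user produces an error that names the real cause.

```python
import os
import pwd


class UnknownUserError(LookupError):
    """Raised when the requested user has no passwd entry on this host."""


def current_process_user():
    """Name of the account this process runs as (stand-in for the agent)."""
    uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Some sandboxes run with a uid that has no passwd entry.
        return str(uid)


def resolve_task_user(framework_user, switch_user=True):
    """Return the user a task would run as (hypothetical helper)."""
    agent_user = current_process_user()

    # Without user switching, tasks run as the agent's own user.
    if not switch_user:
        return agent_user

    # Empty string: inherit the agent's user, as the report describes.
    if framework_user == "":
        return agent_user

    try:
        pwd.getpwnam(framework_user)
    except KeyError:
        # Name the real cause instead of reporting a generic failure.
        raise UnknownUserError(
            "user '%s' does not exist on this host" % framework_user
        ) from None
    return framework_user


if __name__ == "__main__":
    print(resolve_task_user(""))
    try:
        resolve_task_user("abc-does-not-exist")
    except UnknownUserError as error:
        print("would fail the task with:", error)
```

Whatever the agent actually does internally, the point of the sketch is that the lookup failure is known at this step, so the status update could carry that message rather than a memory-limit reason.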