On Wednesday 15 July 2015 06:17 PM, Peter Zijlstra wrote: > On Wed, Jul 15, 2015 at 02:38:36PM +0530, Aravinda Prasad wrote: >> Current tracing infrastructure such as perf and ftrace reports system >> wide data when invoked inside a container. It is required to restrict >> events specific to a container context when such tools are invoked >> inside a container. >> >> This RFC patch supports filtering container specific events, without >> any change in the user interface, when invoked within a container for >> the perf utility; such support needs to be extended to ftrace. This >> patch assumes that the debugfs is available within the container and >> all the processes running inside a container are grouped into a single >> perf_event subsystem of cgroups. This patch piggybacks on the existing >> support available for tracing with cgroups [1] by setting the cgrp >> member of the event structure to the cgroup of the context perf tool >> is invoked from. >> >> However, this patch is not complete and requires more work to fully >> support tracing inside a container. This patch is intended to initiate >> the discussion on having container-aware tracing support. A detailed >> explanation on what is supported and pending issues are mentioned >> below. > > tracing is outside the scope of perf; I suspect you want tracefs to be > sensitive to filesystem namespaces and all that that entails.
Yes, tracefs needs to be sensitive to filesystem namespace. I wanted to put together points required for supporting perf/trace inside containers. > >> Cc: Hari Bathini <[email protected]> >> Signed-off-by: Aravinda Prasad <[email protected]> >> --- >> kernel/events/core.c | 49 >> +++++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 35 insertions(+), 14 deletions(-) >> >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index 81aa3a4..f6a1f89 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct >> perf_event *event, >> { >> struct perf_cgroup *cgrp; >> struct cgroup_subsys_state *css; >> - struct fd f = fdget(fd); >> + struct fd f; >> int ret = 0; >> >> - if (!f.file) >> - return -EBADF; >> + if (fd != -1) { >> + f = fdget(fd); >> + if (!f.file) >> + return -EBADF; >> >> - css = css_tryget_online_from_dir(f.file->f_path.dentry, >> + css = css_tryget_online_from_dir(f.file->f_path.dentry, >> &perf_event_cgrp_subsys); >> - if (IS_ERR(css)) { >> - ret = PTR_ERR(css); >> - goto out; >> + if (IS_ERR(css)) { >> + ret = PTR_ERR(css); >> + fdput(f); >> + return ret; >> + } >> + } else if (event->attach_state == PERF_ATTACH_TASK) { >> + /* Tracing on a PID. No need to set event->cgrp */ >> + return ret; >> + } else if (task_active_pid_ns(current) != &init_pid_ns) { > > Why the pid namespace? This comes from my understanding of container -- having at least a separate PID namespace with processes inside a container grouped into a single perf_event cgroups subsystem. I know there are other ways to define a container, however, I thought I start with the above one. > >> + /* Don't set event->cgrp if task belongs to root cgroup */ >> + if (task_css_is_root(current, perf_event_cgrp_id)) >> + return ret; > > So if you have the root perf_cgroup inside your container you can > escape? If we have root perf_cgroup inside the container then even if we set event->cgrp we will be including all processes in the system. Regards, Aravinda > >> + >> + css = task_css(current, perf_event_cgrp_id); >> + if (!css || !css_tryget_online(css)) >> + return -ENOENT; >> + } else { >> + /* >> + * perf invoked from global context and hence don't set >> + * event->cgrp as all the events should be included >> + */ >> + return ret; >> } >> >> cgrp = container_of(css, struct perf_cgroup, css); > -- Regards, Aravinda -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

