On Sun, Sep 24, 2017 at 02:27:00PM -0700, Andy Lutomirski wrote:
> On Sun, Sep 24, 2017 at 1:08 PM, Alexey Dobriyan <adobri...@gmail.com> wrote:
> > From: Tatsiana Brouka <tatsiana_bro...@epam.com>
> >
> > Implement a system call for bulk retrieval of pids in binary form.
> >
> > Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
> > converting with atoi() + instantiating dentries and inodes.
> >
> > /proc may not be mounted, especially in containers. A natural extension of
> > the hidepid=2 efforts is to not mount /proc at all.
> >
> > It could be used by programs like ps, top or CRIU. Speed increase will
> > become more drastic once combined with bulk retrieval of process statistics.
> >
> > Benchmark:
> >
> >         N=1<<16 times
> >         ~130 processes (~250 task_structs) on a regular desktop system
> >         opendir + readdir + closedir /proc + the same for every /proc/$PID/task
> >         (roughly what htop(1) does) vs pidmap
> >
> >         /proc 16.80 ± 0.73%
> >         pidmap 0.06 ± 0.31%
> >
> > PIDMAP_* flags are modelled after /proc/task_diag patchset.
> >
> >
> > PIDMAP(2)                  Linux Programmer's Manual                 PIDMAP(2)
> >
> > NAME
> >        pidmap - get allocated PIDs
> >
> > SYNOPSIS
> >        long pidmap(pid_t pid, int *pids, unsigned int count, unsigned int start, int flags);
> 
> I think we will seriously regret a syscall that does this.  Djalal is
> working on fixing the turd that is hidepid, and this syscall is
> basically incompatible with ever fixing hidepid.  I think that, to
> make it less regrettable, it needs to take an fd to a proc mount as a
> parameter.  This makes me wonder why it's a syscall at all -- why not
> just create a new file like /proc/pids?

See reply to fdmap(2).

pidmap(2) is indeed a more complex case, exactly because of
pid/tgid/tid/everything else + pid namespaces + ->hide_pid.
However, the problem remains: querying the task tree without all the bullshit.
C/R people settled for /proc/*/children; that was a mistake IMO.
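
For reference, the /proc walk that the quoted benchmark compares against
(opendir + readdir + closedir on /proc, then the same for every
/proc/$PID/task) looks roughly like the sketch below. It is only an
illustration of the access pattern, not the benchmark code itself; the
helper names is_all_digits(), count_numeric_entries() and count_tasks()
are made up for this example.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if s is a non-empty string of decimal digits, else 0. */
static int is_all_digits(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}

/* Count the all-numeric entries of a directory; -1 if it cannot be opened. */
static long count_numeric_entries(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	long n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)))
		if (is_all_digits(de->d_name))
			n++;
	closedir(d);
	return n;
}

/*
 * Walk /proc and every /proc/$PID/task.
 * Returns the number of threads seen and stores the number of
 * processes in *nr_procs; returns -1 if /proc cannot be opened
 * (for example because it is not mounted).
 */
static long count_tasks(long *nr_procs)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char path[280];
	long threads = 0, n;

	*nr_procs = 0;
	if (!proc)
		return -1;
	while ((de = readdir(proc))) {
		if (!is_all_digits(de->d_name))
			continue;
		(*nr_procs)++;
		snprintf(path, sizeof(path), "/proc/%s/task", de->d_name);
		n = count_numeric_entries(path);
		if (n > 0)	/* the process may have exited in the meantime */
			threads += n;
	}
	closedir(proc);
	return threads;
}
```

Every pid costs one opendir/readdir/closedir round on its task directory,
and every name has to be parsed back from text; that per-task overhead is
what the numbers above are measuring.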
