On Thu, Nov 12, 2015 at 12:02 PM, Barret Rhoden <[email protected]>
wrote:

> > diff --git a/kern/include/profiler.h b/kern/include/profiler.h
> > new file mode 100644
> > index 000000000000..2ec9d4308632
> > --- /dev/null
> > +++ b/kern/include/profiler.h
>
> New, non-trivial files need a copyright header.
>

It already has one, if you review the tip of the branch.



>
> > @@ -0,0 +1,18 @@
> > +
> > +#ifndef ROS_KERN_INC_PROFILER_H
> > +#define ROS_KERN_INC_PROFILER_H
>
> Can also do the pragma once now.
>

Already has.



>
> > diff --git a/kern/src/profiler.c b/kern/src/profiler.c
> > new file mode 100644
> > index 000000000000..ca7197bbc20b
> > --- /dev/null
> > +++ b/kern/src/profiler.c
>
> Needs a copyright.  It looks like this might be new code, but if there's
> old stuff from oprofile in it, we need to put that.  I see there are a
> couple things like "op_entry" still around.
>

Already has.




>
> > +static inline int profiler_add_sample(struct profiler_cpu_context
> *cpu_buf,
> > +
>  uintptr_t pc, unsigned long event)
>
> Do you want this to return a bool (TRUE for success) or an int (0 for
> success)?   See below:
>

This function is gone at the tip.
Since this was v2, and v1 was never reviewed, you likely commented on the v1
version.



> > From 0a919619324782e873ddf5bbf9bd19e989f25162 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Tue, 20 Oct 2015 16:34:26 -0700
> > Subject: Do not race when multiple init happen at the same time
>
> > @@ -65,53 +64,63 @@ static inline size_t
> profiler_cpu_buffer_add_data(struct op_entry *entry,
> >  static void free_cpu_buffers(void)
> >  {
> >       kfree(profiler_percpu_ctx);
> > +     profiler_percpu_ctx = NULL;
> > +
> > +     qclose(profiler_queue);
> > +     profiler_queue = NULL;
> >  }
>
> Isn't this stuff possibly used from IRQ context?  What's the plan for
> cleaning up such that we are sure that there are no interrupts going off
> while you are freeing things?
>

That function is only called when there are no more users.
Timers are stopped before calling profiler_cleanup().
I am not sure how synchronous the removal of a timer is, though.



>
> In:
> >  static int alloc_cpu_buffers(void)
> >  {
> > -     if (!profiler_queue) {
> > -             profiler_queue = qopen(profiler_queue_limit, 0, NULL,
> NULL);
> > -             if (!profiler_queue)
> > -                     goto fail;
> > -     }
> > +     int i;
> > +
> > +     profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> > +     if (!profiler_queue)
> > +             return -ENOMEM;
> >
> > -     /* we *really* don't want to block. Losing data is better. */
> >       qdropoverflow(profiler_queue, 1);
>
> Then later:
>
> > +     qnonblock(profiler_queue, 1);
>
> This is fragile.  qnonblock means "error with EAGAIN if we're full".
> Right now, you'll get by, since you still have qdropoverflow above.  But
> this could break with an error() thrown from IRQ context somewhere down
> the line.
>

Yes, they are paired. I am not sure what kind of bad condition could happen.



>
> >  void profiler_cleanup(void)
> >  {
> > -     free_cpu_buffers();
> > +     sem_down(&mtx);
> > +     profiler_users--;
> > +     if (profiler_users == 0)
> > +             free_cpu_buffers();
> > +     sem_up(&mtx);
> >  }
>
> This might be a candidate for a kref, depending on how cleanup ends up
> working.  No need to change it now though.
>

krefs don't block while the 0->1 transitioner is initializing, no?
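For what it's worth, the pattern the patch uses (and which a bare kref increment/decrement would not give you) looks roughly like this as a user-space sketch; the pthread mutex stands in for the kernel semaphore, and the `inits`/`frees` counters are instrumentation for the example only:

```c
#include <assert.h>
#include <pthread.h>

/* The mutex serializes both the 0->1 init and the 1->0 teardown, so a
 * second user cannot race past while the first is still initializing. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static int profiler_users;
static int inits, frees;	/* instrumentation for this sketch */

static void profiler_setup(void)
{
	pthread_mutex_lock(&mtx);
	if (profiler_users == 0)
		inits++;	/* alloc_cpu_buffers() would go here */
	profiler_users++;
	pthread_mutex_unlock(&mtx);
}

static void profiler_cleanup(void)
{
	pthread_mutex_lock(&mtx);
	profiler_users--;
	if (profiler_users == 0)
		frees++;	/* free_cpu_buffers() would go here */
	pthread_mutex_unlock(&mtx);
}
```

With two users, init and free each happen exactly once, no matter how the calls interleave.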



> > From fd096815dc200a550ea5e6b6d7f133df75e29ed9 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Thu, 22 Oct 2015 14:30:26 -0700
> > Subject: Added full binary path into the proc structure
> >
>
> > @@ -791,6 +763,7 @@ static int sys_exec(struct proc *p, char *path,
> size_t path_l,
> >       t_path = copy_in_path(p, path, path_l);
> >       if (!t_path)
> >               return -1;
> > +     proc_replace_binary_path(p, t_path);
>
> If we error out after this point, we'll have changed the path of the
> binary to
> the new one even though the exec failed.  I'm okay with that, just more of
> an
> FYI.
>

I am fixing this.




>
> > From 47bf34b1bc96b78a8091726cb3016b1f0d964847 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Thu, 22 Oct 2015 10:56:37 -0700
> > Subject: Added APIs to access process startup information
> >
> > +char *get_startup_argv(struct proc *p, size_t idx, char *argp,
> > +                                        size_t max_size)
> > +{
> > +     size_t stack_space = (const char *) USTACKTOP - (const char *)
> p->args_base;
> > +     const char *sptr = (const char *) p->args_base + sizeof(size_t) +
> > +             idx * sizeof(char *);
> > +     const char *argv = NULL;
> > +
> > +     /* TODO,DL: Use copy_from_user() when available.
> > +      */
>
> You've got these now, though memcpy_from_user will do it. (or will soon).
> Either way, you don't need the TODO.
>

The strcpy is still needed. Currently it copies up to max_size, which is
kind of a hack.
But I will not be fixing it here (in this CR, that is).
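The clamp-and-terminate hack being discussed is roughly the following, as a user-space sketch; a plain memcpy stands in for memcpy_from_user(), and the helper name is mine, not from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Copy up to max_size bytes (clamped to the remaining stack space) and
 * force a NUL at the end. The whole window is copied, so a short string
 * drags trailing user-stack bytes along, and a long one is silently
 * truncated; a real strncpy_from_user() would stop at the first NUL. */
static char *copy_arg_bounded(char *argp, const char *src, size_t max_size,
			      size_t stack_space)
{
	size_t n = MIN(max_size, stack_space);

	if (n == 0)
		return NULL;
	memcpy(argp, src, n);	/* stands in for memcpy_from_user() */
	argp[n - 1] = 0;	/* forced termination, possibly mid-string */
	return argp;
}
```

A string longer than the destination comes back truncated but always NUL-terminated, which is the "kind of a hack" part.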




>
> > +     if (memcpy_from_user(p, &argv, sptr, sizeof(char *)))
> > +             return NULL;
> > +
> > +     /* TODO,DL: Use strncpy_from_user() when available.
> > +      */
> > +     max_size = MIN(max_size, stack_space);
> > +     if (memcpy_from_user(p, argp, argv, max_size))
> > +             return NULL;
> > +     argp[max_size - 1] = 0;
> > +
> > +     return argp;
> > +}
>
> How important is it that the command line data hasn't changed?  We're
> trusting
> the user to not have changed it since they were invoked.


> Also, I guess we're going to want to parse the argv stuff later.  Kevin put
> together some code to pack and parse the args earlier.  It might be easier
> and
> to keep that blob around and parse it instead of this.  Not a huge deal
> though.
>

We could keep the kernel buffer copy, but I'd prefer to eventually do that
in a separate CL.




> > From 1aede03303debfe3075aa7055c6f1dd843f806bc Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Tue, 27 Oct 2015 15:29:10 -0700
> > Subject: Added API to enumerate the VM regions of a process
>
> > --- a/kern/include/mm.h
> > +++ b/kern/include/mm.h
> > @@ -51,6 +51,9 @@ void isolate_vmrs(struct proc *p, uintptr_t va, size_t
> len);
> >  void unmap_and_destroy_vmrs(struct proc *p);
> >  int duplicate_vmrs(struct proc *p, struct proc *new_p);
> >  void print_vmrs(struct proc *p);
> > +void enumerate_vrms(struct proc *p,
>        ^enumerate_vmrs
>

Fixed. It took me a little while to understand what was wrong ☺




> > diff --git a/kern/src/mm.c b/kern/src/mm.c
> > index ca91e63620b6..06a349840404 100644
> > --- a/kern/src/mm.c
> > +++ b/kern/src/mm.c
> > @@ -359,6 +359,18 @@ void print_vmrs(struct proc *p)
> >                      vmr->vm_file, vmr->vm_foff);
> >  }
> >
> > +void enumerate_vrms(struct proc *p,
>        ^enumerate_vmrs
>

Ditto.



>
> > From 2df52e6bc39bf90a1c90e5382298fa2c2a8316f5 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Thu, 5 Nov 2015 17:30:24 -0800
> > Subject: Added API to append data into a circular buffer
> >
> > Added API to append data into a circular buffer. The buffer data is
> > allocated once at init time, and no more allocations happen after that.
>
> > diff --git a/kern/include/circular_buffer.h
> b/kern/include/circular_buffer.h
> > new file mode 100644
> > index 000000000000..4ab8c42e2673
> > --- /dev/null
> > +++ b/kern/include/circular_buffer.h
>
> Can you put in a little info about how to use this and what the rules
> are?  For
> instance, what happens to writers when the buffer is full (overwrite,
> block, or
> drop?) or to readers when the buffer is empty? (looks like you just get
> 0).  If
> a reader uses an offset, does that just drop a chunk from the buffer?
> What is
> the cbuf_size_t for?
>

Locking is up to the reader.
In circular-buffer fashion, a write to a full buffer discards the oldest
blocks and writes the new one.
No, reading data does not remove it. The only way the read pointer moves
forward is when the write pointer pushes it forward in order to fit new data.
Every block in the buffer is prepended with its size; cbuf_size_t is the
type of that size field.
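To make the record layout concrete, here is a toy, non-circular, user-space sketch of the size-prefixed block format; cbuf_size_t matches the patch, the helper functions are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t cbuf_size_t;	/* the per-block size prefix, as in the patch */

/* Append one [size | payload] record at wrptr; returns the advanced wrptr.
 * The stored size covers the prefix itself, matching the patch's
 * bsize - sizeof(cbuf_size_t) math on the read side. In the real circular
 * buffer, a write that would overlap rdptr first pushes rdptr past whole
 * records (circular_buffer_write_skip()), so only the oldest data is lost. */
static char *record_append(char *wrptr, const void *data, cbuf_size_t len)
{
	cbuf_size_t bsize = sizeof(cbuf_size_t) + len;

	memcpy(wrptr, &bsize, sizeof(bsize));
	memcpy(wrptr + sizeof(bsize), data, len);
	return wrptr + bsize;
}

/* Read the record at rdptr into data; returns a pointer to the next one. */
static const char *record_next(const char *rdptr, void *data, cbuf_size_t *len)
{
	cbuf_size_t bsize;

	memcpy(&bsize, rdptr, sizeof(bsize));
	*len = bsize - sizeof(cbuf_size_t);
	memcpy(data, rdptr + sizeof(bsize), *len);
	return rdptr + bsize;
}
```

Records written back to back come out in order, each with its own length, which is what lets the skip logic jump whole blocks at a time.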




>
> > --- /dev/null
> > +++ b/kern/lib/circular_buffer.c
>
> > +void circular_buffer_clear(struct circular_buffer *cb)
> > +{
> > +     if (cb->base) {
> > +             if (cb->mem)
> > +                     kfree(cb->mem);
>
> It's a little surprising that clear also frees the allocation, instead of
> just
> resetting everything back to 0.
>

This clears all the resources of the buffer.
I used "clear" because there is an "init"; I would have used "free" if there
were an "alloc".




>
> > +             cb->rdptr = cb->wrptr = cb->base = cb->mem = NULL;
> > +             cb->size = cb->allocated = 0;
> > +     }
> > +}
> > +
> > +static bool circular_buffer_is_overlap(const struct circular_buffer *cb,
> > +
> const char *rptr, const char *wptr,
> > +
> size_t size)
> > +{
> > +     return (cb->size > 0) && (rptr >= wptr) && (rptr < (wptr + size));
> > +}
> > +
> > +static void circular_buffer_write_skip(struct circular_buffer *cb, char
> *wrptr,
> > +
> size_t size)
>
> What are these helpers doing?
>

The overlap check tests whether a write operation is going to overwrite the
block at which rdptr resides.
The skip moves rdptr forward until it no longer overlaps with the incoming
write.




>
> > +size_t circular_buffer_read(struct circular_buffer *cb, char *data,
> size_t size,
> > +                                                     size_t off)
> > +{
> > +     size_t asize = cb->size, rsize = 0;
> > +     const char *rdptr = cb->rdptr;
> > +
> > +     while (asize > 0 && size > 0) {
> > +             size_t bsize = *(const cbuf_size_t *) rdptr;
> > +
> > +             if (likely(bsize)) {
> > +                     size_t esize = bsize - sizeof(cbuf_size_t);
> > +
> > +                     if (off >= esize) {
> > +                             off -= esize;
> > +                     } else {
> > +                             size_t csize = MIN(esize - off, size);
> > +
> > +                             memcpy(data, rdptr + sizeof(cbuf_size_t) +
> off, csize);
>
> So every block of data has the size first, then the data, all packed
> together?
>

Yup.




>
> > From c3c2bb0701f747e5c5c3f3d30cdf033137ae26e2 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Wed, 21 Oct 2015 16:39:04 -0700
> > Subject: Implemented the new profiler
> >
> > Implemented the new profiler format and added simple userspace
> > stack trace (waiting for copy_from_user()).
>
> This is a huge commit that would have been easier to understand if it was
> broken
> up a bit.  For instance, I see there's a user backtrace list, syscall
> tracing
> isn't using a qio queue anymore, the old Plan 9 profiler is gone (great!),
> and a
> bunch of other things.  It also looks like you squeezed in the change that
> the
> start command also turns on the timers for all cores, which isn't
> something I
> wanted.
>

Yes, it is a big change 😑
Breaking it down would be a PITA at this point, given the changes that have
accumulated during the four weeks it has been sitting there.
As for the timers, for Ron's case you can still turn them on independently
of the profiler.
From a profiler POV, I think the default behavior should be to have them on
everywhere; otherwise you lose the full picture of what the system is doing.
We could add selective per-CPU enabling of the timers and the profiler, but
IMO the default should be all on.




>
> > --- a/kern/arch/x86/kdebug.c
> > +++ b/kern/arch/x86/kdebug.c
> > @@ -365,6 +365,25 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp,
> uintptr_t *pcs,
> >       return nr_pcs;
> >  }
> >
> > +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> > +                                                size_t nr_slots)
> > +{
> > +     size_t nr_pcs = 0;
> > +
> > +     for (;;) {
> > +             if (unlikely(fp >= UMAPTOP) || unlikely(fp < BRK_END) ||
>
> Why not is_user_raddr()?  And once you use copy_from, those checks aren't
> needed
> either.  Also, I think it is possible for frame pointers to be below
> BRK_END,
> which could happen if a user stack was malloc'd instead of mmapped.
>
> > +                     unlikely(nr_pcs >= nr_slots))
> > +                     break;
> > +
> > +             /* For now straight memory access, waiting for
> copy_from_user(). */
>
> copy_from is now available.  =)
>

I will do that in a separate CL.



> diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c
> > index da11b7e44d32..76e53d7f9d30 100644
>
> > +static int oprof_timer_period = 1000;
>              ^ rename prof_timer_period
>

Done.




>
> > +static void kprof_start_profiler(void)
> > +{
> > +     ERRSTACK(2);
> >
> > +     sem_down(&kprof.lock);
>
> If you're using a semaphore as a binary mutex (lock), please use a qlock.
> This
> happens a bunch in this patch set.
>

Done.



>
> > +     if (waserror()) {
> > +             sem_up(&kprof.lock);
> > +             nexterror();
> > +     }
> > +     if (!kprof.profiling) {
> > +             profiler_init();
> > +             if (waserror()) {
> > +                     profiler_cleanup();
> > +                     nexterror();
> > +             }
> > +
> > +             profiler_control_trace(1);
> > +
> > +             for (int i = 0; i < num_cores; i++)
> > +                     kprof_enable_timer(i, 1);
>
> This looks like starting the profiler turns on the timer.  I wanted to
> keep the
> timer separate from the collection, such that the timer is one of
> potentially
> many sources of data, as discussed in another email.
>

For future uses, yes. But today, with no timers, there is no data.
kptrace is a different data path, which is always on.
When I add the counter-overflow-based interrupt triggers, in the code I am
working on now, we will need to rework this file.





>
> > +static void kprof_usage_fail(void)
> > +{
> > +     static const char *ctlstring = "clear|start|stop|timer";
> > +     const char* const * cmds = profiler_configure_cmds();
> > +     size_t i, size, csize;
> > +     char msgbuf[128];
> > +
> > +     strncpy(msgbuf, ctlstring, sizeof(msgbuf));
> > +     size = MIN(strlen(ctlstring), sizeof(msgbuf) - 1);
> > +     for (i = 0; cmds[i]; i++) {
> > +             csize = strlen(cmds[i]);
> > +             if ((csize + size + 2) > sizeof(msgbuf))
> > +                     break;
> > +             msgbuf[size] = '|';
> > +             memcpy(msgbuf + size + 1, cmds[i], csize);
> > +             size += csize + 1;
> > +     }
> > +     msgbuf[size] = 0;
> > +
> > +     error(EFAIL, msgbuf);
> > +}
>
> This is cool, and the pattern tends to come up a lot, where we'd like to
> generate usage information.  Perhaps there's a Plan 9 helper for this
> already?
> (I don't know off the top of my head, maybe the Plan 9 guys do).  We
> already
> have stuff that parses command strings.  If we don't have a helper, we can
> make
> one, so that every device doesn't have to redo this logic.
>
> Another part to it is that a device has several layers of arguments and
> commands
> - it'd be neat if that was clearly written down somewhere and then we could
> generate these switches and error messages more easily.  This isn't
> something
> that needs to be done now, but I like your kprof_usage_fail and it brought
> the
> topic up.  =)
>

OK ☺




>
> > +static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> > +{
> > +     static struct trace_printk_buffer boot_tpb;
> > +     static struct trace_printk_buffer *cpu_tpbs;
> > +
> > +     if (unlikely(booting))
> > +             return &boot_tpb;
> > +     if (unlikely(!cpu_tpbs)) {
> > +             /* Poor man per-CPU data structure. I really do no like
> littering global
> > +              * data structures with module specific data.
> > +              */
> > +             spin_lock_irqsave(&ktrace_lock);
> > +             if (!cpu_tpbs)
> > +                     cpu_tpbs = kzmalloc(num_cores * sizeof(struct
> trace_printk_buffer),
> > +                                                             0);
>
> If this alloc fails, (flags == 0), then we'll return 0 (+coreid) and PF
> later.
>

We could panic, or return the boot tpb.
Today an OOM crashes the box, so I am not sure what to do here.



> --- a/kern/include/kdebug.h
> > +++ b/kern/include/kdebug.h
> > @@ -46,7 +48,7 @@ int printdump(char *buf, int buflen, uint8_t *data);
> >  extern bool printx_on;
> >  void set_printx(int mode);
> >  #define printx(args...) if (printx_on) printk(args)
> > -#define trace_printx(args...) if (printx_on) trace_printk(args)
> > +#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)
>                                                             ^FALSE?
> We just want to print text, not a backtrace.  That was also my intent for
> trace_printk (it's just the print, not a backtrace).
>

It seems pretty useful to have the option to get a BT, to understand which
path you are coming from.
A grep of the string only tells you where the print is.



>
> > --- a/kern/src/mm.c
> > +++ b/kern/src/mm.c
> > @@ -24,6 +24,7 @@
> >  #include <kmalloc.h>
> >  #include <vfs.h>
> >  #include <smp.h>
> > +#include <profiler.h>
> >
> >  struct kmem_cache *vmr_kcache;
> >
> > @@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, size_t
> len, int prot, int flags,
> >               }
> >       }
> >       spin_unlock(&p->vmr_lock);
> > +
> > +     profiler_notify_mmap(p, addr, len, prot, flags, file, offset);
> > +
>
> Do you need to know when something was unmapped?
>

Linux perf does not seem to care, as it does not have any command to report
that.


> diff --git a/kern/src/process.c b/kern/src/process.c
> > index bd26789e3ef1..2e4c69518e8a 100644
> > --- a/kern/src/process.c
> > +++ b/kern/src/process.c
> > @@ -331,6 +331,8 @@ error_t proc_alloc(struct proc **pp, struct proc
> *parent, int flags)
> >               kmem_cache_free(proc_cache, p);
> >               return -ENOFREEPID;
> >       }
> > +     if (parent && parent->binary_path)
> > +             kstrdup(&p->binary_path, parent->binary_path);
>
> Is this to fix a leak where paths weren't set for forked processes?  If
> so, this
> is an example of a fixup that could have been squashed into a previous
> commit.
>

I lost track of what it was for 😀
But yes, it could have been squashed.




>
> > --- a/kern/src/profiler.c
> > +++ b/kern/src/profiler.c
>
> > +static inline char* vb_encode_uint64(char* data, uint64_t n)
> > +{
> > +     for (; n >= 0x80; n >>= 7)
> > +             *data++ = (char) (n | 0x80);
> > +     *data++ = (char) n;
> > +
> > +     return data;
> > +}
>
> This probably should be in a library or header (so we can reuse it) and
> with a
> little explanation.  It looks like we take 7 bits of n at a time and push
> it
> into *data, with bit 0x80 signifying we're done or not.
>

Yep, that's what it does.
I wasn't sure whether variable-byte encoding would be something the kernel
would be doing elsewhere.
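For reference, a decoder matching vb_encode_uint64() would look like this; the encoder is from the patch, the decoder is mine, not in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: emit n seven bits at a time, low bits first; bit 0x80
 * set on a byte means "more bytes follow". */
static char *vb_encode_uint64(char *data, uint64_t n)
{
	for (; n >= 0x80; n >>= 7)
		*data++ = (char) (n | 0x80);
	*data++ = (char) n;
	return data;
}

/* Hypothetical matching decoder: accumulate 7-bit groups until a byte
 * with the continuation bit clear. */
static const char *vb_decode_uint64(const char *data, uint64_t *n)
{
	unsigned int shift = 0;
	uint64_t v = 0;
	uint8_t b;

	do {
		b = (uint8_t) *data++;
		v |= (uint64_t) (b & 0x7f) << shift;
		shift += 7;
	} while (b & 0x80);
	*n = v;
	return data;
}
```

Small values cost one byte and values up to 2^14 - 1 cost two, which is why it pays off for sample records.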



> +static struct block *profiler_buffer_write(struct profiler_cpu_context
> *cpu_buf,
> > +
>         struct block *b)
> >  {
> > -     return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) |
> > -             (uint64_t) nbt;
> > +     if (b) {
> > +             qibwrite(profiler_queue, b);
> > +
> > +             if (qlen(profiler_queue) > profiler_queue_limit) {
> > +                     b = qget(profiler_queue);
> > +                     if (likely(b)) {
> > +                             cpu_buf->dropped_data_size += BLEN(b);
> > +                             freeb(b);
> > +                     }
>
> This seems like a candidate for a feature added to qio, such as a "drop
> from the
> front" mode.
>

OK, but can we please do that in a separate CL? This branch has been
sitting for some time already.




>
> > +             }
> > +     }
> > +
> > +     return iallocb(profiler_cpu_buffer_size);
> >  }
>
> This function seems a little weird.  If you're given a block, you write it.
> That makes sense.  But regardless, you alloc a new block?
>
> Seems like we should not be too cute by having write return a fresh block,
> and
> just do the allocation separately.
>

I drop a block from the tail. I *could* reuse it, but I have found no reinit
API to re-initialize a block.




>
> > +static void profiler_emit_current_system_status(void)
> >  {
> > -     struct block *b = cpu_buf->block;
> > -    size_t totalsize = sizeof(struct op_sample) +
> > -             size * sizeof(entry->sample->data[0]);
> > -
> > -     if (unlikely((!b) || (b->lim - b->wp) < totalsize)) {
> > -             if (b)
> > -                     qibwrite(profiler_queue, b);
> > -             /* For now. Later, we will grab a block off the
> > -              * emptyblock queue.
> > -              */
> > -             cpu_buf->block = b = iallocb(profiler_cpu_buffer_size);
> > -        if (unlikely(!b)) {
> > -                     printk("%s: fail\n", __func__);
> > -                     return NULL;
> > +     void enum_proc(struct vm_region *vmr, void *opaque)
> > +     {
> > +             struct proc *p = (struct proc *) opaque;
> > +
> > +             profiler_notify_mmap(p, vmr->vm_base, vmr->vm_end -
> vmr->vm_base,
> > +                                                      vmr->vm_prot,
> vmr->vm_flags, vmr->vm_file,
> > +                                                      vmr->vm_foff);
> > +     }
> > +
> > +     ERRSTACK(2);
>
> I think you only need 1 here.  It's based on the number of waserrors.
>

Fixed.




>
> > +     struct process_set pset;
> > +
> > +     proc_get_set(&pset);
> > +     if (waserror()) {
> > +             proc_free_set(&pset);
> > +             nexterror();
> > +     }
> > +
> > +     for (size_t i = 0; i < pset.num_processes; i++)
> > +             enumerate_vrms(pset.procs[i], enum_proc, pset.procs[i]);
>                  ^vmrs
>

Done.



>
> > +     poperror();
> > +     proc_free_set(&pset);
> > +}
> > +
> > +static inline int profiler_is_tracing(struct profiler_cpu_context
> *cpu_buf)
>
> Since this is named "is_tracing", it should be a bool, and return
> TRUE/False to
> avoid any confusion.
>

Done.



> -static inline int profiler_add_sample(struct profiler_cpu_context
> *cpu_buf,
> > -
>  uintptr_t pc, unsigned long event)
> > +static void alloc_cpu_buffers(void)
> >  {
> >       ERRSTACK(1);
> > -     struct op_entry entry;
> > -     struct block *b;
> > +     int i;
> >
> > +     profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> > +     if (!profiler_queue)
> > +             error(-ENOMEM, NULL);
>               ^just ENOMEM
>

Done.


> +     qdropoverflow(profiler_queue, 1);
> > +     qnonblock(profiler_queue, 1);
>
> As mentioned in a previous patch, you might not want qnonblock.
>

Not sure why not?



> Incidentally, qnonblock and qdropoverflow take a bool, not an int.  So TRUE
> instead of 1.
>

Done.



>
> > +     profiler_percpu_ctx =
> > +             kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores,
> KMALLOC_WAIT);
> > +     if (!profiler_percpu_ctx)
> > +             error(-ENOMEM, NULL);
>
> Since you did a KMALLOC_WAIT, you will never have !profiler_percpu_ctx
>

I am not sure waiting is a guarantee that you will get memory anyway.


> -static inline void profiler_begin_trace(struct profiler_cpu_context
> *cpu_buf)
> > +int profiler_configure(struct cmdbuf *cb)
> >  {
> > -     cpu_buf->tracing = 1;
> > +     if (!strcmp(cb->f[0], "prof_qlimit")) {
> > +             if (cb->nf < 2)
> > +                     error(EFAIL, "prof_qlimit KB");
> > +             profiler_queue_limit = atoi(cb->f[1]) * 1024;
>
> Probably want a sanity check.  Also, what happens when this changes after
> the
> queue was already allocated?
>

The configuration must be done before the profiler is started.




>
> > +     } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
> > +             if (cb->nf < 2)
> > +                     error(EFAIL, "prof_cpubufsz KB");
> > +             profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024;
>
> Is there any danger with the user setting this to be a very small value
> (like
> 0)?   It looks like the assumption in profiler_cpu_buffer_write_reserve()
> is
> that a fresh allocation (done by profiler_buffer_write()) is enough for
> size
> bytes.
>
> > +     } else if (!strcmp(cb->f[0], "prof_btdepth")) {
> > +             if (cb->nf < 2)
> > +                     error(EFAIL, "prof_btdepth DEPTH");
> > +             profiler_backtrace_depth = atoi(cb->f[1]);
>
> It is dangerous to have the user control this.  It's a stack allocation.
> Even
> if it wasn't, we'd need a sanity check of some sort.
>

I will drop the BT depth config, and limit the others.



>
> > +void profiler_cleanup(void)
> > +{
> > +     sem_down(&profiler_mtx);
> > +     profiler_users--;
> > +     if (profiler_users == 0)
> > +             free_cpu_buffers();
> > +     sem_up(&profiler_mtx);
> >  }
>
> I'm still concerned about this.  If the only source of profiling data is
> from
> the timer IRQs, then your current stuff is seems fine.  (You disable the
> timers,
> and thus the handlers) before freeing this stuff).  But if we ever have any
> other use of this, then we'll need to be careful.
>

As I said, when I come in with the overflow-triggered sampling, this will
need some retuning.



>
> But there's more to it than that.  The assumption here is that once
> profiler_users == 0, then there is no code, (RKM, pending IRQ, etc.) that
> has
> access to profiler_percpu_ctx or profiler_queue.
>
> Just as an example, say someone is calling profiler_notify_mmap(), which
> checks
> profiler_percpu_ctx.  They get past the if () check, then concurrently
> someone
> calls profiler_cleanup and triggers free_cpu_buffers.  Then the original
> thread
> eventually calls profiler_push_pid_mmap, then qwrite, then page faults.
>
> So a user could theoretically trigger a PF.  This is part of the reason
> why I
> was reluctant to have you try and free the buffers.
>

Well, we have no RCU ☺
With one, we would swap the pointer with NULL and RCU-free it.




> @@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char
> *path, size_t path_l,
> >       user_memdup_free(p, kargenv);
> >       __proc_ready(new_p);
> >       pid = new_p->pid;
> > +     profiler_notify_new_process(new_p);
> >       proc_decref(new_p);     /* give up the reference created in
> proc_create() */
> >       return pid;
> >  error_load_elf:
> > @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
> >
> >       printd("[PID %d] fork PID %d\n", e->pid, env->pid);
> >       ret = env->pid;
> > +     profiler_notify_new_process(env);
> >       proc_decref(env);       /* give up the reference created in
> proc_alloc() */
> >       return ret;
> >  }
>
> Do you need to update things when a process changes its binary path?
> (exec)
>

That should lead to new mmaps.



> From c0ab4ec0d729ad8e0555b3beef72340f90c23712 Mon Sep 17 00:00:00 2001
> > From: Davide Libenzi <[email protected]>
> > Date: Sat, 7 Nov 2015 18:47:16 -0800
> > Subject: Enabled /prof/kptrace collection of anything which goes into
> cprintf
> >
> > Enabled /prof/kptrace collection of anything which goes into cprintf,
> > printk, and its associates.
> > The kptrace collector is a circular buffer whose default size is 128KB.
>
> What happens when the ring is full?  New stuff drops?  My main concern
> here is
> that trace_vprintk could somehow cause an issue and hold up the real
> printk,
> which would be hard to debug.
>

Ring full drops old stuff. Never blocks.




>
> > --- a/kern/drivers/dev/kprof.c
> > +++ b/kern/drivers/dev/kprof.c
> > @@ -19,6 +19,7 @@
> >  #include <error.h>
> >  #include <pmap.h>
> >  #include <smp.h>
> > +#include <time.h>
> >  #include <circular_buffer.h>
> >  #include <umem.h>
> >  #include <profiler.h>
> > @@ -62,7 +63,7 @@ struct dirtab kproftab[] = {
> >       {"mpstat-raw",  {Kmpstatrawqid},        0,      0600},
> >  };
> >
> > -extern int booting;
> > +extern system_timing_t system_timing;
>
> Should be able to get system_timing from a header.
>

Yup, arch/time.h, done.




>
> >  static struct kprof kprof;
> >  static bool ktrace_init_done = FALSE;
> >  static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
> > @@ -567,7 +568,7 @@ static struct trace_printk_buffer
> *kprof_get_printk_buffer(void)
> >       static struct trace_printk_buffer boot_tpb;
> >       static struct trace_printk_buffer *cpu_tpbs;
> >
> > -     if (unlikely(booting))
> > +     if (unlikely(!num_cores))
> >               return &boot_tpb;
>
> That seems a little odd.  I'd think if we're still booting, we'd use the
> boot_tpb.  Was there some corner case that triggered this?  I understand
> this
> one:
>
> > -     if (likely(!booting))
> > +     if (likely(system_timing.tsc_freq))
> >               tsc2timespec(read_tsc(), &ts_now);
>

The booting flag is cleared really late, and I would not want stuff starting
to spin on other cores, colliding on the boot tpb.

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.
