Hi -

Overall, I'm a big fan of this patch set.  But I do have a bunch of
concerns.  Some minor, some bigger.  See below.

Thanks,

Barret


On 2015-10-28 at 15:06 "'Davide Libenzi' via Akaros"
<[email protected]> wrote:
> Since the previous one has not been done yet, this is a cumulative
> one:
> 
> https://github.com/brho/akaros/compare/master...dlibenzi:profiler_dev_v2
> 
> 
> The following changes since commit
> 6f3723cd8f883260a78fdf411911d7469464caa5:
> 
>   Update file-posix.c utest (2015-10-15 12:07:00 -0400)
> 
> are available in the git repository at:
> 
>   [email protected]:dlibenzi/akaros profiler_dev_v2
> 
> for you to fetch changes up to
> 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:
> 
>   Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

> From 2ba13e8925bd0524a13748beb71beb797e79831f Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Tue, 20 Oct 2015 15:53:08 -0700
> Subject: Completely restructured profiler code cutting all the unused code

> diff --git a/kern/include/profiler.h b/kern/include/profiler.h
> new file mode 100644
> index 000000000000..2ec9d4308632
> --- /dev/null
> +++ b/kern/include/profiler.h

New, non-trivial files need a copyright header.

> @@ -0,0 +1,18 @@
> +
> +#ifndef ROS_KERN_INC_PROFILER_H
> +#define ROS_KERN_INC_PROFILER_H

Can also do the pragma once now.

> diff --git a/kern/src/profiler.c b/kern/src/profiler.c
> new file mode 100644
> index 000000000000..ca7197bbc20b
> --- /dev/null
> +++ b/kern/src/profiler.c

Needs a copyright.  It looks like this might be new code, but if there's
old stuff from oprofile in it, we need to keep that copyright.  I see
there are a couple things like "op_entry" still around.

> +static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> +                                                                       
> uintptr_t pc, unsigned long event)

Do you want this to return a bool (TRUE for success) or an int (0 for
success)?   See below:

> +{
> +     ERRSTACK(1);
> +     struct op_entry entry;
> +     struct block *b;
> +
> +     if (waserror()) {
> +             poperror();
> +             printk("%s: failed\n", __func__);
> +             return 1;

error, value == 1

> +     }
> +
> +     b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, 0);
> +     if (likely(b)) {
> +             entry.sample->hdr = profiler_create_header(core_id(), 1);
> +             entry.sample->event = (uint64_t) event;
> +             profiler_cpu_buffer_add_data(&entry, &pc, 1);
> +     }
> +     poperror();
> +
> +     return b == NULL;

success, value == 1.

> From 0a919619324782e873ddf5bbf9bd19e989f25162 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Tue, 20 Oct 2015 16:34:26 -0700
> Subject: Do not race when multiple init happen at the same time

> @@ -65,53 +64,63 @@ static inline size_t profiler_cpu_buffer_add_data(struct 
> op_entry *entry,
>  static void free_cpu_buffers(void)
>  {
>       kfree(profiler_percpu_ctx);
> +     profiler_percpu_ctx = NULL;
> +
> +     qclose(profiler_queue);
> +     profiler_queue = NULL;
>  }

Isn't this stuff possibly used from IRQ context?  What's the plan for
cleaning up such that we are sure that there are no interrupts going off
while you are freeing things?

In:
>  static int alloc_cpu_buffers(void)
>  {
> -     if (!profiler_queue) {
> -             profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> -             if (!profiler_queue)
> -                     goto fail;
> -     }
> +     int i;
> +
> +     profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> +     if (!profiler_queue)
> +             return -ENOMEM;
>  
> -     /* we *really* don't want to block. Losing data is better. */
>       qdropoverflow(profiler_queue, 1);

Then later:

> +     qnonblock(profiler_queue, 1);

This is fragile.  qnonblock means "error with EAGAIN if we're full".
Right now, you'll get by, since you still have qdropoverflow above.  But
this could break with an error() thrown from IRQ context somewhere down
the line.

>  void profiler_cleanup(void)
>  {
> -     free_cpu_buffers();
> +     sem_down(&mtx);
> +     profiler_users--;
> +     if (profiler_users == 0)
> +             free_cpu_buffers();
> +     sem_up(&mtx);
>  }

This might be a candidate for a kref, depending on how cleanup ends up
working.  No need to change it now though.
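
For reference, a generic sketch of the kref pattern suggested above
(illustrative names and a plain C11 atomic, not the Akaros kref API):
the last put runs the release step, so the counter and the teardown
can't race past each other the way an open-coded counter under a
semaphore can:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted profiler state; "freed" stands in for
 * free_cpu_buffers() running. */
struct profiler_ref {
	atomic_int users;
	bool freed;
};

static void profiler_ref_get(struct profiler_ref *r)
{
	atomic_fetch_add(&r->users, 1);
}

static void profiler_ref_put(struct profiler_ref *r)
{
	/* fetch_sub returns the old value: the thread that drops the
	 * count from 1 to 0 is the one that runs the release. */
	if (atomic_fetch_sub(&r->users, 1) == 1)
		r->freed = true;	/* release: free_cpu_buffers() */
}
```

Each init does a get, each cleanup does a put, and the buffers are freed
exactly once, by whoever puts last.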
> From fd096815dc200a550ea5e6b6d7f133df75e29ed9 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Thu, 22 Oct 2015 14:30:26 -0700
> Subject: Added full binary path into the proc structure
> 

> @@ -791,6 +763,7 @@ static int sys_exec(struct proc *p, char *path, size_t 
> path_l,
>       t_path = copy_in_path(p, path, path_l);
>       if (!t_path)
>               return -1;
> +     proc_replace_binary_path(p, t_path);

If we error out after this point, we'll have changed the path of the binary to
the new one even though the exec failed.  I'm okay with that, just more of an
FYI.

> From 47bf34b1bc96b78a8091726cb3016b1f0d964847 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Thu, 22 Oct 2015 10:56:37 -0700
> Subject: Added APIs to access process startup information
> 
> +char *get_startup_argv(struct proc *p, size_t idx, char *argp,
> +                                        size_t max_size)
> +{
> +     size_t stack_space = (const char *) USTACKTOP - (const char *) 
> p->args_base;
> +     const char *sptr = (const char *) p->args_base + sizeof(size_t) +
> +             idx * sizeof(char *);
> +     const char *argv = NULL;
> +
> +     /* TODO,DL: Use copy_from_user() when available.
> +      */

You've got these now; memcpy_from_user will do it (or will soon).
Either way, you don't need the TODO.

> +     if (memcpy_from_user(p, &argv, sptr, sizeof(char *)))
> +             return NULL;
> +
> +     /* TODO,DL: Use strncpy_from_user() when available.
> +      */
> +     max_size = MIN(max_size, stack_space);
> +     if (memcpy_from_user(p, argp, argv, max_size))
> +             return NULL;
> +     argp[max_size - 1] = 0;
> +
> +     return argp;
> +}

How important is it that the command line data hasn't changed?  We're trusting
the user to not have changed it since they were invoked.

Also, I guess we're going to want to parse the argv stuff later.  Kevin put
together some code to pack and parse the args earlier.  It might be easier
to keep that blob around and parse it instead of this.  Not a huge deal though.
> From 1aede03303debfe3075aa7055c6f1dd843f806bc Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Tue, 27 Oct 2015 15:29:10 -0700
> Subject: Added API to enumerate the VM regions of a process

> --- a/kern/include/mm.h
> +++ b/kern/include/mm.h
> @@ -51,6 +51,9 @@ void isolate_vmrs(struct proc *p, uintptr_t va, size_t len);
>  void unmap_and_destroy_vmrs(struct proc *p);
>  int duplicate_vmrs(struct proc *p, struct proc *new_p);
>  void print_vmrs(struct proc *p);
> +void enumerate_vrms(struct proc *p,
       ^enumerate_vmrs

> diff --git a/kern/src/mm.c b/kern/src/mm.c
> index ca91e63620b6..06a349840404 100644
> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -359,6 +359,18 @@ void print_vmrs(struct proc *p)
>                      vmr->vm_file, vmr->vm_foff);
>  }
>  
> +void enumerate_vrms(struct proc *p,
       ^enumerate_vmrs

> From 2df52e6bc39bf90a1c90e5382298fa2c2a8316f5 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Thu, 5 Nov 2015 17:30:24 -0800
> Subject: Added API to append data into a circular buffer
> 
> Added API to append data into a circular buffer. The buffer data is
> allocated once at init time, and no more allocations happen after that.

> diff --git a/kern/include/circular_buffer.h b/kern/include/circular_buffer.h
> new file mode 100644
> index 000000000000..4ab8c42e2673
> --- /dev/null
> +++ b/kern/include/circular_buffer.h

Can you put in a little info about how to use this and what the rules are?  For
instance, what happens to writers when the buffer is full (overwrite, block, or
drop?) or to readers when the buffer is empty? (looks like you just get 0).  If
a reader uses an offset, does that just drop a chunk from the buffer?  What is
the cbuf_size_t for?

> --- /dev/null
> +++ b/kern/lib/circular_buffer.c

> +void circular_buffer_clear(struct circular_buffer *cb)
> +{
> +     if (cb->base) {
> +             if (cb->mem)
> +                     kfree(cb->mem);

It's a little surprising that clear also frees the allocation, instead of just
resetting everything back to 0.

> +             cb->rdptr = cb->wrptr = cb->base = cb->mem = NULL;
> +             cb->size = cb->allocated = 0;
> +     }
> +}
> +
> +static bool circular_buffer_is_overlap(const struct circular_buffer *cb,
> +                                                                        
> const char *rptr, const char *wptr,
> +                                                                        
> size_t size)
> +{
> +     return (cb->size > 0) && (rptr >= wptr) && (rptr < (wptr + size));
> +}
> +
> +static void circular_buffer_write_skip(struct circular_buffer *cb, char 
> *wrptr,
> +                                                                        
> size_t size)

What are these helpers doing?

> +size_t circular_buffer_read(struct circular_buffer *cb, char *data, size_t 
> size,
> +                                                     size_t off)
> +{
> +     size_t asize = cb->size, rsize = 0;
> +     const char *rdptr = cb->rdptr;
> +
> +     while (asize > 0 && size > 0) {
> +             size_t bsize = *(const cbuf_size_t *) rdptr;
> +
> +             if (likely(bsize)) {
> +                     size_t esize = bsize - sizeof(cbuf_size_t);
> +
> +                     if (off >= esize) {
> +                             off -= esize;
> +                     } else {
> +                             size_t csize = MIN(esize - off, size);
> +
> +                             memcpy(data, rdptr + sizeof(cbuf_size_t) + off, 
> csize);

So every block of data has the size first, then the data, all packed together?
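
That's my reading too.  A userspace sketch of what the layout appears to
be, inferred from the read loop above (cbuf_size_t's real type isn't
shown, so uint32_t here is an assumption): each record is a total size,
including the size field itself, followed by the payload, packed back to
back:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t cbuf_size_t;	/* assumption; real type not shown */

/* Append one [size][payload] record at buf; returns bytes consumed. */
static size_t record_write(char *buf, const char *data, cbuf_size_t len)
{
	cbuf_size_t total = sizeof(cbuf_size_t) + len;

	memcpy(buf, &total, sizeof(total));	/* size first... */
	memcpy(buf + sizeof(total), data, len);	/* ...then the data */
	return total;
}

/* Read the record at buf into out; returns the payload length. */
static cbuf_size_t record_read(const char *buf, char *out)
{
	cbuf_size_t total;

	memcpy(&total, buf, sizeof(total));
	memcpy(out, buf + sizeof(total), total - sizeof(total));
	return total - sizeof(total);
}
```

If that's right, it's worth stating in the header comment, since readers
have to subtract sizeof(cbuf_size_t) the same way the read loop does.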

> From c3c2bb0701f747e5c5c3f3d30cdf033137ae26e2 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Wed, 21 Oct 2015 16:39:04 -0700
> Subject: Implemented the new profiler
> 
> Implemented the new profiler format and added simple userspace
> stack trace (waiting for copy_from_user()).

This is a huge commit that would have been easier to understand if it was broken
up a bit.  For instance, I see there's a user backtrace list, syscall tracing
isn't using a qio queue anymore, the old Plan 9 profiler is gone (great!), and a
bunch of other things.  It also looks like you squeezed in the change that the
start command also turns on the timers for all cores, which isn't something I
wanted.

> --- a/kern/arch/x86/kdebug.c
> +++ b/kern/arch/x86/kdebug.c
> @@ -365,6 +365,25 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp, 
> uintptr_t *pcs,
>       return nr_pcs;
>  }
>  
> +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> +                                                size_t nr_slots)
> +{
> +     size_t nr_pcs = 0;
> +
> +     for (;;) {
> +             if (unlikely(fp >= UMAPTOP) || unlikely(fp < BRK_END) ||

Why not is_user_raddr()?  And once you use copy_from, those checks aren't needed
either.  Also, I think it is possible for frame pointers to be below BRK_END,
which could happen if a user stack was malloc'd instead of mmapped.

> +                     unlikely(nr_pcs >= nr_slots))
> +                     break;
> +
> +             /* For now straight memory access, waiting for 
> copy_from_user(). */

copy_from is now available.  =)

> +             pcs[nr_pcs++] = pc;
> +             pc = *(uintptr_t*)(fp + sizeof(uintptr_t));
> +             fp = *(uintptr_t*)fp;
> +     }
> +
> +     return nr_pcs;
> +}
> +
> diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c
> index da11b7e44d32..76e53d7f9d30 100644

> +static int oprof_timer_period = 1000;
             ^ rename prof_timer_period

> +static void kprof_start_profiler(void)
> +{
> +     ERRSTACK(2);
>  
> +     sem_down(&kprof.lock);

If you're using a semaphore as a binary mutex (lock), please use a qlock.  This
happens a bunch in this patch set.

> +     if (waserror()) {
> +             sem_up(&kprof.lock);
> +             nexterror();
> +     }
> +     if (!kprof.profiling) {
> +             profiler_init();
> +             if (waserror()) {
> +                     profiler_cleanup();
> +                     nexterror();
> +             }
> +
> +             profiler_control_trace(1);
> +
> +             for (int i = 0; i < num_cores; i++)
> +                     kprof_enable_timer(i, 1);

This looks like starting the profiler turns on the timer.  I wanted to keep the
timer separate from the collection, such that the timer is one of potentially
many sources of data, as discussed in another email.

> +static void kprof_usage_fail(void)
> +{
> +     static const char *ctlstring = "clear|start|stop|timer";
> +     const char* const * cmds = profiler_configure_cmds();
> +     size_t i, size, csize;
> +     char msgbuf[128];
> +
> +     strncpy(msgbuf, ctlstring, sizeof(msgbuf));
> +     size = MIN(strlen(ctlstring), sizeof(msgbuf) - 1);
> +     for (i = 0; cmds[i]; i++) {
> +             csize = strlen(cmds[i]);
> +             if ((csize + size + 2) > sizeof(msgbuf))
> +                     break;
> +             msgbuf[size] = '|';
> +             memcpy(msgbuf + size + 1, cmds[i], csize);
> +             size += csize + 1;
> +     }
> +     msgbuf[size] = 0;
> +
> +     error(EFAIL, msgbuf);
> +}

This is cool, and the pattern tends to come up a lot, where we'd like to
generate usage information.  Perhaps there's a Plan 9 helper for this already?
(I don't know off the top of my head, maybe the Plan 9 guys do).  We already
have stuff that parses command strings.  If we don't have a helper, we can make
one, so that every device doesn't have to redo this logic. 

Another part to it is that a device has several layers of arguments and commands
- it'd be neat if that was clearly written down somewhere and then we could
generate these switches and error messages more easily.  This isn't something
that needs to be done now, but I like your kprof_usage_fail and it brought the
topic up.  =)
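
Something like this is what I had in mind for the shared helper (name
and signature made up): join a NULL-terminated command list into "a|b|c"
for the usage error, so each device only supplies its command table:

```c
#include <assert.h>
#include <string.h>

/* Build "cmd1|cmd2|..." into buf, truncating at bufsz; returns the
 * resulting string length.  cmds is a NULL-terminated array. */
static size_t build_usage(char *buf, size_t bufsz, const char *const *cmds)
{
	size_t size = 0;

	for (size_t i = 0; cmds[i]; i++) {
		size_t csize = strlen(cmds[i]);

		/* +2: one for '|', one for the terminating NUL. */
		if (size + csize + 2 > bufsz)
			break;
		if (size)
			buf[size++] = '|';
		memcpy(buf + size, cmds[i], csize);
		size += csize;
	}
	buf[size] = 0;
	return size;
}
```

kprof_usage_fail() would then be two lines: build_usage() into a local
buffer, then error(EFAIL, buf).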

> +static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> +{
> +     static struct trace_printk_buffer boot_tpb;
> +     static struct trace_printk_buffer *cpu_tpbs;
> +
> +     if (unlikely(booting))
> +             return &boot_tpb;
> +     if (unlikely(!cpu_tpbs)) {
> +             /* Poor man per-CPU data structure. I really do no like 
> littering global
> +              * data structures with module specific data.
> +              */
> +             spin_lock_irqsave(&ktrace_lock);
> +             if (!cpu_tpbs)
> +                     cpu_tpbs = kzmalloc(num_cores * sizeof(struct 
> trace_printk_buffer),
> +                                                             0);

If this alloc fails, (flags == 0), then we'll return 0 (+coreid) and PF later.

> +             spin_unlock_irqsave(&ktrace_lock);
>       }
> +
> +     return cpu_tpbs + core_id();
>  }


> --- a/kern/include/kdebug.h
> +++ b/kern/include/kdebug.h
> @@ -46,7 +48,7 @@ int printdump(char *buf, int buflen, uint8_t *data);
>  extern bool printx_on;
>  void set_printx(int mode);
>  #define printx(args...) if (printx_on) printk(args)
> -#define trace_printx(args...) if (printx_on) trace_printk(args)
> +#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)
                                                            ^FALSE?
We just want to print text, not a backtrace.  That was also my intent for
trace_printk (it's just the print, not a backtrace).

> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -24,6 +24,7 @@
>  #include <kmalloc.h>
>  #include <vfs.h>
>  #include <smp.h>
> +#include <profiler.h>
>  
>  struct kmem_cache *vmr_kcache;
>  
> @@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, size_t len, 
> int prot, int flags,
>               }
>       }
>       spin_unlock(&p->vmr_lock);
> +
> +     profiler_notify_mmap(p, addr, len, prot, flags, file, offset);
> +

Do you need to know when something was unmapped?


> diff --git a/kern/src/process.c b/kern/src/process.c
> index bd26789e3ef1..2e4c69518e8a 100644
> --- a/kern/src/process.c
> +++ b/kern/src/process.c
> @@ -331,6 +331,8 @@ error_t proc_alloc(struct proc **pp, struct proc *parent, 
> int flags)
>               kmem_cache_free(proc_cache, p);
>               return -ENOFREEPID;
>       }
> +     if (parent && parent->binary_path)
> +             kstrdup(&p->binary_path, parent->binary_path);

Is this to fix a leak where paths weren't set for forked processes?  If so, this
is an example of a fixup that could have been squashed into a previous commit.

> --- a/kern/src/profiler.c
> +++ b/kern/src/profiler.c

> +static inline char* vb_encode_uint64(char* data, uint64_t n)
> +{
> +     for (; n >= 0x80; n >>= 7)
> +             *data++ = (char) (n | 0x80);
> +     *data++ = (char) n;
> +
> +     return data;
> +}

This probably should be in a library or header (so we can reuse it) and with a
little explanation.  It looks like we take 7 bits of n at a time and push it
into *data, with bit 0x80 signifying we're done or not.
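
For the explanation, here's the encoder from the patch alongside the
matching decoder I'd expect to pair with it (the decoder is my sketch,
not from the patch): each byte carries 7 bits of n, low bits first, and
bit 0x80 means "more bytes follow":

```c
#include <stdint.h>

/* Encoder as in the patch: emit 7 bits per byte, low bits first,
 * setting 0x80 on every byte except the last. */
static char *vb_encode_uint64(char *data, uint64_t n)
{
	for (; n >= 0x80; n >>= 7)
		*data++ = (char) (n | 0x80);
	*data++ = (char) n;
	return data;
}

/* Matching decoder (sketch): accumulate 7 bits at a time until a byte
 * without the continuation bit. */
static const char *vb_decode_uint64(const char *data, uint64_t *pn)
{
	uint64_t n = 0;
	int shift = 0;

	for (;;) {
		uint8_t b = (uint8_t) *data++;

		n |= (uint64_t) (b & 0x7f) << shift;
		if (!(b & 0x80))
			break;
		shift += 7;
	}
	*pn = n;
	return data;
}
```

Having both in one header also documents the format for whoever writes
the userspace consumer.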


> +static struct block *profiler_buffer_write(struct profiler_cpu_context 
> *cpu_buf,
> +                                                                             
>    struct block *b)
>  {
> -     return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) |
> -             (uint64_t) nbt;
> +     if (b) {
> +             qibwrite(profiler_queue, b);
> +
> +             if (qlen(profiler_queue) > profiler_queue_limit) {
> +                     b = qget(profiler_queue);
> +                     if (likely(b)) {
> +                             cpu_buf->dropped_data_size += BLEN(b);
> +                             freeb(b);
> +                     }

This seems like a candidate for a feature added to qio, such as a "drop from the
front" mode.

> +             }
> +     }
> +
> +     return iallocb(profiler_cpu_buffer_size);
>  }

This function seems a little weird.  If you're given a block, you write it.
That makes sense.  But regardless, you alloc a new block?

Seems like we should not be too cute by having write return a fresh block, and
just do the allocation separately.

> +static void profiler_emit_current_system_status(void)
>  {
> -     struct block *b = cpu_buf->block;
> -    size_t totalsize = sizeof(struct op_sample) +
> -             size * sizeof(entry->sample->data[0]);
> -
> -     if (unlikely((!b) || (b->lim - b->wp) < totalsize)) {
> -             if (b)
> -                     qibwrite(profiler_queue, b);
> -             /* For now. Later, we will grab a block off the
> -              * emptyblock queue.
> -              */
> -             cpu_buf->block = b = iallocb(profiler_cpu_buffer_size);
> -        if (unlikely(!b)) {
> -                     printk("%s: fail\n", __func__);
> -                     return NULL;
> +     void enum_proc(struct vm_region *vmr, void *opaque)
> +     {
> +             struct proc *p = (struct proc *) opaque;
> +
> +             profiler_notify_mmap(p, vmr->vm_base, vmr->vm_end - 
> vmr->vm_base,
> +                                                      vmr->vm_prot, 
> vmr->vm_flags, vmr->vm_file,
> +                                                      vmr->vm_foff);
> +     }
> +
> +     ERRSTACK(2);

I think you only need 1 here.  It's based on the number of waserrors.

> +     struct process_set pset;
> +
> +     proc_get_set(&pset);
> +     if (waserror()) {
> +             proc_free_set(&pset);
> +             nexterror();
> +     }
> +
> +     for (size_t i = 0; i < pset.num_processes; i++)
> +             enumerate_vrms(pset.procs[i], enum_proc, pset.procs[i]);
                 ^vmrs

> +     poperror();
> +     proc_free_set(&pset);
> +}
> +
> +static inline int profiler_is_tracing(struct profiler_cpu_context *cpu_buf)

Since this is named "is_tracing", it should be a bool and return TRUE/FALSE
to avoid any confusion.


> -static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> -                                                                       
> uintptr_t pc, unsigned long event)
> +static void alloc_cpu_buffers(void)
>  {
>       ERRSTACK(1);
> -     struct op_entry entry;
> -     struct block *b;
> +     int i;
>  
> +     profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> +     if (!profiler_queue)
> +             error(-ENOMEM, NULL);
              ^just ENOMEM

>       if (waserror()) {
> -             poperror();
> -             printk("%s: failed\n", __func__);
> -             return 1;
> +             free_cpu_buffers();
> +             nexterror();
>       }
>  
> -     b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, 0);
> -     if (likely(b)) {
> -             entry.sample->hdr = profiler_create_header(core_id(), 1);
> -             entry.sample->event = (uint64_t) event;
> -             profiler_cpu_buffer_add_data(&entry, &pc, 1);
> -     }
> -     poperror();
> +     qdropoverflow(profiler_queue, 1);
> +     qnonblock(profiler_queue, 1);

As mentioned in a previous patch, you might not want qnonblock.

Incidentally, qnonblock and qdropoverflow take a bool, not an int.  So TRUE
instead of 1.

> +     profiler_percpu_ctx =
> +             kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, 
> KMALLOC_WAIT);
> +     if (!profiler_percpu_ctx)
> +             error(-ENOMEM, NULL);

Since you did a KMALLOC_WAIT, you will never have !profiler_percpu_ctx.

> -static inline void profiler_begin_trace(struct profiler_cpu_context *cpu_buf)
> +int profiler_configure(struct cmdbuf *cb)
>  {
> -     cpu_buf->tracing = 1;
> +     if (!strcmp(cb->f[0], "prof_qlimit")) {
> +             if (cb->nf < 2)
> +                     error(EFAIL, "prof_qlimit KB");
> +             profiler_queue_limit = atoi(cb->f[1]) * 1024;

Probably want a sanity check.  Also, what happens when this changes after the
queue was already allocated?

> +     } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
> +             if (cb->nf < 2)
> +                     error(EFAIL, "prof_cpubufsz KB");
> +             profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024;

Is there any danger with the user setting this to be a very small value (like
0)?   It looks like the assumption in profiler_cpu_buffer_write_reserve() is
that a fresh allocation (done by profiler_buffer_write()) is enough for size
bytes.

> +     } else if (!strcmp(cb->f[0], "prof_btdepth")) {
> +             if (cb->nf < 2)
> +                     error(EFAIL, "prof_btdepth DEPTH");
> +             profiler_backtrace_depth = atoi(cb->f[1]);

It is dangerous to have the user control this.  It's a stack allocation.  Even
if it wasn't, we'd need a sanity check of some sort.
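
The kind of check I mean is just a clamp (the bounds below are made-up
examples, not suggested values):

```c
#include <stddef.h>

/* Clamp a user-supplied config value into [min_val, max_val] instead of
 * trusting atoi() output directly; anything sizing a stack allocation,
 * like prof_btdepth, needs a hard upper bound. */
static size_t clamp_config(long val, size_t min_val, size_t max_val)
{
	if (val < (long) min_val)
		return min_val;
	if (val > (long) max_val)
		return max_val;
	return (size_t) val;
}
```

Then e.g. profiler_backtrace_depth = clamp_config(atoi(cb->f[1]), 1, 256);
(again, 256 is just a placeholder for whatever limit makes sense).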

> +void profiler_cleanup(void)
> +{
> +     sem_down(&profiler_mtx);
> +     profiler_users--;
> +     if (profiler_users == 0)
> +             free_cpu_buffers();
> +     sem_up(&profiler_mtx);
>  }

I'm still concerned about this.  If the only source of profiling data is from
the timer IRQs, then your current stuff seems fine.  (You disable the timers,
and thus the handlers, before freeing this stuff.)  But if we ever have any
other use of this, then we'll need to be careful.

But there's more to it than that.  The assumption here is that once
profiler_users == 0, there is no code (RKM, pending IRQ, etc.) that has
access to profiler_percpu_ctx or profiler_queue.

Just as an example, say someone is calling profiler_notify_mmap(), which checks
profiler_percpu_ctx.  They get past the if () check, then concurrently someone
calls profiler_cleanup and triggers free_cpu_buffers.  Then the original thread
eventually calls profiler_push_pid_mmap, then qwrite, then page faults.

So a user could theoretically trigger a PF.  This is part of the reason why I
was reluctant to have you try and free the buffers.

> @@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char *path, 
> size_t path_l,
>       user_memdup_free(p, kargenv);
>       __proc_ready(new_p);
>       pid = new_p->pid;
> +     profiler_notify_new_process(new_p);
>       proc_decref(new_p);     /* give up the reference created in 
> proc_create() */
>       return pid;
>  error_load_elf:
> @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
>  
>       printd("[PID %d] fork PID %d\n", e->pid, env->pid);
>       ret = env->pid;
> +     profiler_notify_new_process(env);
>       proc_decref(env);       /* give up the reference created in 
> proc_alloc() */
>       return ret;
>  }

Do you need to update things when a process changes its binary path?  (exec)

> From c0ab4ec0d729ad8e0555b3beef72340f90c23712 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sat, 7 Nov 2015 18:47:16 -0800
> Subject: Enabled /prof/kptrace collection of anything which goes into cprintf
> 
> Enabled /prof/kptrace collection of anything which goes into cprintf,
> printk, and its associates.
> The kptrace collector is a circular buffer whose default size is 128KB.

What happens when the ring is full?  New stuff drops?  My main concern here is
that trace_vprintk could somehow cause an issue and hold up the real printk,
which would be hard to debug.

> --- a/kern/drivers/dev/kprof.c
> +++ b/kern/drivers/dev/kprof.c
> @@ -19,6 +19,7 @@
>  #include <error.h>
>  #include <pmap.h>
>  #include <smp.h>
> +#include <time.h>
>  #include <circular_buffer.h>
>  #include <umem.h>
>  #include <profiler.h>
> @@ -62,7 +63,7 @@ struct dirtab kproftab[] = {
>       {"mpstat-raw",  {Kmpstatrawqid},        0,      0600},
>  };
>  
> -extern int booting;
> +extern system_timing_t system_timing;

Should be able to get system_timing from a header.

>  static struct kprof kprof;
>  static bool ktrace_init_done = FALSE;
>  static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
> @@ -567,7 +568,7 @@ static struct trace_printk_buffer 
> *kprof_get_printk_buffer(void)
>       static struct trace_printk_buffer boot_tpb;
>       static struct trace_printk_buffer *cpu_tpbs;
>  
> -     if (unlikely(booting))
> +     if (unlikely(!num_cores))
>               return &boot_tpb;

That seems a little odd.  I'd think if we're still booting, we'd use the
boot_tpb.  Was there some corner case that triggered this?  I understand this
one:

> -     if (likely(!booting))
> +     if (likely(system_timing.tsc_freq))
>               tsc2timespec(read_tsc(), &ts_now);
