Branch pushed.
On Thu, Nov 12, 2015 at 2:00 PM, Davide Libenzi <[email protected]> wrote: > On Thu, Nov 12, 2015 at 12:02 PM, Barret Rhoden <[email protected]> > wrote: > >> > diff --git a/kern/include/profiler.h b/kern/include/profiler.h >> > new file mode 100644 >> > index 000000000000..2ec9d4308632 >> > --- /dev/null >> > +++ b/kern/include/profiler.h >> >> New, non-trivial files need a copyright header. >> > > Already has, if you reviewed the tip of the branch. > > > >> >> > @@ -0,0 +1,18 @@ >> > + >> > +#ifndef ROS_KERN_INC_PROFILER_H >> > +#define ROS_KERN_INC_PROFILER_H >> >> Can also do the pragma once now. >> > > Already has. > > > >> >> > diff --git a/kern/src/profiler.c b/kern/src/profiler.c >> > new file mode 100644 >> > index 000000000000..ca7197bbc20b >> > --- /dev/null >> > +++ b/kern/src/profiler.c >> >> Needs a copyright. It looks like this might be new code, but if there's >> old stuff from oprofile in it, we need to put that. I see there are a >> couple things like "op_entry" still around. >> > > Already has. > > > > >> >> > +static inline int profiler_add_sample(struct profiler_cpu_context >> *cpu_buf, >> > + >> uintptr_t pc, unsigned long event) >> >> Do you want this to return a bool (TRUE for success) or an int (0 for >> success)? See below: >> > > This function is gone on tip. > Since this was v2, and v1 was never reviewed, you likely commented the v1 > version. > > > >> > From 0a919619324782e873ddf5bbf9bd19e989f25162 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Tue, 20 Oct 2015 16:34:26 -0700 >> > Subject: Do not race when multiple init happen at the same time >> >> > @@ -65,53 +64,63 @@ static inline size_t >> profiler_cpu_buffer_add_data(struct op_entry *entry, >> > static void free_cpu_buffers(void) >> > { >> > kfree(profiler_percpu_ctx); >> > + profiler_percpu_ctx = NULL; >> > + >> > + qclose(profiler_queue); >> > + profiler_queue = NULL; >> > } >> >> Isn't this stuff possibly used from IRQ context? 
What's the plan for >> cleaning up such that we are sure that there are no interrupts going off >> while you are freeing things? >> > > That function is only called when there are no more users. > Timers are stopped before calling profiler_cleanup(). > Not sure how synchronous the removal of a timer is. > > > >> >> In: >> > static int alloc_cpu_buffers(void) >> > { >> > - if (!profiler_queue) { >> > - profiler_queue = qopen(profiler_queue_limit, 0, NULL, >> NULL); >> > - if (!profiler_queue) >> > - goto fail; >> > - } >> > + int i; >> > + >> > + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL); >> > + if (!profiler_queue) >> > + return -ENOMEM; >> > >> > - /* we *really* don't want to block. Losing data is better. */ >> > qdropoverflow(profiler_queue, 1); >> >> Then later: >> >> > + qnonblock(profiler_queue, 1); >> >> This is fragile. qnonblock means "error with EAGAIN if we're full". >> Right now, you'll get by, since you still have qdropoverflow above. But >> this could break with an error() thrown from IRQ context somewhere down >> the line. >> > > Yes, they are paired. I am not sure which kind of bad condition can > happen. > > > >> >> > void profiler_cleanup(void) >> > { >> > - free_cpu_buffers(); >> > + sem_down(&mtx); >> > + profiler_users--; >> > + if (profiler_users == 0) >> > + free_cpu_buffers(); >> > + sem_up(&mtx); >> > } >> >> This might be a candidate for a kref, depending on how cleanup ends up >> working. No need to change it now though. >> > > krefs don't block while the 0->1 transitioner is initializing, no? 
> > > >> > From fd096815dc200a550ea5e6b6d7f133df75e29ed9 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Thu, 22 Oct 2015 14:30:26 -0700 >> > Subject: Added full binary path into the proc structure >> > >> >> > @@ -791,6 +763,7 @@ static int sys_exec(struct proc *p, char *path, >> size_t path_l, >> > t_path = copy_in_path(p, path, path_l); >> > if (!t_path) >> > return -1; >> > + proc_replace_binary_path(p, t_path); >> >> If we error out after this point, we'll have changed the path of the >> binary to >> the new one even though the exec failed. I'm okay with that, just more >> of an >> FYI. >> > > I am fixing this. > > > > >> >> > From 47bf34b1bc96b78a8091726cb3016b1f0d964847 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Thu, 22 Oct 2015 10:56:37 -0700 >> > Subject: Added APIs to access process startup information >> > >> > +char *get_startup_argv(struct proc *p, size_t idx, char *argp, >> > + size_t max_size) >> > +{ >> > + size_t stack_space = (const char *) USTACKTOP - (const char *) >> p->args_base; >> > + const char *sptr = (const char *) p->args_base + sizeof(size_t) + >> > + idx * sizeof(char *); >> > + const char *argv = NULL; >> > + >> > + /* TODO,DL: Use copy_from_user() when available. >> > + */ >> >> You've got these now, though memcpy_from_user will do it. (or will soon). >> Either way, you don't need the TODO. >> > > The strcpy is needed. Currently it copies up to max size, which is kind of > a hack. > But I will not be fixing it here (in this CR, that is). > > > > >> >> > + if (memcpy_from_user(p, &argv, sptr, sizeof(char *))) >> > + return NULL; >> > + >> > + /* TODO,DL: Use strncpy_from_user() when available. >> > + */ >> > + max_size = MIN(max_size, stack_space); >> > + if (memcpy_from_user(p, argp, argv, max_size)) >> > + return NULL; >> > + argp[max_size - 1] = 0; >> > + >> > + return argp; >> > +} >> >> How important is it that the command line data hasn't changed? 
We're >> trusting >> the user to not have changed it since they were invoked. > > >> Also, I guess we're going to want to parse the argv stuff later. Kevin >> put >> together some code to pack and parse the args earlier. It might be >> easier and >> to keep that blob around and parse it instead of this. Not a huge deal >> though. >> > > We could keep the kernel buffer copy, but I'd prefer to eventually do that > in a separate CL. > > > > >> > From 1aede03303debfe3075aa7055c6f1dd843f806bc Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Tue, 27 Oct 2015 15:29:10 -0700 >> > Subject: Added API to enumerate the VM regions of a process >> >> > --- a/kern/include/mm.h >> > +++ b/kern/include/mm.h >> > @@ -51,6 +51,9 @@ void isolate_vmrs(struct proc *p, uintptr_t va, >> size_t len); >> > void unmap_and_destroy_vmrs(struct proc *p); >> > int duplicate_vmrs(struct proc *p, struct proc *new_p); >> > void print_vmrs(struct proc *p); >> > +void enumerate_vrms(struct proc *p, >> ^enumerate_vmrs >> > > Fixed. Took me a little to understand what was wrong ☺ > > > > >> > diff --git a/kern/src/mm.c b/kern/src/mm.c >> > index ca91e63620b6..06a349840404 100644 >> > --- a/kern/src/mm.c >> > +++ b/kern/src/mm.c >> > @@ -359,6 +359,18 @@ void print_vmrs(struct proc *p) >> > vmr->vm_file, vmr->vm_foff); >> > } >> > >> > +void enumerate_vrms(struct proc *p, >> ^enumerate_vmrs >> > > Ditto. > > > >> >> > From 2df52e6bc39bf90a1c90e5382298fa2c2a8316f5 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Thu, 5 Nov 2015 17:30:24 -0800 >> > Subject: Added API to append data into a circular buffer >> > >> > Added API to append data into a circular buffer. The buffer data is >> > allocated once at init time, and no more allocations happen after that. 
>> >> > diff --git a/kern/include/circular_buffer.h >> b/kern/include/circular_buffer.h >> > new file mode 100644 >> > index 000000000000..4ab8c42e2673 >> > --- /dev/null >> > +++ b/kern/include/circular_buffer.h >> >> Can you put in a little info about how to use this and what the rules >> are? For >> instance, what happens to writers when the buffer is full (overwrite, >> block, or >> drop?) or to readers when the buffer is empty? (looks like you just get >> 0). If >> a reader uses an offset, does that just drop a chunk from the buffer? >> What is >> the cbuf_size_t for? >> > > Locking is up to the reader. > In circular buffer fashion, a write on a full buffer discards the oldest > block, and writes the new one. > No, reading data does not remove data. The only way the read ptr is moved > forward is when the write ptr pushes it forward in order to fit new data. > Every block in the buffer is pre-pended with its size; cbuf_size_t is the type > of that size. > > > > >> >> > --- /dev/null >> > +++ b/kern/lib/circular_buffer.c >> >> > +void circular_buffer_clear(struct circular_buffer *cb) >> > +{ >> > + if (cb->base) { >> > + if (cb->mem) >> > + kfree(cb->mem); >> >> It's a little surprising that clear also frees the allocation, instead of >> just >> resetting everything back to 0. >> > > This clears all the resources of the buffer. > I used "clear" as there is an "init". > I would have used "free" if there was an "alloc". > > > > >> >> > + cb->rdptr = cb->wrptr = cb->base = cb->mem = NULL; >> > + cb->size = cb->allocated = 0; >> > + } >> > +} >> > + >> > +static bool circular_buffer_is_overlap(const struct circular_buffer >> *cb, >> > + >> const char *rptr, const char *wptr, >> > + >> size_t size) >> > +{ >> > + return (cb->size > 0) && (rptr >= wptr) && (rptr < (wptr + size)); >> > +} >> > + >> > +static void circular_buffer_write_skip(struct circular_buffer *cb, >> char *wrptr, >> > + >> size_t size) >> >> What are these helpers doing? 
> The overlap check is whether a write operation is going to overwrite the block at > which rdptr resides. > The skip moves the rdptr forward until it no longer overlaps with the > incoming write. > > > > >> >> > +size_t circular_buffer_read(struct circular_buffer *cb, char *data, >> size_t size, >> > + size_t off) >> > +{ >> > + size_t asize = cb->size, rsize = 0; >> > + const char *rdptr = cb->rdptr; >> > + >> > + while (asize > 0 && size > 0) { >> > + size_t bsize = *(const cbuf_size_t *) rdptr; >> > + >> > + if (likely(bsize)) { >> > + size_t esize = bsize - sizeof(cbuf_size_t); >> > + >> > + if (off >= esize) { >> > + off -= esize; >> > + } else { >> > + size_t csize = MIN(esize - off, size); >> > + >> > + memcpy(data, rdptr + sizeof(cbuf_size_t) >> + off, csize); >> >> So every block of data has the size first, then the data, all packed >> together? >> > > Yup. > > > > >> >> > From c3c2bb0701f747e5c5c3f3d30cdf033137ae26e2 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Wed, 21 Oct 2015 16:39:04 -0700 >> > Subject: Implemented the new profiler >> > >> > Implemented the new profiler format and added simple userspace >> > stack trace (waiting for copy_from_user()). >> >> This is a huge commit that would have been easier to understand if it was >> broken >> up a bit. For instance, I see there's a user backtrace list, syscall >> tracing >> isn't using a qio queue anymore, the old Plan 9 profiler is gone >> (great!), and a >> bunch of other things. It also looks like you squeezed in the change >> that the >> start command also turns on the timers for all cores, which isn't >> something I >> wanted. >> > > Yes, it is a big change 😑 > Breaking it down would be a PITA at this point, since changes have accumulated > in the 4 weeks that it has been sitting there. > As for the timers, for Ron's case, you can still turn them on aside from > the profiler. 
> From a profiler POV, I think the default behavior should be to have them on > everywhere, otherwise you lose the full picture of what the system is doing. > We could add a selective per-CPU timer and profiler turn-on, but IMO the > default should be all on. > > > > >> >> > --- a/kern/arch/x86/kdebug.c >> > +++ b/kern/arch/x86/kdebug.c >> > @@ -365,6 +365,25 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp, >> uintptr_t *pcs, >> > return nr_pcs; >> > } >> > >> > +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs, >> > + size_t nr_slots) >> > +{ >> > + size_t nr_pcs = 0; >> > + >> > + for (;;) { >> > + if (unlikely(fp >= UMAPTOP) || unlikely(fp < BRK_END) || >> >> Why not is_user_raddr()? And once you use copy_from, those checks aren't >> needed >> either. Also, I think it is possible for frame pointers to be below >> BRK_END, >> which could happen if a user stack was malloc'd instead of mmapped. >> >> > + unlikely(nr_pcs >= nr_slots)) >> > + break; >> > + >> > + /* For now straight memory access, waiting for >> copy_from_user(). */ >> >> copy_from is now available. =) >> > > I will do that in a separate CL. > > > > > diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c >> > index da11b7e44d32..76e53d7f9d30 100644 >> >> > +static int oprof_timer_period = 1000; >> ^ rename prof_timer_period >> > > Done. > > > > >> >> > +static void kprof_start_profiler(void) >> > +{ >> > + ERRSTACK(2); >> > >> > + sem_down(&kprof.lock); >> >> If you're using a semaphore as a binary mutex (lock), please use a >> qlock. This >> happens a bunch in this patch set. >> > > Done. 
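A side note on the user_backtrace_list hunk above: the walk relies on the standard x86 frame-pointer convention, where the word at fp is the caller's saved frame pointer and the word just above it is the return address. A self-contained userspace sketch of that walk, run against a hand-built chain of fake frames (the function name and the simulated layout are illustrative, not the kernel's code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated frame-pointer walk (illustrative, not the kernel code).
 * Convention: frame[0] = caller's saved frame pointer, frame[1] = the
 * return address.  A saved fp of 0 terminates the chain. */
static size_t fp_backtrace(uintptr_t fp, uintptr_t *pcs, size_t nr_slots)
{
	size_t nr_pcs = 0;

	while (fp && nr_pcs < nr_slots) {
		uintptr_t *frame = (uintptr_t *) fp;

		pcs[nr_pcs++] = frame[1];	/* return address */
		fp = frame[0];			/* follow the saved fp */
	}
	return nr_pcs;
}
```

The kernel version additionally has to bounds-check fp (or go through copy_from_user()) before each dereference, since a bad user frame pointer would otherwise fault in the kernel.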
> > > >> >> > + if (waserror()) { >> > + sem_up(&kprof.lock); >> > + nexterror(); >> > + } >> > + if (!kprof.profiling) { >> > + profiler_init(); >> > + if (waserror()) { >> > + profiler_cleanup(); >> > + nexterror(); >> > + } >> > + >> > + profiler_control_trace(1); >> > + >> > + for (int i = 0; i < num_cores; i++) >> > + kprof_enable_timer(i, 1); >> >> This looks like starting the profiler turns on the timer. I wanted to >> keep the >> timer separate from the collection, such that the timer is one of >> potentially >> many sources of data, as discussed in another email. >> > > For future uses, yes. But today, with no timers, there is no data. > kptrace is a different data path, which is always on. > When we add counter-overflow based interrupt triggers, in the code I am working > on now, we will need to rework this file. > > > > > >> >> > +static void kprof_usage_fail(void) >> > +{ >> > + static const char *ctlstring = "clear|start|stop|timer"; >> > + const char* const * cmds = profiler_configure_cmds(); >> > + size_t i, size, csize; >> > + char msgbuf[128]; >> > + >> > + strncpy(msgbuf, ctlstring, sizeof(msgbuf)); >> > + size = MIN(strlen(ctlstring), sizeof(msgbuf) - 1); >> > + for (i = 0; cmds[i]; i++) { >> > + csize = strlen(cmds[i]); >> > + if ((csize + size + 2) > sizeof(msgbuf)) >> > + break; >> > + msgbuf[size] = '|'; >> > + memcpy(msgbuf + size + 1, cmds[i], csize); >> > + size += csize + 1; >> > + } >> > + msgbuf[size] = 0; >> > + >> > + error(EFAIL, msgbuf); >> > +} >> >> This is cool, and the pattern tends to come up a lot, where we'd like to >> generate usage information. Perhaps there's a Plan 9 helper for this >> already? >> (I don't know off the top of my head, maybe the Plan 9 guys do). We >> already >> have stuff that parses command strings. If we don't have a helper, we >> can make >> one, so that every device doesn't have to redo this logic. 
>> >> Another part to it is that a device has several layers of arguments and >> commands >> - it'd be neat if that was clearly written down somewhere and then we >> could >> generate these switches and error messages more easily. This isn't >> something >> that needs to be done now, but I like your kprof_usage_fail and it >> brought the >> topic up. =) >> > > OK ☺ > > > > >> >> > +static struct trace_printk_buffer *kprof_get_printk_buffer(void) >> > +{ >> > + static struct trace_printk_buffer boot_tpb; >> > + static struct trace_printk_buffer *cpu_tpbs; >> > + >> > + if (unlikely(booting)) >> > + return &boot_tpb; >> > + if (unlikely(!cpu_tpbs)) { >> > + /* Poor man per-CPU data structure. I really do no like >> littering global >> > + * data structures with module specific data. >> > + */ >> > + spin_lock_irqsave(&ktrace_lock); >> > + if (!cpu_tpbs) >> > + cpu_tpbs = kzmalloc(num_cores * sizeof(struct >> trace_printk_buffer), >> > + 0); >> >> If this alloc fails, (flags == 0), then we'll return 0 (+coreid) and PF >> later. >> > > We could panic, or return the boot tpb. > Today an OOM crashes the box, so I am not sure what to do here. > > > > > --- a/kern/include/kdebug.h >> > +++ b/kern/include/kdebug.h >> > @@ -46,7 +48,7 @@ int printdump(char *buf, int buflen, uint8_t *data); >> > extern bool printx_on; >> > void set_printx(int mode); >> > #define printx(args...) if (printx_on) printk(args) >> > -#define trace_printx(args...) if (printx_on) trace_printk(args) >> > +#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args) >> ^FALSE? >> We just want to print text, not a backtrace. That was also my intent for >> trace_printk (it's just the print, not a backtrace). >> > > Seems pretty useful to have the option to get a BT, to understand from > which path you are coming from. > A "grep" of the string simply tells where the print is. 
> > > >> >> > --- a/kern/src/mm.c >> > +++ b/kern/src/mm.c >> > @@ -24,6 +24,7 @@ >> > #include <kmalloc.h> >> > #include <vfs.h> >> > #include <smp.h> >> > +#include <profiler.h> >> > >> > struct kmem_cache *vmr_kcache; >> > >> > @@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, >> size_t len, int prot, int flags, >> > } >> > } >> > spin_unlock(&p->vmr_lock); >> > + >> > + profiler_notify_mmap(p, addr, len, prot, flags, file, offset); >> > + >> >> Do you need to know when something was unmapped? >> > > Linux perf doesn't seem to care, as it does not have any command to report > that. > > > > diff --git a/kern/src/process.c b/kern/src/process.c >> > index bd26789e3ef1..2e4c69518e8a 100644 >> > --- a/kern/src/process.c >> > +++ b/kern/src/process.c >> > @@ -331,6 +331,8 @@ error_t proc_alloc(struct proc **pp, struct proc >> *parent, int flags) >> > kmem_cache_free(proc_cache, p); >> > return -ENOFREEPID; >> > } >> > + if (parent && parent->binary_path) >> > + kstrdup(&p->binary_path, parent->binary_path); >> >> Is this to fix a leak where paths weren't set for forked processes? If >> so, this >> is an example of a fixup that could have been squashed into a previous >> commit. >> > > I lost track of what it was 😀 > But yes, it could have been squashed. > > > > >> >> > --- a/kern/src/profiler.c >> > +++ b/kern/src/profiler.c >> >> > +static inline char* vb_encode_uint64(char* data, uint64_t n) >> > +{ >> > + for (; n >= 0x80; n >>= 7) >> > + *data++ = (char) (n | 0x80); >> > + *data++ = (char) n; >> > + >> > + return data; >> > +} >> >> This probably should be in a library or header (so we can reuse it) and >> with a >> little explanation. It looks like we take 7 bits of n at a time and push >> it >> into *data, with bit 0x80 signifying we're done or not. >> > > Yep, that's what it does. > I wasn't sure if variable byte encoding would be something a kernel would > be doing elsewhere. 
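For anyone reading along, this is LEB128-style variable-byte encoding: 7 payload bits per byte, least significant group first, with 0x80 as the continuation bit. A userspace sketch of the encoder (mirroring the quoted vb_encode_uint64()) together with a matching decoder (the decoder is hypothetical; the patch only needs the encoder):

```c
#include <assert.h>
#include <stdint.h>

/* Emit 7 bits of n per byte, low-order group first; 0x80 set on every
 * byte except the last means "more bytes follow". */
static char *vb_encode_uint64(char *data, uint64_t n)
{
	for (; n >= 0x80; n >>= 7)
		*data++ = (char) (n | 0x80);
	*data++ = (char) n;

	return data;
}

/* Matching decoder (hypothetical; not part of the patch). */
static const char *vb_decode_uint64(const char *data, uint64_t *pn)
{
	unsigned int shift = 0;
	uint64_t n = 0;

	while (*data & 0x80) {
		n |= (uint64_t) (*data++ & 0x7f) << shift;
		shift += 7;
	}
	n |= (uint64_t) (uint8_t) *data++ << shift;
	*pn = n;

	return data;
}
```

A uint64_t encodes in one to ten bytes; small values (the common case for sizes and deltas) take a single byte, which is the point of using it in the profiler's output format.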
> > > > > +static struct block *profiler_buffer_write(struct profiler_cpu_context >> *cpu_buf, >> > + >> struct block *b) >> > { >> > - return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) | >> > - (uint64_t) nbt; >> > + if (b) { >> > + qibwrite(profiler_queue, b); >> > + >> > + if (qlen(profiler_queue) > profiler_queue_limit) { >> > + b = qget(profiler_queue); >> > + if (likely(b)) { >> > + cpu_buf->dropped_data_size += BLEN(b); >> > + freeb(b); >> > + } >> >> This seems like a candidate for a feature added to qio, such as a "drop >> from the >> front" mode. >> > > OK, but can we please do that in a separate CL? This branch has been > sitting for some time already. > > > > >> >> > + } >> > + } >> > + >> > + return iallocb(profiler_cpu_buffer_size); >> > } >> >> This function seems a little weird. If you're given a block, you write >> it. >> That makes sense. But regardless, you alloc a new block? >> >> Seems like we should not be too cute by having write return a fresh >> block, and >> just do the allocation separately. >> > > I drop a block from the tail. I *could* be reusing it, but I have found no > reinit API to re-initialize a block. 
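The "drop from the front" qio mode being suggested boils down to: enqueue the new block, then while the queue is over its byte budget, free blocks from the head and account them as dropped. A generic userspace sketch of that policy (plain C; the block/queue types here only mimic qio's qibwrite/qget/BLEN/freeb, they are not the real API):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for qio blocks/queues, just to illustrate the policy. */
struct block {
	struct block *next;
	size_t len;
};

struct bounded_queue {
	struct block *head, *tail;
	size_t len;	/* total bytes queued */
	size_t limit;	/* byte budget before dropping */
	size_t dropped;	/* bytes evicted from the front */
};

static struct block *mkblock(size_t len)
{
	struct block *b = calloc(1, sizeof(*b));

	b->len = len;
	return b;
}

/* Enqueue b at the tail; while over the limit, evict from the head
 * (never evicting the block just written). */
static void bq_write_drop_front(struct bounded_queue *q, struct block *b)
{
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	q->len += b->len;

	while (q->len > q->limit && q->head != q->tail) {
		struct block *old = q->head;

		q->head = old->next;
		q->len -= old->len;
		q->dropped += old->len;
		free(old);
	}
}
```

Folding this into qio itself would let profiler_buffer_write() become a plain write, with the eviction accounting kept inside the queue.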
>> > - */ >> > - cpu_buf->block = b = iallocb(profiler_cpu_buffer_size); >> > - if (unlikely(!b)) { >> > - printk("%s: fail\n", __func__); >> > - return NULL; >> > + void enum_proc(struct vm_region *vmr, void *opaque) >> > + { >> > + struct proc *p = (struct proc *) opaque; >> > + >> > + profiler_notify_mmap(p, vmr->vm_base, vmr->vm_end - >> vmr->vm_base, >> > + vmr->vm_prot, >> vmr->vm_flags, vmr->vm_file, >> > + vmr->vm_foff); >> > + } >> > + >> > + ERRSTACK(2); >> >> I think you only need 1 here. It's based on the number of waserrors. >> > > Fixed. > > > > >> >> > + struct process_set pset; >> > + >> > + proc_get_set(&pset); >> > + if (waserror()) { >> > + proc_free_set(&pset); >> > + nexterror(); >> > + } >> > + >> > + for (size_t i = 0; i < pset.num_processes; i++) >> > + enumerate_vrms(pset.procs[i], enum_proc, pset.procs[i]); >> ^vmrs >> > > Done. > > > >> >> > + poperror(); >> > + proc_free_set(&pset); >> > +} >> > + >> > +static inline int profiler_is_tracing(struct profiler_cpu_context >> *cpu_buf) >> >> Since this is named "is_tracing", it should be a bool, and return >> TRUE/False to >> avoid any confusion. >> > > Done. > > > > > -static inline int profiler_add_sample(struct profiler_cpu_context >> *cpu_buf, >> > - >> uintptr_t pc, unsigned long event) >> > +static void alloc_cpu_buffers(void) >> > { >> > ERRSTACK(1); >> > - struct op_entry entry; >> > - struct block *b; >> > + int i; >> > >> > + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL); >> > + if (!profiler_queue) >> > + error(-ENOMEM, NULL); >> ^just ENOMEM >> > > Done. > > > > + qdropoverflow(profiler_queue, 1); >> > + qnonblock(profiler_queue, 1); >> >> As mentioned in a previous patch, you might not want qnonblock. >> > > Not sure why not? > > > > Incidentally, qnonblock and qdropoverflow take a bool, not an int. So TRUE >> instead of 1. >> > > Done. 
> > > >> >> > + profiler_percpu_ctx = >> > + kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, >> KMALLOC_WAIT); >> > + if (!profiler_percpu_ctx) >> > + error(-ENOMEM, NULL); >> >> Since you did a KMALLOC_WAIT, you will never have !profiler_percpu_ctx >> > > I am not sure waiting is a guarantee that you will get memory anyway. > > > > -static inline void profiler_begin_trace(struct profiler_cpu_context >> *cpu_buf) >> > +int profiler_configure(struct cmdbuf *cb) >> > { >> > - cpu_buf->tracing = 1; >> > + if (!strcmp(cb->f[0], "prof_qlimit")) { >> > + if (cb->nf < 2) >> > + error(EFAIL, "prof_qlimit KB"); >> > + profiler_queue_limit = atoi(cb->f[1]) * 1024; >> >> Probably want a sanity check. Also, what happens when this changes after >> the >> queue was already allocated? >> > > The configuration must be done before the profiler is started. > > > > >> >> > + } else if (!strcmp(cb->f[0], "prof_cpubufsz")) { >> > + if (cb->nf < 2) >> > + error(EFAIL, "prof_cpubufsz KB"); >> > + profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024; >> >> Is there any danger with the user setting this to be a very small value >> (like >> 0)? It looks like the assumption in profiler_cpu_buffer_write_reserve() >> is >> that a fresh allocation (done by profiler_buffer_write()) is enough for >> size >> bytes. >> >> > + } else if (!strcmp(cb->f[0], "prof_btdepth")) { >> > + if (cb->nf < 2) >> > + error(EFAIL, "prof_btdepth DEPTH"); >> > + profiler_backtrace_depth = atoi(cb->f[1]); >> >> It is dangerous to have the user control this. It's a stack allocation. >> Even >> if it wasn't, we'd need a sanity check of some sort. >> > > I will drop the BT depth config, and limit the others. > > > >> >> > +void profiler_cleanup(void) >> > +{ >> > + sem_down(&profiler_mtx); >> > + profiler_users--; >> > + if (profiler_users == 0) >> > + free_cpu_buffers(); >> > + sem_up(&profiler_mtx); >> > } >> >> I'm still concerned about this. 
If the only source of profiling data is >> from >> the timer IRQs, then your current stuff seems fine. (You disable the >> timers, >> and thus the handlers, before freeing this stuff.) But if we ever have >> any >> other use of this, then we'll need to be careful. >> > > As I said, when I come in with the overflow-triggered sampling, this will > need some retuning. > > > >> >> But there's more to it than that. The assumption here is that once >> profiler_users == 0, then there is no code, (RKM, pending IRQ, etc.) that >> has >> access to profiler_percpu_ctx or profiler_queue. >> >> Just as an example, say someone is calling profiler_notify_mmap(), which >> checks >> profiler_percpu_ctx. They get past the if () check, then concurrently >> someone >> calls profiler_cleanup and triggers free_cpu_buffers. Then the original >> thread >> eventually calls profiler_push_pid_mmap, then qwrite, then page faults. >> >> So a user could theoretically trigger a PF. This is part of the reason >> why I >> was reluctant to have you try and free the buffers. >> > > Well, we have no RCU ☺ > In such a case we would swap the ptr with NULL, and RCU-free it. > > > > > > @@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char >> *path, size_t path_l, >> > user_memdup_free(p, kargenv); >> > __proc_ready(new_p); >> > pid = new_p->pid; >> > + profiler_notify_new_process(new_p); >> > proc_decref(new_p); /* give up the reference created in >> proc_create() */ >> > return pid; >> > error_load_elf: >> > @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e) >> > >> > printd("[PID %d] fork PID %d\n", e->pid, env->pid); >> > ret = env->pid; >> > + profiler_notify_new_process(env); >> > proc_decref(env); /* give up the reference created in >> proc_alloc() */ >> > return ret; >> > } >> >> Do you need to update things when a process changes its binary path? >> (exec) 
> > > > > From c0ab4ec0d729ad8e0555b3beef72340f90c23712 Mon Sep 17 00:00:00 2001 >> > From: Davide Libenzi <[email protected]> >> > Date: Sat, 7 Nov 2015 18:47:16 -0800 >> > Subject: Enabled /prof/kptrace collection of anything which goes into >> cprintf >> > >> > Enabled /prof/kptrace collection of anything which goes into cprintf, >> > printk, and its associates. >> > The kptrace collector is a circular buffer whose default size is 128KB. >> >> What happens when the ring is full? New stuff drops? My main concern >> here is >> that trace_vprintk could somehow cause an issue and hold up the real >> printk, >> which would be hard to debug. >> > > Ring full drops old stuff. Never blocks. > > > > >> >> > --- a/kern/drivers/dev/kprof.c >> > +++ b/kern/drivers/dev/kprof.c >> > @@ -19,6 +19,7 @@ >> > #include <error.h> >> > #include <pmap.h> >> > #include <smp.h> >> > +#include <time.h> >> > #include <circular_buffer.h> >> > #include <umem.h> >> > #include <profiler.h> >> > @@ -62,7 +63,7 @@ struct dirtab kproftab[] = { >> > {"mpstat-raw", {Kmpstatrawqid}, 0, 0600}, >> > }; >> > >> > -extern int booting; >> > +extern system_timing_t system_timing; >> >> Should be able to get system_timing from a header. >> > > Yup, arch/time.h, done. > > > > >> >> > static struct kprof kprof; >> > static bool ktrace_init_done = FALSE; >> > static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE; >> > @@ -567,7 +568,7 @@ static struct trace_printk_buffer >> *kprof_get_printk_buffer(void) >> > static struct trace_printk_buffer boot_tpb; >> > static struct trace_printk_buffer *cpu_tpbs; >> > >> > - if (unlikely(booting)) >> > + if (unlikely(!num_cores)) >> > return &boot_tpb; >> >> That seems a little odd. I'd think if we're still booting, we'd use the >> boot_tpb. Was there some corner case that triggered this? 
I understand >> this >> one: >> >> > - if (likely(!booting)) >> > + if (likely(system_timing.tsc_freq)) >> > tsc2timespec(read_tsc(), &ts_now); >> > > The booting flag is cleared really late, and I would not want stuff > starting to spin on other cores, colliding on the boot tpb.
