Re: [akaros] Code Review - Add MSR device to devarch

Barret Rhoden Mon, 23 Nov 2015 12:43:16 -0800

Hi -

On 2015-11-09 at 07:01 "'Davide Libenzi' via Akaros"
<[email protected]> wrote:
> This adds the "msr" device to devarch, allowing userspace software
> (ie, CPU counters programming machinery) to interact with MSRs.
> I also cleaned up that file a bit, dropping a few things that were
> unused anywhere (there are some more in there).
> 
> 
> https://github.com/brho/akaros/compare/master...dlibenzi:devarch_msr
> 
> 
> The following changes since commit
> 1165c2bda44b7f1fb3b776c0dc5b0fb4dd499961:
> 
>   Add networking unit tests (2015-11-03 12:00:38 -0500)
> 
> are available in the git repository at:
> 
>   [email protected]:dlibenzi/akaros devarch_msr
> 
> for you to fetch changes up to
> bd69792de3d53f08b7b6e15de684b22744c52f62:
> 
>   Added test for devarch MSR file (2015-11-09 06:49:04 -0800)


Comments below:

> From b8425da6e7813b9f0c6cb2e1ebe0124236e9eeba Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 11:25:39 -0800
> Subject: Added completion data structure
> 
> diff --git a/kern/src/completion.c b/kern/src/completion.c
> new file mode 100644
> index 000000000000..40e922ab8120
> --- /dev/null
> +++ b/kern/src/completion.c
> @@ -0,0 +1,30 @@
> +/* Copyright (c) 2015 Google Inc
> + * Davide Libenzi <[email protected]>
> + * See LICENSE for details.
> + */
> +
> +#include <kthread.h>
> +#include <completion.h>
> +
> +void completion_init(struct completion *comp, int count)
> +{
> +     cv_init(&comp->cv);

If you want to signal completion from IRQ context, then we'll need these CVs to
be irqsave.  In that case, you'll need to use the irqsave variants for most of
the CV operations (e.g. cv_lock_irqsave).  If you don't, you'll run the risk of
deadlocking, and CONFIG_SPINLOCK_DEBUG will complain.

> +     comp->count = count;
> +}
> +
> +void completion_complete(struct completion *comp, int how_much)
> +{
> +     cv_lock(&comp->cv);
> +     comp->count -= how_much;
> +     if (comp->count <= 0)
> +             __cv_broadcast(&comp->cv);

Would it be a bug for comp->count to ever be less than 0?  (Depends on how this
is used).

> +     cv_unlock(&comp->cv);
> +}
> +
> +void completion_wait(struct completion *comp)
> +{
> +     cv_lock(&comp->cv);
> +     if (comp->count > 0)
> +             cv_wait(&comp->cv);

This is fine as is.  

As an FYI, if you want to be a little faster, you can use cv_wait_and_unlock,
then immediately return (careful of IRQs).  And/or you could do a racy read of
comp->count to see if it's already 0 without grabbing the lock.

> +     cv_unlock(&comp->cv);
> +}
> -- 
> 2.6.0.rc2.230.g3dd15c0
> 
> From 80128f764bdc236540355df797bf3fc550b93fb7 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 11:27:01 -0800
> Subject: Added CPU set data structure

Do we want to call these cpu_sets or core_sets?  In general, we've referred to
logical processors as 'cores' or 'pcores' throughout the codebase.  Though we
also have been a little inconsistent with it.

> diff --git a/kern/include/cpu_set.h b/kern/include/cpu_set.h

> +#define CPUSET_INT_BITS (sizeof(cpu_set_int_t) * 8)
> +
> +typedef uint64_t cpu_set_int_t;
> +
> +struct cpu_set {
> +     cpu_set_int_t cpus[(MAX_NUM_CORES + CPUSET_INT_BITS - 1) / 
> CPUSET_INT_BITS];

We have a helper macro for the array size calculation:
        cpus[DIV_ROUND_UP(MAX_NUM_CORES, CPUSET_INT_BITS)];

> +static inline void cpu_set_setcpu(struct cpu_set *cset, unsigned int cpuno)
> +{
> +     cset->cpus[cpuno / CPUSET_INT_BITS] |= 1 << (cpuno % CPUSET_INT_BITS);
> +}

This stuff all looks correct.  But we should use bitops for it instead of
rolling our own.  We brought in many of Linux's bitops (k/i/bitops.h,
k/a/x/bitops.h).  If there's a general-purpose bitop we're missing that you
need, then we should add it to bitops.h


> From 6781115d82ce87a64894ec773b0d69671ebb84ed Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 11:33:56 -0800
> Subject: Added APIs to read and write MSR values on multiple CPUs
> 

> --- a/kern/arch/x86/devarch.c
> +++ b/kern/arch/x86/devarch.c

> +uint64_t *cpuset_read_msr(const struct cpu_set *cset, uint32_t addr,
> +                                               size_t *nvalues)
> +{
> +     int cpu = core_id();
> +     struct smp_read_values srv;
> +
> +     srv.values = kzmalloc(num_cores * sizeof(*srv.values), KMALLOC_WAIT);
> +     if (srv.values) {

With KMALLOC_WAIT, kmalloc always succeeds.

> +             completion_init(&srv.comp, cpu_set_remote_count(cset));
> +             for (int i = 0; i < num_cores; i++) {
> +                     if (cpu_set_getcpu(cset, i)) {
> +                             if (i == cpu)
> +                                     srv.values[i] = read_msr(addr);
> +                             else
> +                                     send_kernel_message(i, cpu_read_msr, 
> (long) &srv, addr, 0,
> +                                                                             
> KMSG_ROUTINE);
> +                     }
> +             }
> +             completion_wait(&srv.comp);

This whole block should be done with smp_do_in_cpus.

> +             *nvalues = num_cores;
> +     }
> +
> +     return srv.values;
> +}
> +
> +static void cpu_write_msr(uint32_t srcid, long a0, long a1, long a2)
> +{
> +     struct completion *comp = (struct completion *) a0;
> +
> +     write_msr((uint32_t) a1, (uint64_t) a2);
> +     completion_complete(comp, 1);
> +}
> +
> +void cpuset_write_msr(const struct cpu_set *cset, uint32_t addr, uint64_t 
> value)
> +{
> +     int cpu = core_id();
> +     struct completion comp;
> +
> +     completion_init(&comp, cpu_set_remote_count(cset));
> +     for (int i = 0; i < num_cores; i++) {
> +             if (cpu_set_getcpu(cset, i)) {
> +                     if (i == cpu)
> +                             write_msr(addr, value);
> +                     else
> +                             send_kernel_message(i, cpu_write_msr, (long) 
> &comp, addr, value,
> +                                                                     
> KMSG_ROUTINE);
> +             }
> +     }
> +     completion_wait(&comp);

This whole block should be done with smp_do_in_cpus.

> From 008c4762727a39ad8e5c5dd2265317445ddb86bd Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 16:44:41 -0800
> Subject: Cleaned up devarch.c code to remove unused code

> -static char *devname(void)
> -{
> -     return archdevtab.name;
> -}

Not a big deal, but the reason I had this devname() helper is that many devices
borrow code from each other.  devarch only used it once, but others use devname
a bunch of times.  When that code gets reused for a new device, we can leave
devname as is.  e.g. copying error messages around like:
        set_error(ENOSYS, "Can't tap #%s file type %d", devname(), type);


> From ca1462749dbdd1c4d8fb34973d42da5a3972aebe Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 17:49:16 -0800
> Subject: Plugged MSR read and write APIs into devarch MSR file

> --- a/kern/arch/x86/devarch.c
> +++ b/kern/arch/x86/devarch.c
> @@ -19,12 +19,15 @@
>  #include <error.h>
>  #include <cpio.h>
>  #include <pmap.h>
> +#include <umem.h>
>  #include <smp.h>
>  #include <ip.h>
>  #include <time.h>
>  #include <cpu_set.h>
>  #include <completion.h>
>  
> +#define REAL_MEM_SIZE (1024 * 1024)
> +
>  struct io_map {
>       struct io_map *next;
>       int reserved;
> @@ -48,7 +51,7 @@ enum {
>       Qiow,
>       Qiol,
>       Qgdb,
> -     Qmem,
> +     Qrealmem,

So in the last cleanup patch you removed realmem and make it read up to
max_paddr bytes.  In this patch, you undid that.  Ideally, the parts of
this patch that made that change would be squashed into the previous
patch.

You can do this with git rebase -i.  For this commit, choose "e" (edit),
then:
$ git reset HEAD^ # unstage this entire commit's changes
$ git add -p # selectively pick what should be squashed
$ git commit --amend # push those changes into the previous commit
        or
        git commit -m WIP-squash # make a separate commit, fix it up later
$ git add -p # add everything for *this* commit
$ git commit -c ca14627 # commit with the original message 
$ git rebase --continue

>  uint64_t *cpuset_read_msr(const struct cpu_set *cset, uint32_t addr,
>                                                 size_t *nvalues)
>  {
> -     int cpu = core_id();
>       struct smp_read_values srv;
>  
> -     srv.values = kzmalloc(num_cores * sizeof(*srv.values), KMALLOC_WAIT);
> +     srv.addr = addr;
> +     srv.values = kzmalloc(num_cores * sizeof(*srv.values), 0);
>       if (srv.values) {
> -             completion_init(&srv.comp, cpu_set_remote_count(cset));
> -             for (int i = 0; i < num_cores; i++) {
> -                     if (cpu_set_getcpu(cset, i)) {
> -                             if (i == cpu)
> -                                     srv.values[i] = read_msr(addr);
> -                             else
> -                                     send_kernel_message(i, cpu_read_msr, 
> (long) &srv, addr, 0,
> -                                                                             
> KMSG_ROUTINE);
> -                     }
> -             }
> -             completion_wait(&srv.comp);
> +             smp_do_in_cpus(cset, cpu_read_msr, &srv);

This is exactly what I said to do for the earlier patch.  I'm glad you
did it, but I would be even happier if you fixed up the git history so
that it was like that originally.

It may seem trivial, but not only does it keep the git history clean (so
that we can easily see how the code evolved), but it saves me time.  I
was making comments on code that had already been fixed.

So now it looks like there's two parts of this commit that should be in
other commits.  That's something that can be fixed with multiple runs of
git rebase -i, as described above.

> @@ -397,10 +386,27 @@ static long archread(struct chan *c, void *a, long n, 
> int64_t offset)
>                       return n;
>               case Qioalloc:
>                       break;
> -             case Qmem:
> -                     printk("readmem %p %p %p %p %p\n", offset, a, n, 
> KADDR(0),
> -                                max_paddr);
> -                     return readmem(offset, a, n, KADDR(0), max_paddr);
> +             case Qrealmem:
> +                     return readmem(offset, a, n, KADDR(0), REAL_MEM_SIZE);

This change (realmen) is right next to Qmsr.  When you try to fix this
with a rebase and do a git add -p, git won't be able to figure out that
these are separate.  You'll have to specify 'e' and manually edit the
hunk when you git add.  It should be pretty straightforward.

> +             case Qmsr:
> +                     cpu_set_init(&cset);
> +                     cpu_set_fill_available(&cset);
> +                     values = cpuset_read_msr(&cset, (uint32_t) offset, 
> &nvalues);
> +                     if (values) {
> +                             if (n >= nvalues * sizeof(uint64_t)) {
> +                                     if (memcpy_to_user_errno(current, a, 
> values,
> +                                                                             
>          nvalues * sizeof(uint64_t)))
> +                                             n = -1;
> +                                     else
> +                                             n = nvalues * sizeof(uint64_t);
> +                             } else {
> +                                     n = 0;

So if the user doesn't ask for a large enough n, we tell them EOF?  Is
this a bug/error on the user's part?

> From 0407977d44cc3cbcc8c24de1dca67676b6b6189f Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Sun, 8 Nov 2015 20:11:28 -0800
> Subject: Added test for devarch MSR file

> --- /dev/null
> +++ b/user/utest/devarch_file_test.c
> @@ -0,0 +1,69 @@
> +
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <fcntl.h>
> +#include <utest/utest.h>
> +
> +TEST_SUITE("DEVARCH");
> +
> +/* <--- Begin definition of test cases ---> */
> +
> +static bool test_msr(void)
> +{
> +     static const int max_cpus = 256;

256 should be MAX_NUM_CORES, or MAX_NUM_CORES + 1 if you want a buffer.
To get the symbol, #include <ros/arch/arch.h>.

> +     int fd = open("#arch/msr", O_RDWR);
> +     uint64_t *values = malloc(max_cpus * sizeof(uint64_t));
> +     uint64_t tmp = 0;
> +     ssize_t n;
> +
> +     UT_ASSERT_M("Failed to open MSR device file (#arch/msr)", fd >= 0);
> +     UT_ASSERT_M("Failed to allocated MSR values memory", values);
> +
> +     n = pread(fd, values, max_cpus * sizeof(uint64_t), 0x10);
> +     UT_ASSERT_M("Failed to read MSR values from 0x10", n > 0);
> +
> +     n = pread(fd, values, max_cpus * sizeof(uint64_t), 0x309);
> +     UT_ASSERT_M("Failed to read MSR values from 0x309", n > 0);
> +
> +     n = pwrite(fd, &tmp, sizeof(uint64_t), 0x309);
> +     UT_ASSERT_M("Failed to write MSR values to 0x309",
> +                             n == sizeof(uint64_t));

Will these tests always be passing?  What are those MSRs, and why do we allow
the user to read / write them?


> From e8cb42302b1ab90d6457356e3cc1063a499d3665 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <[email protected]>
> Date: Mon, 9 Nov 2015 16:33:48 -0800
> Subject: Added whitelisting to MSR read/write code

> --- a/kern/arch/x86/devarch.c
> +++ b/kern/arch/x86/devarch.c
> @@ -82,7 +80,19 @@ static struct dirtab archdir[Qmax] = {
>       {"realmem", {Qrealmem, 0}, 0, 0444},
>       {"msr", {Qmsr, 0}, 0, 0666},
>  };
> -
> +/* White list entries needs to be ordered by start address, and never 
> overlap.
> + */
> +static const struct address_range msr_rd_wlist[] = {
> +     ADDRESS_RANGE(0x00000000, 0xffffffff),
> +};

I like the use of a whitelist, but we need to actually pick what we want
userspace to be able to read.  They could give us a garbage MSR and probably
trigger a GPF or something nasty.

> +static const struct address_range msr_wr_wlist[] = {
> +     ADDRESS_RANGE(0x000000c1, 0x000000c8),
> +     ADDRESS_RANGE(0x00000186, 0x00000189),
> +     ADDRESS_RANGE(0x00000199, 0x00000199),
> +     ADDRESS_RANGE(0x00000309, 0x0000030b),
> +     ADDRESS_RANGE(0x0000038d, 0x0000038d),
> +     ADDRESS_RANGE(0x0000038f, 0x00000391),
> +};

What MSRs are these?  Given the sensitivity of this stuff, it'd be nice to use
some sort of #define / name for MSRs instead of magic numbers.  I like the
address range, but we lose out on explicitly marking what MSRs we are allowing.
I think using real names (e.g. MSR_FOOBAR) and also using the range should be
okay, so long as we only make ranges of similar, contiguous MSRs.


-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [akaros] Code Review - Add MSR device to devarch

Reply via email to