On 2015-11-23 at 20:17 "'Davide Libenzi' via Akaros"
<[email protected]> wrote:
> > > +void completion_wait(struct completion *comp)
> > > +{
> > > +     cv_lock(&comp->cv);
> > > +     if (comp->count > 0)
> > > +             cv_wait(&comp->cv);
> >
> > This is fine as is.
> >
> 
> This seems racy, in general. In this case, we never issue to the
> local CPU, otherwise you could do cv_lock(), get an IRQ which does
> cv_lock_irqsave(), and get a deadlock.
> IMO that cv_lock() should be irqsave.

Yeah, cv_lock would need to be irqsave, since it could be used from IRQ
context (as mentioned above).

> Like this?
> 
> void completion_wait(struct completion *comp)
> {
>     int8_t state;
> 
>     cv_lock_irqsave(&comp->cv, &state);
>     if (comp->count > 0)
>         cv_wait_and_unlock(&comp->cv);
>     enable_irqsave(&state);
> }

Roughly.  That also needs to unlock if we don't take the branch.  I
think this would work:

void completion_wait(struct completion *comp)
{
    int8_t state;

    cv_lock_irqsave(&comp->cv, &state);
    if (comp->count > 0) {
        cv_wait_and_unlock(&comp->cv);
        enable_irqsave(&state);
    } else {
        cv_unlock_irqsave(&comp->cv, &state);
    }
}

> > > +static inline void cpu_set_setcpu(struct cpu_set *cset, unsigned
> > > int
> > cpuno)
> > > +{
> > > +     cset->cpus[cpuno / CPUSET_INT_BITS] |= 1 << (cpuno %
> > CPUSET_INT_BITS);
> > > +}
> >
> > This stuff all looks correct.  But we should use bitops for it
> > instead of rolling our own.  We brought in many of Linux's bitops
> > (k/i/bitops.h, k/a/x/bitops.h).  If there's a general-purpose bitop
> > we're missing that you need, then we should add it to bitops.h
> >
> 
> It is missing everything. get/set/clear

I think we were missing get_bit, but we have __set_bit and __clear_bit
(the __ are the non-atomic versions).

> The operations in bitops seem to be more special operations, and
> targeting a single unsigned long.

I think they treat the pointer argument as a base address, and then
they offset to any address.  For instance:

static inline void __set_bit(int nr, volatile unsigned long *addr)
{
    asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
}

where bts:
        If the bit base operand specifies a memory location, the
        operand represents the address of the byte in memory that
        contains the bit base (bit 0 of the specified byte) of the bit
        string. The range of the bit position that can be referenced by
        the offset operand depends on the operand size.

and for 64 bit operands, it can go from -2^63 to 2^63 - 1 (from the SDM
V2a 3-10).


> I added mem_get_bit, mem_set_bit, mem_clear_bit to bitops.h.

So I think we just need get_bit (or test_bit), which can go in the
arch/bitops.h.  x86's can use the 'bt' instruction.

> > So now it looks like there's two parts of this commit that should
> > be in other commits.  That's something that can be fixed with
> > multiple runs of git rebase -i, as described above.
> >
> 
> Will try. /shivering ...

Thanks.  Once you get the hang of it, it's pretty powerful.  I think of
it as assigning hunks (that make up a diff) to various commits.

> > > +     n = pwrite(fd, &tmp, sizeof(uint64_t), 0x309);
> > > +     UT_ASSERT_M("Failed to write MSR values to 0x309",
> > > +                             n == sizeof(uint64_t));
> >
> > Will these tests always be passing?  What are those MSRs, and why
> > do we allow
> > the user to read / write them?
> >
> 
> The 0x309 is a counter value register. Well, if we need to test
> write(), we need to write somewhere.

Sorry if I was unclear, I was mostly looking for a #define naming the
register.

> > +/* White list entries needs to be ordered by start address, and
> > never overlap.
> > > + */
> > > +static const struct address_range msr_rd_wlist[] = {
> > > +     ADDRESS_RANGE(0x00000000, 0xffffffff),
> > > +};
> >
> > I like the use of a whitelist, but we need to actually pick what we
> > want userspace to be able to read.  They could give us a garbage
> > MSR and probably
> > trigger a GPF or something nasty.
> >
> 
> Basically, MSR varies from CPU family to CPU family.

Do the values vary for a particular MSR, or just that the availability
of MSRs vary based on the family?  I was under the impression that it's
the latter (availability varies, not the value).  In that case, we can
be explicit about what MSRs we're giving access to.

> In Linux, if you have permissions, you can read/write whatever you
> want.

I guess this means if you're root?  I'd like for us to limit the
whitelist to things we need, if we're going to open this up for any
use.  (But it sounds like you don't need it for perf, so maybe we can
put this in something for "admin only" in the future).

> Linux read msr code uses exception tables to catch errors.
> Not sure what to do here.

Ah nice.  I guess we'd need something similar.  Even if we whitelist
MSR_FOOBAR in the code, knowing that it is okay for the user to
read/write it, but then it turns out we run on a machine that doesn't
have that MSR, we'd still GPF.  So we'd want something like the
exception fixups.

> > > +static const struct address_range msr_wr_wlist[] = {
> > > +     ADDRESS_RANGE(0x000000c1, 0x000000c8),
> > > +     ADDRESS_RANGE(0x00000186, 0x00000189),
> > > +     ADDRESS_RANGE(0x00000199, 0x00000199),
> > > +     ADDRESS_RANGE(0x00000309, 0x0000030b),
> > > +     ADDRESS_RANGE(0x0000038d, 0x0000038d),
> > > +     ADDRESS_RANGE(0x0000038f, 0x00000391),
> The ranges also, depends on the number of cores.

Why is that?

> We could have the ranges empty, and populate them with boot cmdline
> options. Keep in mind that I am planning not to use the MSR device
> for userspace perf anymore.

Ah, okay.

> Rebased branch according to comments so far. Missing some pending ...

Thanks!

Barret


-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to