Re: [akaros] Re: Code Review - Use core_id_early() in kprof trace buffer print

'Davide Libenzi' via Akaros Tue, 08 Dec 2015 08:44:26 -0800

Code in the same branch.
This also should fix a potential race in case kzmalloc() ended up
printk()-ing (another thing we tried with Gan yesterday - but wasn't it).



On Tue, Dec 8, 2015 at 8:29 AM, Davide Libenzi <[email protected]> wrote:

> Yeah, during the tests we did with Gan, I had a CAS on the in_use. Let me
> put that in.
> I would not want the tracing code disabled in the early stages, if it is
> fixable.
> We would lose a bunch of info spitted out before.
> Setting up GS base as first thing on remote core boots, will help.
>
>
> On Tue, Dec 8, 2015 at 8:22 AM, Barret Rhoden <[email protected]>
> wrote:
>
>> On 2015-12-08 at 06:27 "'Davide Libenzi' via Akaros"
>> <[email protected]> wrote:
>> > After looking a bit more about this bug, this was, /methink, more
>> > triggered by memory structures being just at different positions due
>> > to new Gan code, more than fiddling with GS_BASE MSR.
>> > There is basically a window of time, where num_cores >0 and GS/percpu
>> > is not setup correctly, but still pointing somewhere we do not GPF.
>>
>> I guess this is after topology_init() and before smp_boot()
>> (arch_init() -> smp_boot() -> smp_final_core_init()).
>>
>> This is somewhat true.  Core 0's GS should be set up in assembly before
>> we even get into kernel_init().  k/a/x/entry64.S.  So the above window
>> only should exist for cores other than core 0.
>>
>> Any idea which core was flipping out?  (some non-zero core could be
>> printing during smp_boot during an APIC access, which is my guess).
>>
>> Also, I'm not sure this fix is enough:
>>
>> static struct trace_printk_buffer *kprof_get_printk_buffer(void)
>> {
>>     static struct trace_printk_buffer boot_tpb;
>>     static struct trace_printk_buffer *cpu_tpbs;
>>
>>     if (unlikely(!num_cores))
>>         return &boot_tpb;
>>     if (unlikely(!cpu_tpbs)) {
>>         /* Poor man per-CPU data structure. I really do no like
>>          * littering global data structures with module specific data.
>>          */
>>         spin_lock_irqsave(&ktrace_lock);
>>         if (!cpu_tpbs)
>>             cpu_tpbs = kzmalloc(num_cores * sizeof(struct
>>                                 trace_printk_buffer), 0);
>>         spin_unlock_irqsave(&ktrace_lock);
>>     }
>>
>>     return cpu_tpbs + core_id_early();
>> }
>>
>> (this is with your change).
>>
>> If we get a printk from a core != 0, but before smp_final_core_init()
>> finishes (specifically, when we set core_id_ready = TRUE;), then
>> core_id_early will return 0, and some core other than core 0 will get
>> the trace buffer for core 0.
>>
>> Odds are, there's a race associated with that.  For instance:
>>
>>     char *usrbuf = tpb->buffer, *kpbuf = tpb->buffer + usr_bufsz;
>>     const char *utop, *uptr;
>>     char hdr[64];
>>
>>     if (tpb->in_use)
>>         return;
>>     tpb->in_use++;
>>
>> That is racy if there are concurrent users of tpb.
>>
>> One solution is to not use trace_printk for all printks before
>> smp_boot.  That's lousy since we'd like to have all of that saved.
>>
>> Another would be to not use per-core buffers for trace_printk.  It's not
>> clear that we need that level of optimization for regular printk (which
>> is already serialized, btw), but maybe we would like it for the
>> tracer.  I'd also like to keep the "cpuX:" formatting.  So that stinks
>> too.
>>
>> Another option would be to have a call to disable trace_vprintk
>> temporarily, and smp_boot can turn it off and on.
>>
>> In general, it's hard to do per-cpu things when we're in the window of
>> having multiple cores but not having completed smp initialization.
>>
>> FWIW, we'll probably run into this problem in the future with other
>> things.
>>
>> Barret
>>
>> p.s. https://www.youtube.com/watch?v=6bOy3RNyWME
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Akaros" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [akaros] Re: Code Review - Use core_id_early() in kprof trace buffer print

Reply via email to