Hi, John Youn <[email protected]> writes: >>> Yes it will re-assert. The interrupt line will remain asserted as long >>> events remain in the buffer and it is not masked. So when we >>> eventually unmask, the interrupt will be reasserted again immediately. >>> >>>> BTW, other option here could be using (like Baolin propose some time >>>> ago) dwc->lock in top half while this is held for BH. >>>> Question how long spin_lock() will be held in top half... >>>> I am not sure what option is the best. >>> >>> That won't help in this case since you can still have two calls top >>> top-half before a call to bottom. The lock only prevents the two from >>> stepping on each other, but that should already be the case without >>> needing the lock. >>> >> Really can we have two TH calls before BH? >> Interesting case :) > > That's what we seem to be seeing. I'm not sure if it is normal or not.
no, that's not normal. If this is happening, there's either a HW bug
somewhere, or we're not clearing events properly.
Can you provide tracepoints of the failure? Perhaps add a trace_printk()
call on both BH and TH just so we see when they're both called.
>> You suggest we have something like this:
>> dwc3 IRQ
>> kernel irq_mask
>> dwc3 TH
>> mask dwc3 interrupts
>> get evt->count, evt->cache
>> write to hw (count)
>> return WAKE_THREAD
>> kernel irq_unmask
>> a) Before BH start we get another dwc3 IRQ?
>> b) CPU0 starts with BH while we get dwc3 IRQ on CPU1?
>>
>> *) in normal case BH should start and in the end unmask
>> dwc3_interrupts and after that we could get new irq
>>
>> If this could happen then for sure you need dwc->lock protection in TH
>> while base on your description CPU0 can execute BH and CPU1 can handle
>> TH - so you have to protect dwc3_event_buffer.
>>
>> Could you describe this more? How to reproduce?
>
> We can reproduce by running a file transfer continuously. It fails
> fairly reliably, but it can take a while to hit the condition. We are
> using our PCIe based HAPS FPGA on x86_64 using mainline kernel.
Okay, please provide tracepoints of the problem in question.
> Thinh can provide which IP versions we tried with.
Thank you
>> Do you think we introduce this when adding evt->cache in TH?
>> Even without evt->cache we still could overwrite evt->count - so, was
>> that seen before evt->cache?
>
> The multiple TH calls could have happened even before we introduced
> evt->cache, but only with the cache would it have caused a failure due
> to overwriting the cached events on multiple calls.
Right, multiple TH calls should NEVER happen, because our IRQ line is
masked. Unless, and this is a long shot, IRQ line is shared and ANOTHER
device is causing IRQ to be raised. Can you show the output of:
# grep dwc3 /proc/interrupts
If another device raises the interrupt, then we will get into our TH,
however we should just return IRQ_HANDLED in that case because we
shouldn't be generating events.
Hmm, can you apply this patch and provide output when the failure
happens? I suggest setting up a 100MiB trace buffer. You don't need to
enable any tracers nor any event. trace_printk() is always enabled.
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6f6f0b3be3ad..3b8185d81f9f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3005,6 +3005,7 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void
*_evt)
unsigned long flags;
irqreturn_t ret = IRQ_NONE;
+ trace_printk("BH\n");
spin_lock_irqsave(&dwc->lock, flags);
ret = dwc3_process_event_buf(evt);
spin_unlock_irqrestore(&dwc->lock, flags);
@@ -3019,6 +3020,8 @@ static irqreturn_t dwc3_check_event_buf(struct
dwc3_event_buffer *evt)
u32 count;
u32 reg;
+ trace_printk("evt->flags %08x\n", evt->flags);
+
if (pm_runtime_suspended(dwc->dev)) {
pm_runtime_get(dwc->dev);
disable_irq_nosync(dwc->irq_gadget);
@@ -3027,7 +3030,9 @@ static irqreturn_t dwc3_check_event_buf(struct
dwc3_event_buffer *evt)
}
count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
+ trace_printk("GEVNTCOUNT %08x\n", count);
count &= DWC3_GEVNTCOUNT_MASK;
+ trace_printk("%u events pending\n", count);
if (!count)
return IRQ_NONE;
@@ -3036,10 +3041,12 @@ static irqreturn_t dwc3_check_event_buf(struct
dwc3_event_buffer *evt)
/* Mask interrupt */
reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
+ trace_printk("GEVNTSIZ %08x\n", reg);
reg |= DWC3_GEVNTSIZ_INTMASK;
dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
amount = min(count, evt->length - evt->lpos);
+ trace_printk("copying events to cache\n");
memcpy(evt->cache + evt->lpos, evt->buf + evt->lpos, amount);
if (amount < count)
@@ -3053,7 +3060,7 @@ static irqreturn_t dwc3_check_event_buf(struct
dwc3_event_buffer *evt)
static irqreturn_t dwc3_interrupt(int irq, void *_evt)
{
struct dwc3_event_buffer *evt = _evt;
-
+ trace_printk("TH\n");
return dwc3_check_event_buf(evt);
}
--
balbi
signature.asc
Description: PGP signature
