Eli Cohen wrote:
Where is the other place that you acquire conn->ibc_sched->ibs_lock?
Is it in the per-CPU thread? Maybe you should try to decrease the time
the lock is held in the thread. Can you send all references
to the code acquiring the lock?
Eli,

Yes, it's a per-CPU lock (for the CPU-affinity thread), so the completion callback can only race with one thread at a time, and I'm quite sure the thread does only very light operations while holding the lock.

I actually already know the reason. Most of the time, interrupts tend to land on the same CPU, so that CPU's watchdog thread never gets a chance to be scheduled when the interrupt rate is very high, e.g. 100K/sec. We do have irqbalance, but it only wakes up every 10 seconds (it's a pity that we can't change the interval; it's a hard-coded constant), which is long enough to trigger the soft lockup warning.

So I added a static counter to the completion handler, and when we find there have been too many interrupts for several seconds, we just call touch_softlockup_watchdog() to tell the watchdog that the CPU is OK and not soft-locked. Also, we reserve the first core on each CPU socket (no affinity thread is bound to it) to make sure there are enough cores to handle interrupts. I know it's ugly, but it works, and it's the only way I can find to resolve my problem if there are no multiple completion vectors.
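To make the counter workaround concrete, here is a minimal user-space sketch of the idea, not the actual patch. The thresholds, the struct irq_load_tracker type, the on_completion() function, and the stub standing in for the kernel's touch_softlockup_watchdog() are all assumptions for illustration; a kernel version would presumably take its time from jiffies rather than from a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds; the original message gives no exact values. */
#define BUSY_EVENTS_PER_SEC       50000u
#define BUSY_SECONDS_BEFORE_TOUCH 3u

/* Hypothetical per-CPU state kept by the completion handler. */
struct irq_load_tracker {
    uint64_t window_start_sec;  /* second currently being counted */
    uint32_t events_in_window;  /* completions seen in that second */
    uint32_t busy_seconds;      /* consecutive seconds above threshold */
    uint32_t touches;           /* how often the watchdog was touched */
};

/* Stand-in for the kernel's touch_softlockup_watchdog(); it only counts. */
static void touch_softlockup_watchdog_stub(struct irq_load_tracker *t)
{
    t->touches++;
}

/* Called once per completion event; now_sec is the current time in seconds. */
static void on_completion(struct irq_load_tracker *t, uint64_t now_sec)
{
    if (now_sec != t->window_start_sec) {
        /* A new second started: judge the one that just ended. */
        bool contiguous = (now_sec == t->window_start_sec + 1);
        bool was_busy = t->events_in_window >= BUSY_EVENTS_PER_SEC;

        t->busy_seconds = (contiguous && was_busy) ? t->busy_seconds + 1 : 0;
        t->window_start_sec = now_sec;
        t->events_in_window = 0;

        /* Too many interrupts for several seconds in a row: the CPU is
         * busy, not stuck, so tell the watchdog and start counting again. */
        if (t->busy_seconds >= BUSY_SECONDS_BEFORE_TOUCH) {
            touch_softlockup_watchdog_stub(t);
            t->busy_seconds = 0;
        }
    }
    t->events_in_window++;
}
```

Feeding this tracker 100K events per second touches the watchdog once every few seconds, while a lightly loaded CPU never touches it, so a genuine lockup with no interrupt storm would still be reported.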

Anyway, I really think multiple completion vectors will be an important feature in the near future, because our hardware keeps getting faster and machines have more and more CPU cores.

Thanks
Liang


I've tried turning off irqbalance and setting /proc/irq/.../smp_affinity
to cover more cores, but nothing changed and I still get the soft lockup.

After I installed ofed1.4.1 and created the CQ with
ib_create_cq(....comp_vector), the problem was gone and I got really good
performance. The problem now is that ofed1.4.1:mlx4 seems to be the only
driver that really supports multiple completion vectors, but we can't expect
all customers to have the same environment...
Is there any other possible way to resolve this?
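As a rough illustration of how a comp_vector value might be chosen when drivers differ in how many vectors they expose, the sketch below maps a CPU id onto the available vectors and falls back to vector 0 when there is only one. The helper name pick_comp_vector() and the modulo policy are assumptions, not something stated in this thread; in the kernel, the vector count would normally be read from the device's num_comp_vectors field.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose a completion vector for the scheduler
 * thread bound to `cpu`. With a single-vector driver every CQ ends up
 * on vector 0, which is the situation described above.
 */
static int pick_comp_vector(int cpu, int num_comp_vectors)
{
    assert(cpu >= 0);

    if (num_comp_vectors <= 1)
        return 0;                   /* no choice: driver has one vector */

    return cpu % num_comp_vectors;  /* spread CQs across the vectors */
}
```

The result would then be passed as the last argument of ib_create_cq(), so the same code path works both with drivers that support multiple completion vectors and with those that do not.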

Thanks
Liang
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

