On 2013-05-19 08:00, Or Gerlitz wrote:
> On 19/05/2013 00:36, Jack Wang wrote:
>> I tried 3.4.23 and the mainline kernel from Roland's rdma-for-linus. We
>> added a bug injection interface, ran multithreaded iperf, and switched
>> the IB mode between connected and datagram in sync on each side, as
>> Shlomo suggested.
>
> Can you be more specific about the bug injection interface? Is it an
> existing kernel mechanism or something you added? So the bug triggers
> when you run iperf in multi-threaded mode AND in parallel inject errors
> AND in parallel switch between datagram and connected mode? BTW, I
> assume this isn't something you do just for the fun of it, so some
> problem X hits you in production and you get this problem Y with the
> above juggling. Is there any known or empirical relation between the two?
>
> Or.

We added an inject_bug sysfs node that forces the function into its error
path, something like the patch below.

Yes, you are right: we want to speed up reproducing the bug. We saw the
warning and came to the conclusion that neigh->list is getting corrupted
somewhere.

What's your opinion?

Regards,
Jack
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -797,10 +797,12 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
 
-	if (wc->status != IB_WC_SUCCESS &&
-	    wc->status != IB_WC_WR_FLUSH_ERR) {
+	if (priv->inject_bug ||
+	    (wc->status != IB_WC_SUCCESS &&
+	     wc->status != IB_WC_WR_FLUSH_ERR)) {
 		struct ipoib_neigh *neigh;
 
+		priv->inject_bug = 0;
 		ipoib_dbg(priv, "failed cm send event "
 			   "(status=%d, wrid=%d vend_err %x)\n",
 			   wc->status, wr_id, wc->vendor_err);
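
For illustration only, the one-shot behaviour of the hunk above can be
modelled in userspace roughly as follows. This is a sketch, not kernel
code: struct fake_priv, enum wc_status and tx_wc_takes_error_path() are
hypothetical stand-ins for the ipoib private data, the IB_WC_* status
codes and the condition inside ipoib_cm_handle_tx_wc(). It assumes the
sysfs store handler does nothing more than set the flag to a non-zero
value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IB_WC_SUCCESS, IB_WC_WR_FLUSH_ERR and any
 * other (real) completion error. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR, WC_GENERAL_ERR };

/* Hypothetical stand-in for the ipoib private data; only the injected
 * flag is modelled. */
struct fake_priv {
	int inject_bug;	/* assumed to be set non-zero by a sysfs write */
};

/*
 * Mirrors the patched condition: take the error path either on a real
 * completion error or when an error has been injected. The flag is
 * cleared as soon as it is consumed, so one write to the node forces
 * exactly one pass through the error path.
 */
static bool tx_wc_takes_error_path(struct fake_priv *priv,
				   enum wc_status status)
{
	assert(priv != NULL);

	if (priv->inject_bug ||
	    (status != WC_SUCCESS && status != WC_WR_FLUSH_ERR)) {
		priv->inject_bug = 0;	/* one-shot: re-arm via sysfs */
		return true;
	}
	return false;
}
```

With inject_bug set, the next completion takes the error path even when
its status is a success, and the completion after that is handled
normally again. Note that, as in the patch, a real error arriving while
the flag is set also clears it.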