Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Michael S. Tsirkin
 Quoting Moni Levy [EMAIL PROTECTED]:
 Subject: [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
 
 Hello,
 I did a short code review of the ipoib code concentrating on
 partitioning support and I mentioned that the asynchronous events
 handler in the ipoib code does not take the port number reported in
 the event record into consideration. The effect of that is that all of
 the ib# devices related to that specific HCA are flushed when it seems
 to me that only the relevant port one should be. Is that done on
 purpose, or am I missing something ?
 
 Thanks,
 Moni
 
 p.s. I'm working on a patch that should solve another issue caused by
 PKEY reordering  ipoib behavior and the above issue further
 complicates things for me.

If true, why is this a problem?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB

2007-02-27 Thread Michael S. Tsirkin
 Quoting Moni Shoua [EMAIL PROTECTED]:
 Subject: Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
 
 
 Thanks for the comments 
 
  To fix it, this patch adds a dev field to struct ipoib_neigh which is used
  instead of the struct neighbour dev one.
  
  It seems that in this design, if multiple ipoib interfaces are present, we 
  might
  get an skb such that skb-dev will be different from the new dev field in 
  struct
  ipoib_neigh.
  
  It seems that the result will be that the packet will be sent on a wrong 
  interface.
  Right?
  
 I don't see how. The field dev in ipoib_neigh doesn't take part in interface 
 selection.
 As I see it, skb travels this path:
 1. Passed to bond_dev-hard_start_xmit
 2. bond_dev-hard_start_xmit chooses the current active interface, changes 
 skb-dev and enqueues it back for xmittig.

ipoib_neigh ah field includes struct ib_ah *.
This selects important parameters which depend on both packet source and
destination interfaces.

I think the right thing might be to compare ipoib_neigh dev and skb-dev,
and destroy ipoib_neigh if these do not match.

  In addition, if an IPoIB device is removed before bonding is unloaded it 
  may 
  cause bond0 neighbours (neighbours that point to bond0) to exist after the 
  IPoIB
  device no longer exist. This is why a neighbour cleanup is required during 
  device 
  cleanup. This cleanup scans the arp cache and the ndisc cache to find 
  there 
  neighbours of bond0 which refer also to the relevant ibX. Also, when 
  ib_ipoib module is
  unloaded, the neighbour destructor must be set to NULL because the 
  neighbour function is in
  ib_ipoib.
  For this neigh table cleanup, it is required to export the symbol nd_tbl 
  just like the symbol arp_tbl is.
  
  I wonder about this: is it really true that any allocated neighbour is 
  always in
  either arp_tbl or nd_tbl? For example, could some code have called 
  neigh_hold
  and retained a neighbour that is not in either one of these tables?
  
 I got the assumption about neighbours living in one of these 2 tables from
 observation and code reading.  I preferred that that on keeping track of all
 ipoib_neighs and putting them in a list. However, I could do that instead of
 neigh_table scanning. Do you think it's better?

If some neighbours are not on any tables, it seems using our own lists
(e.g. lists we have in ipoib_path) is the only option, no?

 For the example... I didn't
 understand it. Could you please explain?

grep for neigh_hold. neighbour is only destroyed when ref count goes to 0.
If some code does neigh_hold, it seems neighbour could be removed from table
but destructor not yet called.

  During my tests I found that when running 
 
 1. modprobe -r ib_mthca (to delete IPoIB interfaces)
 2. ping somewhere on the subnet of bond0
 
  I get this stack dump (which ends with kernel death)
  [8037ff32] skb_under_panic+0x5c/0x60
  [882e00c2] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
  [803c3c98] arp_create+0x120/0x226
  [803c3dc3] arp_send+0x25/0x3b
  [803c466a] arp_solicit+0x186/0x195
  [8038c0ac] neigh_timer_handler+0x2b5/0x309
  [8038bdf7] neigh_timer_handler+0x0/0x309
  [80239599] run_timer_softirq+0x130/0x19e
  [80235fcc] __do_softirq+0x55/0xc3
  [8020acac] call_softirq+0x1c/0x28
  [8020c02b] do_softirq+0x2c/0x7d
  [8021864a] smp_apic_timer_interrupt+0x57/0x6a
  [80208e19] mwait_idle+0x0/0x45
  [8020a756] apic_timer_interrupt+0x66/0x70
  EOI  [80208e5b] mwait_idle+0x42/0x45
  [80208db1] cpu_idle+0x8b/0xae
  [80217d60] start_secondary+0x47f/0x48f
 
  The only way I found to avoid this (for now) is to check skb headroom in
  ipoib_hard_header. I guess that this safety check doesn't harm regular 
  IPoIB 
  operation and it seems to solve my problem. However, I would be happy to 
  hear what
  others think of this last issue.
  
  As I said, this seems to indicate a problem in the bonding code.
  But what will happen after you error out in ipoib_hard_header?
  Is the packet dropped? What might break as a result?
  
 I will check the hard_header_len issue in the bonding code more carefully.
 From first look it seems that bonding does borrow the hard_header_len.

So where does a shorter message come from?

 Also,
 my checks show that it is safe to return with error from
 hard_header().  For example,  in neigh_connected_output:
 
 err = dev-hard_header(skb, dev, ntohs(skb-protocol),
neigh-ha, NULL, skb-len);
 read_unlock_bh(neigh-lock);
 if (err = 0)
 err = neigh-ops-queue_xmit(skb);
 else {
 err = -EINVAL;
 kfree_skb(skb);
  
  I would really appreciate comments.
 
  thanks
 
   -MoniS
  

-- 
MST

___
openib-general 

Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Michael S. Tsirkin
I just gave this a cursory glance.
A suggestion: would it not be much simpler to modify the QP from RTS to RTS on 
pkey
change?

 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 index f2aa923..b0287c1 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device 
  
   ret = ipoib_init_qp(dev);
   if (ret) {
 - ipoib_warn(priv, ipoib_init_qp returned %d\n, ret);
 + if (ret != -ENOENT)
 + ipoib_warn(priv, ipoib_init_qp returned %d\n, ret);
   return -1;
   }
 
What's the reason for this?

 @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic
   INIT_DELAYED_WORK(priv-pkey_task,ipoib_pkey_poll);
   INIT_DELAYED_WORK(priv-mcast_task,   ipoib_mcast_join_task);
   INIT_WORK(priv-flush_task,   ipoib_ib_dev_flush);
 + INIT_WORK(priv-flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
   INIT_WORK(priv-restart_task, ipoib_mcast_restart_task);
   INIT_DELAYED_WORK(priv-ah_reap_task, ipoib_reap_ah);
  }

Shorter name?

 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
 index b303ce6..27d6fd4 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
 @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc
   ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast-mcmember.mlid),
mcast-mcmember.mgid);
   if (ret  0) {
 - ipoib_warn(priv, couldn't attach QP to multicast group 
 
 -IPOIB_GID_FMT \n,
 -IPOIB_GID_ARG(mcast-mcmember.mgid));
 + if (ret != -ENXIO) /* No PKEY found */
 + ipoib_warn(priv, couldn't attach QP to 
 multicast group 
 +IPOIB_GID_FMT \n,
 +IPOIB_GID_ARG(mcast-mcmember.mgid));
  
   clear_bit(IPOIB_MCAST_FLAG_ATTACHED, mcast-flags);
   return ret;
 @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s
   status = ipoib_mcast_join_finish(mcast, multicast-rec);
  
   if (status) {
 - if (mcast-logcount++  20)
 + if (mcast-logcount++  20  status != -ENXIO)
   ipoib_dbg_mcast(netdev_priv(dev), multicast join 
 failed for 
   IPOIB_GID_FMT , status %d\n,
   IPOIB_GID_ARG(mcast-mcmember.mgid), 
 status);
 @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int
   , status %d\n,
   IPOIB_GID_ARG(mcast-mcmember.mgid),
   status);
 - } else {
 + } else if (status != -ENXIO) {
   ipoib_warn(priv, multicast join failed for 
  IPOIB_GID_FMT , status %d\n,
  IPOIB_GID_ARG(mcast-mcmember.mgid),
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 index 3cb551b..d0384ea 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device
   if (ib_find_cached_pkey(priv-ca, priv-port, priv-pkey, pkey_index)) 
 {
   clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
   ret = -ENXIO;
 + ipoib_dbg(priv, PKEY %X not found\n, priv-pkey);
   goto out;
   }
 + ipoib_dbg(priv, PKEY %X found at index %d\n, priv-pkey, pkey_index);
   set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
  
   /* set correct QKey for QP */

Make it PKey or pkey: no text in uppercase in log messages please.

 @@ -105,9 +107,11 @@ int ipoib_init_qp(struct net_device *dev
*/
   ret = ib_find_cached_pkey(priv-ca, priv-port, priv-pkey, 
 pkey_index);
   if (ret) {
 + ipoib_dbg(priv, PKEY %X not found.\n, priv-pkey);
   clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
   return ret;
   }
 + ipoib_dbg(priv, PKEY %X found at index %d.\n, priv-pkey, pkey_index);
   set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
  
   qp_attr.qp_state = IB_QPS_INIT;

going a bit overboard on the number of debug messages here.

 @@ -260,12 +264,14 @@ void ipoib_event(struct ib_event_handler
   container_of(handler, struct ipoib_dev_priv, event_handler);
  
   if (record-event == IB_EVENT_PORT_ERR||
 - record-event == IB_EVENT_PKEY_CHANGE ||
   record-event == IB_EVENT_PORT_ACTIVE ||
   record-event == IB_EVENT_LID_CHANGE  ||
   

Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
 
   I just gave this a cursory glance.
 
 I haven't really read it except to think why is this so complicated?
 
   A suggestion: would it not be much simpler to modify the QP from RTS to 
 RTS on pkey
   change?
 
 Changing the P_Key index is not allowed for RTS-RTS.  You would have
 to modify the QP RTS-SQD, wait for the SQ to drain, then modify the
 P_Key index with SQD-SQD, and finally go SQD-RTS.

True, I misread the spec.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port 
 parameter.
 
  I did a short code review of the ipoib code concentrating on
   partitioning support and I mentioned that the asynchronous events
   handler in the ipoib code does not take the port number reported in
   the event record into consideration. The effect of that is that all of
   the ib# devices related to that specific HCA are flushed when it seems
   to me that only the relevant port one should be. Is that done on
   purpose, or am I missing something ?
 
 I don't think there's any particular reason the code is that way
 except for the oversight never being corrected.  But it looks trivial
 to fix, like the patch below.  Does that look right to you?
 
   p.s. I'm working on a patch that should solve another issue caused by
   PKEY reordering  ipoib behavior and the above issue further
   complicates things for me.
 
 Why not fix the issue first then?
 
 commit a27cbe878203076247c1b5287f5ab59ed143b560
 Author: Roland Dreier [EMAIL PROTECTED]
 Date:   Tue Feb 27 07:37:49 2007 -0800
 
 IPoIB: Only handle async events for one port
 
 An asynchronous event carries the port number that the event occurred
 on, so there's no reason for an IPoIB interface to process an event
 associated with a different local HCA port.
 
 Signed-off-by: Roland Dreier [EMAIL PROTECTED]
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 index 3cb551b..7f3ec20 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
 @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
   struct ipoib_dev_priv *priv =
   container_of(handler, struct ipoib_dev_priv, event_handler);
  
 - if (record-event == IB_EVENT_PORT_ERR||
 - record-event == IB_EVENT_PKEY_CHANGE ||
 - record-event == IB_EVENT_PORT_ACTIVE ||
 - record-event == IB_EVENT_LID_CHANGE  ||
 - record-event == IB_EVENT_SM_CHANGE   ||
 - record-event == IB_EVENT_CLIENT_REREGISTER) {
 + if ((record-event == IB_EVENT_PORT_ERR||
 +  record-event == IB_EVENT_PKEY_CHANGE ||
 +  record-event == IB_EVENT_PORT_ACTIVE ||
 +  record-event == IB_EVENT_LID_CHANGE  ||
 +  record-event == IB_EVENT_SM_CHANGE   ||
 +  record-event == IB_EVENT_CLIENT_REREGISTER) 
 + record-element.port_num == priv-port) {
   ipoib_dbg(priv, Port state change event\n);
   queue_work(ipoib_workqueue, priv-flush_task);
   }

Looks good.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 
 Please send patches that will be added to kernel_patches/fixes.
 
 Please update your git tree from
 git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
 
 You want me to create a patch that adds a file that contains the actual 
 patches?
 
 Why not apply the patches directly?

That's the ofed structure, this was discussed multiple times already.
The point is to keep all changes to upstream components separate,
to make updating to upstream kernel trivial in the future.

Worked quite well for OFED 1.1 - 1.2 transition.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 
 Yes, actual patches should be created under kernel_patches/fixes.
 
 Please update your git tree because the following patch fails:
 
 Can you explain how the patch fails?  I don't see how putting the patch into a
 file helps.

Try applying it?

  Why not apply the patches directly?
 
 To be consistent with 2.6.20 kernel.
 
 You can check out stock 2.6.20 using a tag.  Why maintain the ofed code in git
 if you don't use it to track patches?

Basically so that conflicts in future merges from upstream are easy to resolve.
If you like, let's reopen this for 1.3. We are after freeze in OFED 1.2.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH] for OFED 1.2
 
 On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
   Quoting Sean Hefty [EMAIL PROTECTED]:
   Subject: Re: [PATCH] for OFED 1.2
   
   Please send patches that will be added to kernel_patches/fixes.
   
   Please update your git tree from
   git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
   
   You want me to create a patch that adds a file that contains the actual 
   patches?
   
   Why not apply the patches directly?
  
  That's the ofed structure, this was discussed multiple times already.
  The point is to keep all changes to upstream components separate,
  to make updating to upstream kernel trivial in the future.
  
  Worked quite well for OFED 1.1 - 1.2 transition.
  
 
 Having these patches as files is painful for every developer because
 they cannot create a patch against ofed_1_2/drivers/infiniband/* nor the
 kernel.org upstream tree.

Did you try using quilt which makes managing patch stacks quite easy?
If you have quilt installed, OFED scripts actually use it
to apply patches, so things are easy.

 They need to apply all the current patches
 and then create a patch on top of that. Or hope the patch applies
 fuzzily.  

One point I can't stress enough: whatever way you create a patch,
developers are expected to build and test it in OFED environment
before posting.

 I think with stacked git or just git and rebasing at key times, you
 could keep an ofed_1_2 tree that folks can easily apply patches to...
 
 Its too late to change this for 1.2, but you might want to reconsider
 the design for 1.3.

Well, I experimented with git rebase and it is unfortunately still
fragile at this point.

I agree using stacked git might be a good idea, I just did not
have the chance to experiment with it enough. I had an impression
that publishing stg managed branch creates problems for whoever
attempts to track it, but I might be wrong.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 
 I think with stacked git or just git and rebasing at key times, you
 could keep an ofed_1_2 tree that folks can easily apply patches to...
 
 Its too late to change this for 1.2, but you might want to reconsider
 the design for 1.3.
 
 Can't we just create a new branch (ofed_1_2_patched) with these patches 
 already
 applied and in the correct order?  

Then what we do when we want to update to new upstream? Throw this branch away?
As it is, I just pull then build and remove patches that conflict.

By the way, there are backport patches, etc - it is still incorrect
to say that you would be able to generate a patch out of git
and know it's a good one without test-build.

 Maybe I'm just not understanding the work flow here...

Sean, please install quilt and try using it for working with the system.
Adding new patch is usually done in this way
quilt new patch
quilt add files
edit
quilt refresh

cp patches/patch kernel_patches/fixes/
git add kernel_patches/fixes/patch
git commit kernel_patches/fixes/patch


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
Lot's of stuff *is* in wiki already - did you look at pages Vlad created?
Things can always be improved, you can add stuff too.


Quoting Jeff Squyres [EMAIL PROTECTED]:
Subject: Re: [PATCH] for OFED 1.2

It would be great if all of this knowledge is posted to the wiki to  
avoid repeating this conversation in the future (or one of countless  
variations of this conversation).  For example, I admit to not paying  
close attention to many of the threads on this list, but this was the  
first time I'd head of quilt.

Specifically: if there are tools and methods that are helpful for OFA/ 
OFED development, they should be detailed on the wiki.  The wiki is  
where all permanent knowledge should be posted.

This is just my $0.01...



On Feb 27, 2007, at 12:31 PM, Michael S. Tsirkin wrote:

 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH] for OFED 1.2

 On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2

 Please send patches that will be added to kernel_patches/fixes.

 Please update your git tree from
 git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2

 You want me to create a patch that adds a file that contains the  
 actual patches?

 Why not apply the patches directly?

 That's the ofed structure, this was discussed multiple times  
 already.
 The point is to keep all changes to upstream components separate,
 to make updating to upstream kernel trivial in the future.

 Worked quite well for OFED 1.1 - 1.2 transition.


 Having these patches as files is painful for every developer because
 they cannot create a patch against ofed_1_2/drivers/infiniband/*  
 nor the
 kernel.org upstream tree.

 Did you try using quilt which makes managing patch stacks quite easy?
 If you have quilt installed, OFED scripts actually use it
 to apply patches, so things are easy.

 They need to apply all the current patches
 and then create a patch on top of that. Or hope the patch applies
 fuzzily.

 One point I can't stress enough: whatever way you create a patch,
 developers are expected to build and test it in OFED environment
 before posting.

 I think with stacked git or just git and rebasing at key times, you
 could keep an ofed_1_2 tree that folks can easily apply patches to...

 Its too late to change this for 1.2, but you might want to reconsider
 the design for 1.3.

 Well, I experimented with git rebase and it is unfortunately still
 fragile at this point.

 I agree using stacked git might be a good idea, I just did not
 have the chance to experiment with it enough. I had an impression
 that publishing stg managed branch creates problems for whoever
 attempts to track it, but I might be wrong.


 -- 
 MST

 ___
 openib-general mailing list
 openib-general@openib.org
 http://openib.org/mailman/listinfo/openib-general

 To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
 openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 This is just my $0.01...

Thanks for the suggestions, but what does $0.01 buy one in US today?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 
 On Tue, 2007-02-27 at 19:44 +0200, Michael S. Tsirkin wrote:
   Quoting Sean Hefty [EMAIL PROTECTED]:
   Subject: Re: [PATCH] for OFED 1.2
   
   I think with stacked git or just git and rebasing at key times, you
   could keep an ofed_1_2 tree that folks can easily apply patches to...
   
   Its too late to change this for 1.2, but you might want to reconsider
   the design for 1.3.
   
   Can't we just create a new branch (ofed_1_2_patched) with these patches 
   already
   applied and in the correct order?  
  
  Then what we do when we want to update to new upstream? Throw this branch 
  away?
  As it is, I just pull then build and remove patches that conflict.
  
  By the way, there are backport patches, etc - it is still incorrect
  to say that you would be able to generate a patch out of git
  and know it's a good one without test-build.
  
   Maybe I'm just not understanding the work flow here...
  
  Sean, please install quilt and try using it for working with the system.
  Adding new patch is usually done in this way
  quilt new patch
  quilt add files
  edit
  quilt refresh
  
  cp patches/patch kernel_patches/fixes/
  git add kernel_patches/fixes/patch
  git commit kernel_patches/fixes/patch
 
 NOTE: The key to the above process is the assumption that the developer
 maintains _all_ of the existing patches from kernel_patches/ on top of
 the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
 you anything.

OFED will do this automatically.

 And this doesn't take into account backports.

The process works with backport patches too: you just have to do this

 quilt pop -a
 
   quilt new patch
   quilt add files
   edit
   quilt refresh
 
 quilt push -a



-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
  This is just my $0.01...
 
 It buys very little, if anything.  In fact, a whole $0.02 also buys  
 very little, if anything.  So take my comments for what they're worth.

Oh, good, I thought deflation is getting out of hand ...

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
  Lot's of stuff *is* in wiki already - did you look at pages Vlad  
  created?
 
 A search for quilt on the wiki turns up nothing (I checked before I  
 posted :-) ).
 
 And yes, I have [thoroughly] read the pages Vlad created.  But the  
 very fact that this conversation is occurring is because either the  
 information is not on the wiki or what is on the wiki is not clear.   
 Otherwise, I suspect that you simply would have pointed Steve to the  
 wiki and said Please read the fine manual at http://;.

You are right in that, I don't disclaim it.
Thanks for the suggestion, I'll try to find the time to add this to wiki.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 

Sean, please install quilt and try using it for working with the system.
Adding new patch is usually done in this way
quilt new patch
quilt add files
edit
quilt refresh

cp patches/patch kernel_patches/fixes/
git add kernel_patches/fixes/patch
git commit kernel_patches/fixes/patch
   
   NOTE: The key to the above process is the assumption that the developer
   maintains _all_ of the existing patches from kernel_patches/ on top of
   the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
   you anything.
  
  OFED will do this automatically.
  
 
 uh, can you explain this?  Given I have a freshly cloned ofed_1_2 git
 tree, and I want to change cma.c (a good one cuz there are patches).
 What do I do?  There's no quilt stack at all at this point.  Right?  

Try running the configure script.
After this, quilt applied will show what patches are applied.

   And this doesn't take into account backports.
  
  The process works with backport patches too: you just have to do this
  
   quilt pop -a
   
 quilt new patch
 quilt add files
 edit
 quilt refresh
   
   quilt push -a
 
 
 But you cannot keep a stack for more than one backport pushed, right?
 So you still need to be slapping the stacks of patches around for each
 backport.  
 
 Or maybe I'm confused?

Yes.

Fortunately it's not too hard: you can do
quilt pop -a
and re-run configure for another kernel.

Of course for testing the patch, it is easier to commit the change in your tree
and then to use openfabrics cross-build functionality that will clone this
tree and build for multiple arches/kernels.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: RE: [PATCH] for OFED 1.2
 
 But you cannot keep a stack for more than one backport pushed, right?
 So you still need to be slapping the stacks of patches around for each
 backport.
 
 Why not have separate branches for each kernels too?

I think it'll be much more work to maintain all these branches.
And again, there will be conflicts, and it's too easy to get confused when
resolving a conflict.

With patches we have scripts to automate this.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] for OFED 1.2

2007-02-27 Thread Michael S. Tsirkin
 Quoting Sasha Khapyorsky [EMAIL PROTECTED]:
 Subject: Re: [PATCH] for OFED 1.2
 
 On 19:44 Tue 27 Feb , Michael S. Tsirkin wrote:
   Quoting Sean Hefty [EMAIL PROTECTED]:
   Subject: Re: [PATCH] for OFED 1.2
   
   I think with stacked git or just git and rebasing at key times, you
   could keep an ofed_1_2 tree that folks can easily apply patches to...
   
   Its too late to change this for 1.2, but you might want to reconsider
   the design for 1.3.
   
   Can't we just create a new branch (ofed_1_2_patched) with these patches 
   already
   applied and in the correct order?  
  
  Then what we do when we want to update to new upstream? Throw this branch 
  away?
  As it is, I just pull then build and remove patches that conflict.
 
 You can save this branch as branch-name-upstream-name (or better)
 and to rebase branch-name to the new upstream.

rebase does not seem to be too robust when run on such a large repository as the
linux kernel.  Maybe stacked git will work.

  By the way, there are backport patches, etc - it is still incorrect
  to say that you would be able to generate a patch out of git
  and know it's a good one without test-build.
 
 In similar way you can track backport patch sets as branches.

At the moment it seems like a lot of work. Again, maybe stg makes it easy,
I know it's hard with plain git.

And I think lots of people (including me) will be confused if we have a ton of 
branches.

   Maybe I'm just not understanding the work flow here...
  
  Sean, please install quilt and try using it for working with the system.
  Adding new patch is usually done in this way
  quilt new patch
  quilt add files
  edit
  quilt refresh
  
  cp patches/patch kernel_patches/fixes/
  git add kernel_patches/fixes/patch
  git commit kernel_patches/fixes/patch
 
 This looks strange for me to track patches against patches...

One gets used to it :)
Seriously, we have these patches, and we want to version them together
with source they are intended to apply to.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] Fwd: [ANNOUNCE] GIT 1.5.0.2

2007-02-27 Thread Michael S. Tsirkin
FYI.

- Forwarded message from Junio C Hamano [EMAIL PROTECTED] -

Subject: [ANNOUNCE] GIT 1.5.0.2
Date: Tue, 27 Feb 2007 10:58:22 +0200
In-Reply-To: [EMAIL PROTECTED] (Junio C. Hamano'smessage of Sun, 18 Feb 2007 
18:07:42 -0800)
References: [EMAIL PROTECTED]
From: Junio C Hamano [EMAIL PROTECTED]

The latest maintenance release GIT 1.5.0.2 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.0.2.tar.{gz,bz2}  (tarball)
  git-htmldocs-1.5.0.2.tar.{gz,bz2} (preformatted docs)
  git-manpages-1.5.0.2.tar.{gz,bz2} (preformatted docs)
  RPMS/$arch/git-*-1.5.0.2-1.$arch.rpm  (RPM)


GIT v1.5.0.2 Release Notes
==

Fixes since v1.5.0.1


* Bugfixes

  - Automated merge conflict handling when changes to symbolic
links conflicted were completely broken.  The merge-resolve
strategy created a regular file with conflict markers in it
in place of the symbolic link.  The default strategy,
merge-recursive was even more broken.  It removed the path
that was pointed at by the symbolic link.  Both of these
problems have been fixed.

  - 'git diff maint master next' did not correctly give combined
diff across three trees.

  - 'git fast-import' portability fix for Solaris.

  - 'git show-ref --verify' without arguments did not error out
but segfaulted.

  - 'git diff :tracked-file `pwd`/an-untracked-file' gave an extra
slashes after a/ and b/.

  - 'git format-patch' produced too long filenames if the commit
message had too long line at the beginning.

  - Running 'make all' and then without changing anything
running 'make install' still rebuilt some files.  This
was inconvenient when building as yourself and then
installing as root (especially problematic when the source
directory is on NFS and root is mapped to nobody).

  - 'git-rerere' failed to deal with two unconflicted paths that
sorted next to each other.

  - 'git-rerere' attempted to open(2) a symlink and failed if
there was a conflict.  Since a conflicting change to a
symlink would not benefit from rerere anyway, the command
now ignores conflicting changes to symlinks.

  - 'git-repack' did not like to pass more than 64 arguments
internally to underlying 'rev-list' logic, which made it
impossible to repack after accumulating many (small) packs
in the repository.

  - 'git-diff' to review the combined diff during a conflicted
merge were not reading the working tree version correctly
when changes to a symbolic link conflicted.  It should have
read the data using readlink(2) but read from the regular
file the symbolic link pointed at.

  - 'git-remote' did not like period in a remote's name.

* Documentation updates

  - added and clarified core.bare, core.legacyheaders configurations.

  - updated git-clone --depth documentation.


* Assorted git-gui fixes.

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

- End forwarded message -

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Michael S. Tsirkin
 Quoting Shirley Ma [EMAIL PROTECTED]:
 Subject: Re: [openib-general] IPOIB NAPI
 
 Roland Dreier [EMAIL PROTECTED] wrote on 02/27/2007 02:41:44 PM:
 
So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I 
  can
generate the patch for all ULPs to use this for review. Do you need me to
do that?
  
  No, it's not in OFED 1.2 or the upstream kernel.  And no one has
  implemented it for userspace (and I'm somewhat reluctant to break the
  ABI at this point without some performance numbers to motivate making
  this API change).
  
  Have the NAPI performance problems with ehca been resolved?  We could
  probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
  kernel changes at least.
  
   - R.
 We have addressed the NAPI performance issues with ehca driver. I believe the 
 patches have been upper stream. However the test results show that it's 
 better to delay poll again to next NAPI interval, something like this:
 
 poll-cq
 notify-cq, if missed_event  netif_rx_reschedule()
 return 1
 
 vs.
 poll-cq,
 notify-cq, if missed_event  netif_rx_reschedule()
 poll again
 return 0
 
 It seems ehca delivering packet much faster than other HCAs. So poll again 
 would stay in the loop for many many times. So the above changes doesn't 
 impact other HCAs, I would recommand it. I saw same implementations on other 
 ethernet drivers.

I'm confused. Which one is faster?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [Bug 400] OFED 1.2 alpha1 IPoIB HA failover gets QP warnings

2007-02-27 Thread Michael S. Tsirkin
 ib1: dev_queue_xmit failed to requeue packet
 ib_mthca :04:00.0: QP 000405 not found in MGM
 ib1: ib_detach_mcast failed (result = -22)
 ib1: ipoib_mcast_detach failed (result = -22)

Looks like this is related to the multicast change that recently went upstream.
So this likely affects upstream IPoIB as well.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[OFA General] Re: IPOIB NAPI

2007-02-27 Thread Michael S. Tsirkin
 Quoting Shirley Ma [EMAIL PROTECTED]:
 Subject: Re: IPOIB NAPI
 
 oland Dreier [EMAIL PROTECTED] wrote on 02/27/2007 02:41:44 PM:
 
So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I
 can
generate the patch for all ULPs to use this for review. Do you need me to
do that?
 
  No, it's not in OFED 1.2 or the upstream kernel.  And no one has
  implemented it for userspace (and I'm somewhat reluctant to break the
  ABI at this point without some performance numbers to motivate making
  this API change).
 
  Have the NAPI performance problems with ehca been resolved?  We could
  probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
  kernel changes at least.
 
   - R.
 We have addressed the NAPI performance issues with ehca driver. I believe the
 patches have been upper stream. However the test results show that it's better
 to delay poll again to next NAPI interval, something like this:
 
 poll-cq
 notify-cq, if missed_event  netif_rx_reschedule()
 return 1
 
 vs.
 poll-cq,
 notify-cq, if missed_event  netif_rx_reschedule()
 poll again
 return 0
 
 It seems ehca delivering packet much faster than other HCAs. So poll again
 would stay in the loop for many many times. So the above changes doesn't 
 impact
 other HCAs, I would recommand it. I saw same implementations on other ethernet
 drivers.

I have not benchmarked this, but actually the return 1 version makes sense to
me too: since a new completion was observed after notify-cq, we likely currently
have HCA writing new completions into the CQ at a high rate, so it makes sense
to delay polling by a few cycles, and reduce the number of interrupts in this
way.

Right?

-- 
MST

___
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [OFA General] List Address Change Completed

2007-02-27 Thread Michael S. Tsirkin
 Quoting Lee, Michael Paichi [EMAIL PROTECTED]:
 Subject: [OFA General] List Address Change Completed
 
 This list has been migrated to the new server, lists.openfabrics.org.  Please 
 update any address book or filter settings to reflect the new mailing list 
 address.  Future messages and replies should be sent to this address:
 
 [EMAIL PROTECTED]
 
 The new web address for this list is:
 
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
 
 If you have any questions, please contact me at [EMAIL PROTECTED]   

Can the subject prefix be made all lower-case, with dash, please?
OFA General - ofa-general?

Upper case words look like shouting to me, and e.g. exchange rules
are limited in coping with spaces.

-- 
MST
___
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB

2007-02-26 Thread Michael S. Tsirkin
 
 During my tests I found that when running 
 
   1. modprobe -r ib_mthca (to delete IPoIB interfaces)
   2. ping somewhere on the subnet of bond0
 
 I get this stack dump (which ends with kernel death)
[8037ff32] skb_under_panic+0x5c/0x60
[882e00c2] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
[803c3c98] arp_create+0x120/0x226
[803c3dc3] arp_send+0x25/0x3b
[803c466a] arp_solicit+0x186/0x195
[8038c0ac] neigh_timer_handler+0x2b5/0x309
[8038bdf7] neigh_timer_handler+0x0/0x309
[80239599] run_timer_softirq+0x130/0x19e
[80235fcc] __do_softirq+0x55/0xc3
[8020acac] call_softirq+0x1c/0x28
[8020c02b] do_softirq+0x2c/0x7d
[8021864a] smp_apic_timer_interrupt+0x57/0x6a
[80208e19] mwait_idle+0x0/0x45
[8020a756] apic_timer_interrupt+0x66/0x70
EOI  [80208e5b] mwait_idle+0x42/0x45
[80208db1] cpu_idle+0x8b/0xae
[80217d60] start_secondary+0x47f/0x48f
 
 The only way I found to avoid this (for now) is to check skb headroom in
 ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
 operation and it seems to solve my problem. However, I would be happy to hear 
 what
 others think of this last issue.

This seems to mean that hard_header_len is not copied from slave to master
device. Right? Maybe that's what needs to be fixed.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB

2007-02-26 Thread Michael S. Tsirkin
 When using the bonding driver, neighbours are created by the net stack on 
 behalf
 of the bonding (master) device. On the tx flow the bonding code gets an skb 
 such
 that skb-dev points to the master device, it changes this skb to point on the
 slave device and calls the slave hard_start_xmit function.
 
 
 Combing these two flows, there is a hole if some code at ipoib
 (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, 
 n-dev
 is an ipoib device so for example netdev_priv(n-dev) would be of type struct
 ipoib_dev_priv.
 
 To fix it, this patch adds a dev field to struct ipoib_neigh which is used
 instead of the struct neighbour dev one.

It seems that in this design, if multiple ipoib interfaces are present, we might
get an skb such that skb-dev will be different from the new dev field in struct
ipoib_neigh.

It seems that the result will be that the packet will be sent on a wrong 
interface.
Right?

 In addition, if an IPoIB device is removed before bonding is unloaded it may 
 cause bond0 neighbours (neighbours that point to bond0) to exist after the 
 IPoIB
 device no longer exist. This is why a neighbour cleanup is required during 
 device 
 cleanup. This cleanup scans the arp cache and the ndisc cache to find there 
 neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib 
 module is
 unloaded, the neighbour destructor must be set to NULL because the neighbour 
 function is in
 ib_ipoib.
 For this neigh table cleanup, it is required to export the symbol nd_tbl just 
 like the symbol arp_tbl is.

I wonder about this: is it really true that any allocated neighbour is always in
either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
and retained a neighbour that is not in either one of these tables?

 During my tests I found that when running 
 
   1. modprobe -r ib_mthca (to delete IPoIB interfaces)
   2. ping somewhere on the subnet of bond0
 
 I get this stack dump (which ends with kernel death)
[8037ff32] skb_under_panic+0x5c/0x60
[882e00c2] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
[803c3c98] arp_create+0x120/0x226
[803c3dc3] arp_send+0x25/0x3b
[803c466a] arp_solicit+0x186/0x195
[8038c0ac] neigh_timer_handler+0x2b5/0x309
[8038bdf7] neigh_timer_handler+0x0/0x309
[80239599] run_timer_softirq+0x130/0x19e
[80235fcc] __do_softirq+0x55/0xc3
[8020acac] call_softirq+0x1c/0x28
[8020c02b] do_softirq+0x2c/0x7d
[8021864a] smp_apic_timer_interrupt+0x57/0x6a
[80208e19] mwait_idle+0x0/0x45
[8020a756] apic_timer_interrupt+0x66/0x70
EOI  [80208e5b] mwait_idle+0x42/0x45
[80208db1] cpu_idle+0x8b/0xae
[80217d60] start_secondary+0x47f/0x48f
 
 The only way I found to avoid this (for now) is to check skb headroom in
 ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
 operation and it seems to solve my problem. However, I would be happy to hear 
 what
 others think of this last issue.

As I said, this seems to indicate a problem in the bonding code.
But what will happen after you error out in ipoib_hard_header?
Is the packet dropped? What might break as a result?

 I would really appreciate comments.
 
 thanks
 
  -MoniS

--
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..31bc6d8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -216,6 +216,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
 
struct neighbour   *neighbour;
+   struct net_device *dev;
 
struct list_headlist;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..0e3953e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,8 @@ #include linux/ip.h
 #include linux/in.h
 
 #include net/dst.h
+#include net/arp.h
+#include net/ndisc.h
 
 #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha)  0xff)
 
@@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de
 MODULE_PARM_DESC(debug_level, Enable debug tracing if  0);
 #endif
 
+static int ipoib_at_exit = 0;
 struct ipoib_path_iter {
struct net_device 

Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-25 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small 
 message bandwidth
 
 OK, I applied the following patch (I had to change one line of your
 patch to get it to apply because the small-message changed the context
 so one chunk didn't apply).
 
 Anyway I don't see any difference in small message latency or large
 message throughput.  (Actually latency seems slightly worse but I
 think the change is within my normal variability so I'm don't think
 the difference is significant)

OK.
I wonder whether unrolling the loop in skb_put_frags might be helpful.
Could you please try the following? Does this affect latency for you?
(I don't see any difference in between UD and CM either with or without
 this patch).


Try to improve small message latency some more by unrolling more loops.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index a389854..a8895b4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -311,38 +311,6 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
return 0;
}
 }
-/* Adjust length of skb with fragments to match received data */
-static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
- unsigned int length, struct sk_buff *toskb)
-{
-   int i, num_frags;
-   unsigned int size;
-
-   /* put header into skb */
-   size = min(length, hdr_space);
-   skb-tail += size;
-   skb-len += size;
-   length -= size;
-
-   num_frags = skb_shinfo(skb)-nr_frags;
-   for (i = 0; i  num_frags; i++) {
-   skb_frag_t *frag = skb_shinfo(skb)-frags[i];
-
-   if (length == 0) {
-   /* don't need this page */
-   skb_fill_page_desc(toskb, i, frag-page, 0, PAGE_SIZE);
-   --skb_shinfo(skb)-nr_frags;
-   } else {
-   size = min(length, (unsigned) PAGE_SIZE);
-
-   frag-size = size;
-   skb-data_len += size;
-   skb-truesize += size;
-   skb-len += size;
-   length -= size;
-   }
-   }
-}
 
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
@@ -352,7 +320,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct 
ib_wc *wc)
struct ipoib_cm_rx *p;
unsigned long flags;
u64 mapping[IPOIB_CM_RX_SG];
-   int frags;
+   unsigned head_size, frag_size, frags;
 
ipoib_dbg_data(priv, cm recv completion: id %d, op %d, status: %d\n,
   wr_id, wc-opcode, wc-status);
@@ -388,8 +356,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct 
ib_wc *wc)
}
}
 
-   frags = PAGE_ALIGN(wc-byte_len - min(wc-byte_len,
- (unsigned)IPOIB_CM_HEAD_SIZE)) / 
PAGE_SIZE;
+   head_size = min(wc-byte_len, (unsigned)IPOIB_CM_HEAD_SIZE);
+   frag_size = wc-byte_len - head_size;
+   frags = PAGE_ALIGN(frag_size) / PAGE_SIZE;
 
newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
if (unlikely(!newskb)) {
@@ -408,7 +377,18 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct 
ib_wc *wc)
ipoib_dbg_data(priv, received %d bytes, SLID 0x%04x\n,
   wc-byte_len, wc-slid);
 
-   skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc-byte_len, newskb);
+   memcpy(skb_shinfo(newskb)-frags[frags], 
skb_shinfo(skb)-frags[frags],
+  (IPOIB_CM_RX_SG - 1 - frags) * sizeof(skb_frag_t));
+   skb_shinfo(newskb)-nr_frags = IPOIB_CM_RX_SG - 1;
+
+   skb_shinfo(skb)-nr_frags = frags;
+   skb-tail += head_size;
+   skb-len += wc-byte_len;
+   skb-data_len += frag_size;
+   skb-truesize += frag_size;
+   if (frags)
+   skb_shinfo(skb)-frags[frags - 1].size =
+   (frag_size - 1) % PAGE_SIZE + 1;
 
skb-protocol = ((struct ipoib_header *) skb-data)-proto;
skb-mac.raw = skb-data;





-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [openfabrics-ewg] new OFED 1.2 package

2007-02-25 Thread Michael S. Tsirkin
 Quoting r. Woodruff, Robert J [EMAIL PROTECTED]:
 Subject: Re: [openfabrics-ewg] new OFED 1.2 package
 
 I am also still seeing the issue with the rdma_cm abi_version on RedHat
 EL4-U3,
 bug number, 347. The bug report contains the patch that should fix this.

OK, here's a somewhat cleaned-up patch.  However, I have a question: should not 
the
module cleanup function remove ucma_class and class_attr_abi_version that were
created at module initialization?



diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index e2e8d32..e9e024e 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -847,13 +847,12 @@ static struct miscdevice ucma_misc = {
.fops   = ucma_fops,
 };
 
-static ssize_t show_abi_version(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static struct class *ucma_class;
+static ssize_t show_abi_version(struct class *class_dev, char *buf)
 {
return sprintf(buf, %d\n, RDMA_USER_CM_ABI_VERSION);
 }
-static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
 
 static int __init ucma_init(void)
 {
@@ -863,7 +862,13 @@ static int __init ucma_init(void)
if (ret)
return ret;
 
-   ret = device_create_file(ucma_misc.this_device, dev_attr_abi_version);
+   ucma_class = class_create(THIS_MODULE, infiniband_ucma);
+   if (IS_ERR(ucma_class)) {
+   printk(KERN_ERR rdma_ucm: couldn't create class 
infiniband_ucma\n);
+   goto err;
+   }
+
+   ret = class_create_file(ucma_class, class_attr_abi_version);
if (ret) {
printk(KERN_ERR rdma_ucm: couldn't create abi_version attr\n);
goto err;
@@ -876,7 +881,6 @@ err:
 
 static void __exit ucma_cleanup(void)
 {
-   device_remove_file(ucma_misc.this_device, dev_attr_abi_version);
misc_deregister(ucma_misc);
idr_destroy(ctx_idr);
 }


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1

2007-02-25 Thread Michael S. Tsirkin
 Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Subject: bugs filed for problems compiling OFED 1.2 alpha1
 
 Please fix these bugs for beta. 
  

Scott, you have assigned all bugs to [EMAIL PROTECTED]
To have the bugs resolved, please assign them to maintainers of
appropriate module.

For example, bonding module owner is Moni Shoua [EMAIL PROTECTED],
so I think bug 384 should be assigned to him.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1

2007-02-25 Thread Michael S. Tsirkin
 Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Subject: bugs filed for problems compiling OFED 1.2 alpha1
 
 Please fix these bugs for beta. 
  
 I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64.  I
 compiled all MPIs with GNU, Intel, and PGI compilers.
 
   • 380 OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64
   • 381 OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel
 compiler
   • 382 OFED 1.2 alpha1 mpitests won't compile with Intel compiler for 
 Open
 MPI (RHEL4 IA64)
   • 383 OFED 1.2 alpha1 core/addr.c won't compile on SLES10 IA64
   • 384 OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64
   • 386 OFED 1.2 alpha1 gcc MVAPICH2 won't compile on RHEL4 ppc64 (add
 -m64)
   • 387 OFED 1.2 alpha1 Open MPI won't compile on SLES10 ppc64

Some of these might be fixed in recent nightly builds.
Specifically I know 383 was fixed yesterday. Please check this and let us know.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1

2007-02-25 Thread Michael S. Tsirkin
 Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Subject: RE: bugs filed for problems compiling OFED 1.2 alpha1
 
 
  Scott, you have assigned all bugs to [EMAIL PROTECTED]
  To have the bugs resolved, please assign them to maintainers of
  appropriate module.
 
 Not sure what you mean by all, only 384 was not assigned to a specific
 person.

Correct. Sorry about that.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-24 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter
 
   Newer gccs have the -fwhole-program --combine options that address
   this and more. One of the things that happens is that all internal
   functions are made 'static' and all compilation units are optimized in
   one go.
 
 Good point... but is there any sane way to use that feature with
 automake and libtool?  I know that the autotools are a pain but I
 really don't want to reimplement the useful stuff they give us, and I
 don't know of any really practical replacement...

Once KDE4 is out, I expect that most systems will start shipping cmake.
Maybe it'll be practical to switch to that then.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-23 Thread Michael S. Tsirkin
 So what is different in your setup that causes this patch to make a
 difference for you?

Hmm. I agree it is somewhat strange.

Below is a simple test that attempts to compare htonl, CONSTANT_HTONL,
and an array-driven implementation. The code line is taken directly from htonl.
Could you compile and run it please?

I see:

$ gcc -O2 1.c
$ ./a.out
test1
122396.00 usec
test2
10517799.00 usec
test3
104099.00 usec

which seems to imply CONSTANT_HTONL is much faster.

Ideas?

---


#include stdio.h
#include sys/time.h
#include time.h

#include endian.h

#define SIZE 255

enum ibv_send_flags {
IBV_SEND_FENCE  = 1  0,
IBV_SEND_SIGNALED   = 1  1,
IBV_SEND_SOLICITED  = 1  2,
IBV_SEND_INLINE = 1  3
};

enum {
MTHCA_NEXT_DBD   = 1  7,
MTHCA_NEXT_FENCE = 1  6,
MTHCA_NEXT_CQ_UPDATE = 1  3,
MTHCA_NEXT_EVENT_GEN = 1  2,
MTHCA_NEXT_SOLICIT   = 1  1,
};

int ar[SIZE];


void init_ar()
{
ar[0]=htonl(1);
ar[IBV_SEND_SIGNALED]=htonl(MTHCA_NEXT_CQ_UPDATE|1);;
ar[IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_SOLICIT|1);;

ar[IBV_SEND_SIGNALED|IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_CQ_UPDATE|MTHCA_NEXT_SOLICIT|1);;
}


int test1(int x) 
{
return ar[x  (IBV_SEND_SIGNALED | IBV_SEND_SOLICITED)];
}


int test2(int x) 
{
return 
((x  IBV_SEND_SIGNALED)  ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
((x  IBV_SEND_SOLICITED)  ? htonl(MTHCA_NEXT_SOLICIT) : 0) |
htonl(1);
}

#if __BYTE_ORDER == __LITTLE_ENDIAN
#define CONSTANT_HTONL(x) \
((x  24) | ((x  8)  0xff00) | ((x  8)  0xff) | (x  24))
#elif __BYTE_ORDER == __BIG_ENDIAN
#define CONSTANT_HTONL(x) (x)
#else
#define CONSTANT_HTONL(x) htonl(x)
#endif

int test3(int x) 
{
return 
((x  IBV_SEND_SIGNALED)  ? 
CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
((x  IBV_SEND_SOLICITED) ? CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) 
: 0) |
CONSTANT_HTONL(1);
}

struct timeval   start, end;

void timestart(void)
{
if (gettimeofday(start, NULL)) {
perror(gettimeofday);
return;
}

}


void timeend(void)
{
if (gettimeofday(end, NULL)) {
perror(gettimeofday);
return;
}

{
float usec = (end.tv_sec - start.tv_sec) * 100 +
(end.tv_usec - start.tv_usec);

printf(%.2f usec\n, usec);
}

}

main() 
{
int i;

init_ar();

printf(test1\n);

timestart();

for (i=0; i1; ++i) {
(void) test1(IBV_SEND_SIGNALED);
(void) test1(0);
(void) test1(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
(void) test1(IBV_SEND_SOLICITED);
}
timeend();

printf(test2\n);
timestart();

for (i=0; i1; ++i) {
(void) test2(IBV_SEND_SIGNALED);
(void) test2(0);
(void) test2(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
(void) test2(IBV_SEND_SOLICITED);
}
timeend();
printf(test3\n);
timestart();

for (i=0; i1; ++i) {
(void) test3(IBV_SEND_SIGNALED);
(void) test3(0);
(void) test3(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
(void) test3(IBV_SEND_SOLICITED);
}
timeend();
}

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-23 Thread Michael S. Tsirkin
 Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter
 
  So what is different in your setup that causes this patch to make a
  difference for you?
 
 Hmm. I agree it is somewhat strange.
 
 Below is a simple test that attempts to compare htonl, CONSTANT_HTONL,
 and an array-driven implementation. The code line is taken directly from 
 htonl.
 Could you compile and run it please?

OK, this was stupid, the test was missing 
#include netinet/in.h
so htonl was expanded by a gcc intrinsic which seems to work worse
than the macro tricks present in netinet/in.h.

I guess this include got killed on the test system somehow,
and this explains why I saw a difference in libmthca.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64

2007-02-22 Thread Michael S. Tsirkin
 Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Subject: anyone have OFED 1.2 alpha1 compiling on ppc64
 
 I tried both RHEL4 and SLES10 usinstall.sh, and get this.  I filed bug 379,
 anyone else tried ppc64?

Scott, could pls you upload the kernel sources and .config files to staging?
If you do, we'll be able to add these to mightly cross-build environment.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] libibverbs: can't compile more than once due to man3 symbolic links

2007-02-22 Thread Michael S. Tsirkin
 Quoting Jack Morgenstein [EMAIL PROTECTED]:
 Subject: libibverbs: can't compile more than once due to man3 symbolic links
 
 The code below was just added to libibverbs/Makefile.am
 
 install-data-hook:
 cd $(DESTDIR)$(mandir)/man3  \
 $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3  \
   
 
 This creates a problem when re-compiling/re-installing libibverbs --
 the ln -s ( = $(LN_S) ) fails because the symbolic links still exist
 in the man/man3 directory.
 
 I rummaged around the libtool documentation, and there is no pre-defined
 macro which does ln -fs (which would just overwrite the current links).
 
 Any ideas on how to fix this problem cleanly (i.e., without violating spirit
 of libtool/automake)?

Probably just add
$(RM) ibv_ack_async_event.3  \

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] librdmacm examples not working under OFED 1.2 alpha

2007-02-22 Thread Michael S. Tsirkin
 
   librdmacm: couldn't read ABI version.
   librdmacm: assuming: 4

I think there was a kernel patch from Woody to address this.
Woody?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-22 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
 
 Thanks, queued for 2.6.21.  With this patch I see small-packet latency
 down almost all the way back to what datagram mode gives -- on a pair
 of fast woodcrest systems I see latencies for netpipe tcp 1 byte
 messages like
 
   datagram 13.xx
   original CM  17.xx
   patched CM   14.xx
 
 so there is still a measurable difference but it is much less now.

Hmm. An old system I tried here has a much higher latency, but
does not seem to exhibit latency difference between datagram and CM.

1. Is there something special you do when you run the benchmark (msi, taskset, 
...)?
2. On a wild guess that the issue here is higher interrupt rate with CM,
   is there a chance you could test the following patch posted by me earlier?
   http://www.mail-archive.com/openib-general@openib.org/msg29290.html

Thanks,

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64

2007-02-22 Thread Michael S. Tsirkin
Don't you have an account at ssh.openfabrics.org?
If yes, just put kernel sources and the .config under your home directory

Quoting r. Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64

How do I upload sources?

 -Original Message-
 From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 22, 2007 1:00 AM
 To: Scott Weitzenkamp (sweitzen)
 Cc: [EMAIL PROTECTED]; OPENIB
 Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64
 
  Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
  Subject: anyone have OFED 1.2 alpha1 compiling on ppc64
  
  I tried both RHEL4 and SLES10 usinstall.sh, and get this.  
 I filed bug 379,
  anyone else tried ppc64?
 
 Scott, could pls you upload the kernel sources and .config 
 files to staging?
 If you do, we'll be able to add these to mightly cross-build 
 environment.
   
 -- 
 MST
 

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-22 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
 
 OK, I applied the following patch (I had to change one line of your
 patch to get it to apply because the small-message changed the context
 so one chunk didn't apply).
 
 Anyway I don't see any difference in small message latency or large
 message throughput.  (Actually latency seems slightly worse but I
 think the change is within my normal variability so I'm don't think
 the difference is significant)

OK, thanks for testing this.
I need to spend more time on reproducing this issue, and profiling.
I'll add this to my todo list.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64

2007-02-22 Thread Michael S. Tsirkin
 Quoting Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Subject: RE: anyone have OFED 1.2 alpha1 compiling on ppc64
 
  Don't you have an account at ssh.openfabrics.org?
 
 Can an admin please give me an account?

I'm not an admin but I think you want to post your ssh
public key.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Michael S. Tsirkin
   An API idea:
   how about instead testing missed_events, we add a flag:
   
   IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?)
   and change ib_req_notify_cq to return int which will keep
   the missed_events value, only if this flag is set?
   
   This has 2 advatages
   - Less churn updating all users to new API - they just ignore return value 
 -
 and still almost no overhead for them as they don't set IB_CQ_TEST
   - For all users we have to push less values on stack - note compiler can't
 get rid of them as we are calling function through a pointer
   - For users that do
 missed_events = ib_req_notify_cq(priv-cq, IB_CQ_NEXT_COMP | IB_CQ_TEST)
 we get the result in register.
 
 Yes, I like this.  So ib_req_notify_cq() gets a return value that is
 negative if an error occurred, 0 if everything is fine, or positive if
 a missed event might have happened.
 
 I think I prefer the longer name IB_CQ_REPORT_MISSED_EVENTS -- at
 least there's a chance at guessing what it means even if you don't
 read the documentation.

By the way, how about extending the userspace API in a similiar
fashion?

missed_events = ibv_req_notify_cq(priv-cq, IBV_CQ_NEXT_COMP |
  IBV_CQ_REPORT_MISSED_EVENTS)


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: IPOIB NAPI
 
   By the way, how about extending the userspace API in a similiar
   fashion?
   
   missed_events = ibv_req_notify_cq(priv-cq, IBV_CQ_NEXT_COMP |
IBV_CQ_REPORT_MISSED_EVENTS)
 
 It would require a kernel-user ABI bump. Is it worth it?

I hear some people asking for it: I imagine reasons are same as NAPI -
race-free, clean API to switch from polling to event mode -
rather than a minor optimization.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-22 Thread Michael S. Tsirkin
GCC seems to be unable to propogate constants across calls to htonl.
So it turns out to be worth the while to replace htonl with
a hand-written macro in case of constant parameter.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
Signed-off-by: Ishai Rabinovitz [EMAIL PROTECTED]

---

Roland, I'm looking at micro-optimizing libmthca/mthca some more.
The following optimization is minor, but it seems quite safe.
What do you think? Tested with gcc 4.0.3.

diff --git a/src/cq.c b/src/cq.c
index 0aeb7a9..9428f74 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -275,7 +275,7 @@ static int handle_error_cqe(struct mthca_cq *cq,
 * doorbell count field.  In that case we always free the CQE.
 */
if (mthca_is_memfree(cq-ibv_cq.context) ||
-   !(new_wqe  htonl(0x3f)) || (!cqe-db_cnt  dbd))
+   !(new_wqe  CONSTANT_HTONL(0x3f)) || (!cqe-db_cnt  dbd))
return 0;
 
cqe-db_cnt   = htons(ntohs(cqe-db_cnt) - dbd);
diff --git a/src/mthca.h b/src/mthca.h
index 1f31bc3..798029f 100644
--- a/src/mthca.h
+++ b/src/mthca.h
@@ -112,6 +112,20 @@ enum {
MTHCA_OPCODE_INVALID= 0xff
 };
 
+/* GCC does not seem to be able to do constant propogation
+ * across htonl/ntohl calls */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define CONSTANT_HTONL(x)\
+   unsigned)x)  24) | \
+unsigned)x)  8)  0xff00)   | \
+unsigned)x)  8)  0xff) | \
+(((unsigned)x)  24))
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define CONSTANT_HTONL(x) (x)
+#else
+#define CONSTANT_HTONL(x) htonl(x)
+#endif
+
 struct mthca_ah_page;
 
 struct mthca_device {
diff --git a/src/qp.c b/src/qp.c
index f2483e9..85d3385 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -138,10 +138,10 @@ int mthca_tavor_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
((struct mthca_next_seg *) wqe)-ee_nds = 0;
((struct mthca_next_seg *) wqe)-flags =
((wr-send_flags  IBV_SEND_SIGNALED) ?
-htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
+CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
((wr-send_flags  IBV_SEND_SOLICITED) ?
-htonl(MTHCA_NEXT_SOLICIT) : 0)   |
-   htonl(1);
+CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0)   |
+   CONSTANT_HTONL(1);
if (wr-opcode == IBV_WR_SEND_WITH_IMM ||
wr-opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
((struct mthca_next_seg *) wqe)-imm = wr-imm_data;
@@ -359,9 +359,9 @@ int mthca_tavor_post_recv(struct ibv_qp *ibqp, struct 
ibv_recv_wr *wr,
 
((struct mthca_next_seg *) wqe)-nda_op = 0;
((struct mthca_next_seg *) wqe)-ee_nds =
-   htonl(MTHCA_NEXT_DBD);
+   CONSTANT_HTONL(MTHCA_NEXT_DBD);
((struct mthca_next_seg *) wqe)-flags =
-   htonl(MTHCA_NEXT_CQ_UPDATE);
+   CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE);
 
wqe += sizeof (struct mthca_next_seg);
size = sizeof (struct mthca_next_seg) / 16;
@@ -505,10 +505,10 @@ int mthca_arbel_post_send(struct ibv_qp *ibqp, struct 
ibv_send_wr *wr,
 
((struct mthca_next_seg *) wqe)-flags =
((wr-send_flags  IBV_SEND_SIGNALED) ?
-htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
+CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
((wr-send_flags  IBV_SEND_SOLICITED) ?
-htonl(MTHCA_NEXT_SOLICIT) : 0)   |
-   htonl(1);
+CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0)   |
+   CONSTANT_HTONL(1);
if (wr-opcode == IBV_WR_SEND_WITH_IMM ||
wr-opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
((struct mthca_next_seg *) wqe)-imm = wr-imm_data;
@@ -750,7 +750,7 @@ int mthca_arbel_post_recv(struct ibv_qp *ibqp, struct 
ibv_recv_wr *wr,
 
if (i  qp-rq.max_gs) {
((struct mthca_data_seg *) wqe)-byte_count = 0;
-   ((struct mthca_data_seg *) wqe)-lkey = 
htonl(MTHCA_INVAL_LKEY);
+   ((struct mthca_data_seg *) wqe)-lkey = 
CONSTANT_HTONL(MTHCA_INVAL_LKEY);
((struct mthca_data_seg *) wqe)-addr = 0;
}
 
@@ -872,7 +872,7 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap 
*cap,
for (scatter = (void *) (next + 1);
 (void *) scatter  (void *) next + (1  
qp-rq.wqe_shift);
 ++scatter)
-   scatter-lkey = htonl(MTHCA_INVAL_LKEY);
+   scatter-lkey = 
CONSTANT_HTONL(MTHCA_INVAL_LKEY

Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-21 Thread Michael S. Tsirkin
 Quoting r. Or Gerlitz [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small 
 message bandwidth
 
 Michael S. Tsirkin wrote:
  Avoid overhead of freeing/reallocating and mapping/unmapping for dma
  for pages that have not been written to by hardware.
 
  diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
  b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
  index 8ee6f06..a23c8e3 100644
  --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
  +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
  @@ -68,14 +68,14 @@ struct ipoib_cm_id {
   static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 struct ib_cm_event *event);
   
  -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
  +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
u64 mapping[IPOIB_CM_RX_SG])
   {
  int i;
   
  ib_dma_unmap_single(priv-ca, mapping[0], IPOIB_CM_HEAD_SIZE, 
  DMA_FROM_DEVICE);
   
  -   for (i = 0; i  IPOIB_CM_RX_SG - 1; ++i)
  +   for (i = 0; i  frags; ++i)
  ib_dma_unmap_single(priv-ca, mapping[i + 1], PAGE_SIZE, 
  DMA_FROM_DEVICE);
   }
 
 I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on 
 IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags 
 times, correct?

No.

 does this means you are trashing the IOMMU etc etc of 
 the system?

I don't think so.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Fwd: Address List Change for Friday, 2/23/2007

2007-02-21 Thread Michael S. Tsirkin
Could an example message be please sent *today* to the new list,
so that client rules can be updated?

I can't access my inbox on Friday or Saturday, and this change
will cause problems and message loss for me unless I can prepare
beforehand.


Quoting Jeff Squyres [EMAIL PROTECTED]:
Subject: Fwd: Address List Change for Friday, 2/23/2007

FYI.  In case you missed it the first time: THIS LIST IS CHANGING ON  
FRIDAY 2/23/2007 (2 days from now).  Please update your addressbooks!

See the notice below for the details.



Begin forwarded message:

 From: Lee, Michael Paichi [EMAIL PROTECTED]
 Date: February 19, 2007 10:43:23 AM EST
 To: openib-general@openib.org
 Subject: [openib-general] Address List Change for Friday, 2/23/2007

 We're in the process of migrating the maillists from the old  
 openib.org server to the new lists.openfabrics.org machine.  The  
 list openib-general will be moved this Friday, February 23, 2007.   
 The new address for the maillist will be  
 [EMAIL PROTECTED]

 What this means is that messages will come from  
 [EMAIL PROTECTED]  Conversely, replies should be made  
 to this address as well.  Messages will also have a new subject  
 line prefix of [OFA General].  If you have configured your e-mail  
 client to filter based on maillist address or subject headers, you  
 may need to make some adjustments for filtering.

 However, for the sake of transition, messages sent to the previous  
 maillist address on the old server will forward to the new server.   
 This forward will remain in place until the old server is taken  
 offline and final DNS changes are made.  We expect the old server  
 to go offline sometime in early March.

 The web archives will also be migrated to the new web address  
 shortly, http://lists.openfabrics.org.

 If you have any questions, please don't hesitate to contact me at  
 [EMAIL PROTECTED]

 Regards,
 Michael Lee

 ___
 openib-general mailing list
 openib-general@openib.org
 http://openib.org/mailman/listinfo/openib-general

 To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
 openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-20 Thread Michael S. Tsirkin
Avoid overhead of freeing/reallocating and mapping/unmapping for dma
for pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

This gives 10% boost in BW for message sizes up to 32K. Please queue for 
2.6.21.

before:

# ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) 
port 0 AF_INET : demo
Recv   SendSend  Utilization   Service Demand
Socket Socket  Message  Elapsed  Send Recv SendRecv
Size   SizeSize Time Throughput  localremote   local   remote
bytes  bytes   bytessecs.MBytes  /s  % S  % S  us/KB   us/KB

 87380  16384  3200010.00   716.23   26.2223.941.430   1.306


after:

# ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) 
port 0 AF_INET : demo
Recv   SendSend  Utilization   Service Demand
Socket Socket  Message  Elapsed  Send Recv SendRecv
Size   SizeSize Time Throughput  localremote   local   remote
bytes  bytes   bytessecs.MBytes  /s  % S  % S  us/KB   us/KB

 87380  16384  3200010.00   888.67   24.1325.081.061   1.102


diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..a23c8e3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -68,14 +68,14 @@ struct ipoib_cm_id {
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
   struct ib_cm_event *event);
 
-static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
  u64 mapping[IPOIB_CM_RX_SG])
 {
int i;
 
ib_dma_unmap_single(priv-ca, mapping[0], IPOIB_CM_HEAD_SIZE, 
DMA_FROM_DEVICE);
 
-   for (i = 0; i  IPOIB_CM_RX_SG - 1; ++i)
+   for (i = 0; i  frags; ++i)
ib_dma_unmap_single(priv-ca, mapping[i + 1], PAGE_SIZE, 
DMA_FROM_DEVICE);
 }
 
@@ -93,7 +93,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int 
id)
ret = ib_post_srq_recv(priv-cm.srq, priv-cm.rx_wr, bad_wr);
if (unlikely(ret)) {
ipoib_warn(priv, post srq failed for buf %d (%d)\n, id, ret);
-   ipoib_cm_dma_unmap_rx(priv, priv-cm.srq_ring[id].mapping);
+   ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+ priv-cm.srq_ring[id].mapping);
dev_kfree_skb_any(priv-cm.srq_ring[id].skb);
priv-cm.srq_ring[id].skb = NULL;
}
@@ -101,8 +102,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, 
int id)
return ret;
 }
 
-static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
-u64 mapping[IPOIB_CM_RX_SG])
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, 
int frags,
+u64 mapping[IPOIB_CM_RX_SG])
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct sk_buff *skb;
@@ -110,7 +111,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, 
int id,
 
skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
if (unlikely(!skb))
-   return -ENOMEM;
+   return NULL;
 
/*
 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
@@ -122,10 +123,10 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, 
int id,
   DMA_FROM_DEVICE);
if (unlikely(ib_dma_mapping_error(priv-ca, mapping[0]))) {
dev_kfree_skb_any(skb);
-   return -EIO;
+   return NULL;
}
 
-   for (i = 0; i  IPOIB_CM_RX_SG - 1; i++) {
+   for (i = 0; i  frags; i++) {
struct page *page = alloc_page(GFP_ATOMIC);
 
if (!page)
@@ -139,7 +140,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, 
int id,
}
 
priv-cm.srq_ring[id].skb = skb;
-   return 0;
+   return skb;
 
 partial_error:
 
@@ -148,8 +149,8 @@ partial_error:
for (; i = 0; --i)
ib_dma_unmap_single(priv-ca, mapping[i + 1], PAGE_SIZE, 
DMA_FROM_DEVICE);
 
-   dev_kfree_skb_any(skb);
-   return -ENOMEM;
+   dev_kfree_skb_any(skb);
+   return NULL;
 }
 
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
@@ -312,7 +313,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 }
 /* Adjust length of skb with fragments to match received data */
 static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
- unsigned int length)
+ unsigned int

Re: [openib-general] 32-bit build for ppc64 is required

2007-02-15 Thread Michael S. Tsirkin
 Quoting Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: 32-bit build for ppc64 is required
 
  So, what you suggest is - build 2 types of libraries, but on PPC make
  binaries 32 bit? That's easy - do others agree to this approach?
 No, for execs please create 32bit and 64bit on PPC.

  Another option is to build binaries with whatever type of binary
  gcc without extra flags generates by default.
 On PPC we really need the ability to build both versions. The reason is
 simply that there're customers who want them. Why don't offer both
 options, and each component owner can decide her/his default?

I don't think this can be elegantly dealt with on a per-component basis.
But if some component owner has an opinion on this, do speak up - note
this affects binaries only, not libraries such as libehca.

 And the
 customers can pick the one(s) they like.
 I see your point regarding QA effort. Is it really twice?

Probably more - I'm reasonably sure most scripts written so far
assume stuff is installed in prefix/bin, so testing harness etc
would need to be changed.

And how to make sure the *correct* set of binaries was actually
QA'd?

 My assumption
 might be wrong: I guess we have to assure/test the 32/64bit compatibility
 anyway eg. 32bit client talks to 64bit server.
 If we have 32bit execs only for development resp. testing, why don't we
 also give them to customers in order to do basic test or diagnosis of
 their setup?

Because of the confusion this would create.
For shared libraries, the 32/64 bit issues seem to be automatically figured out
by ld.so, but there's no such solution for binaries.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] 32-bit build for ppc64 is required

2007-02-15 Thread Michael S. Tsirkin
 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: 32-bit build for ppc64 is required
 
 On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote:
 
   The choice of 32/64 bit default is done on a per arch basis.  With
   x86_64/i386, the increased number of CPU registers in 64bit mode
   outweighs the increased code bloat that goes along with 64bit mode.  On
   PPC, no such register benefit exists for 64bit mode.  As such, 32bit
   apps on PPC are faster than the equivalent 64bit apps up to the point at
   which a 4GB address space becomes a problem.  Correspondingly, the
   default binaries on PPC are 32bit, and only those that *need* to be
   64bit are.  While a customer's application may need 4GB address space,
   certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
   compile all of those utilities as 32bit by default on PPC.  We also ship
   all the libs as both 32/64bit so users can select the appropriate
   environment for their particular application (with the exception of
   dapl, which doesn't support 32bit and for which I filed a bug around the
   time of OFED 1.1).
  
  So, what you suggest is - build 2 types of libraries, but on PPC make
  binaries 32 bit? That's easy - do others agree to this approach?
 
 Yep, that's what we do.

Care to post a patch to Vlad's scripts?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko fix

2007-02-15 Thread Michael S. Tsirkin
ipoib_cm_alloc_rx_skb might be called from IRQ context, so it must use
dev_kfree_skb_any, not kfree_skb

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

This one's obvious, isn't it?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..e388a41 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -148,7 +148,7 @@ partial_error:
for (; i = 0; --i)
ib_dma_unmap_single(priv-ca, mapping[i + 1], PAGE_SIZE, 
DMA_FROM_DEVICE);
 
-   kfree_skb(skb);
+   dev_kfree_skb_any(skb);
return -ENOMEM;
 }
 

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
 
 OK, I already merged this but now I'm thinking it's somewhat buggy:

Hopefully not.

   +  if (coherent)
   +  ret = mthca_alloc_icm_coherent(dev-pdev-dev,
   + 
 chunk-mem[chunk-npages],
   + cur_order, gfp_mask);
   +  else
   +  ret = mthca_alloc_icm_pages(chunk-mem[chunk-npages],
   +  cur_order, gfp_mask);

   -  if (++chunk-npages == MTHCA_ICM_CHUNK_LEN) {
   +  if (!ret) {
   +  ++chunk-npages;
   +
   +  if (!coherent  chunk-npages == MTHCA_ICM_CHUNK_LEN) {
  chunk-nsg = pci_map_sg(dev-pdev, chunk-mem,
 
 I don't see anything that ever bumps chunk-nsg if we're allocating a
 coherent region and we end up needing more than one allocation to do
 it.

Yes but this is intentional.

 Maybe something like this on top of the patch?
 
 diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
 b/drivers/infiniband/hw/mthca/mthca_memfree.c
 index 0b9d053..48f7c65 100644
 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c
 +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
 @@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, 
 int npages,
   if (!ret) {
   ++chunk-npages;
  
 - if (!coherent  chunk-npages == MTHCA_ICM_CHUNK_LEN) {
 + if (coherent)
 + ++chunk-nsg;
 + else if (chunk-npages == MTHCA_ICM_CHUNK_LEN) {
   chunk-nsg = pci_map_sg(dev-pdev, chunk-mem,
   chunk-npages,
   PCI_DMA_BIDIRECTIONAL);

No, I think the code is fine and this patch will break things:
chunk-nsg is needed only for non-coherent memory to call pci_unmap_sg:

   if (chunk-nsg  0)
pci_unmap_sg(dev-pdev, chunk-mem,
chunk-npages, PCI_DMA_BIDIRECTIONAL);



and we do *not* want to call pci_unmap_sg on consistent memory.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [GIT PULL] please pull infiniband.git

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [GIT PULL] please pull infiniband.git
 
   What about the patch that i sent on Allow the following QP state
   transition : reset -- reset?
 
 OK, I'll merge that in the next patch.  It's the kind of patch I'm not
 happy about merging, since it bloats the code to handle a corner case
 no one is likely to hit in practice, but it is technically correct so
 I guess we're forced to merge it.

I actually think this is a spec bug:
IB spec requires the following transitions in modify qp:
Any State to Error
Any state to Reset

If we are in reset state, both of these do not make any sense.

To Error from reset is especially hard to implement, and I think
this will actually create bugs in application (if it did to error I
think its expecting completions with error).

I'll try speaking with our representatives at the IBTA to remove this 
requirement.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
 
 Queued for 2.6.21, although I think a further cleanup would be:
 
  mdev-mr_table.mpt_table = mthca_alloc_icm_table(mdev, 
 init_hca-mpt_base,
   dev_lim-mpt_entry_sz,
   mdev-limits.num_mpts,
   -   
 mdev-limits.reserved_mrws, 1);
   +   
 mdev-limits.reserved_mrws,
   +   1, 1);
 
 instead of having use_lowmem and use_coherent be separate parameters,
 we should probably convert it to a type parameter, and have
 MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT.  That would make these
 calls a lot easier to read and get correct.

Good idea.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
 
   +  sg_set_buf(mem, buf, PAGE_SIZE  order);
   +  BUG_ON(mem-offset);
   +  sg_dma_len(mem) = PAGE_SIZE  order;
 
 What am I missing?  Any reason to set sg_dma_len() again after sg_set_buf()?

How do you mean, again? Does sg_set_buf set dma_length?

In 2.6.20, I see this in include/linux/scatterlist.h:

static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
unsigned int buflen)
{
sg-page = virt_to_page(buf);
sg-offset = offset_in_page(buf);
sg-length = buflen;
}


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] 32-bit build for ppc64 is required

2007-02-14 Thread Michael S. Tsirkin
 Quoting Stefan Roscher [EMAIL PROTECTED]:
 Subject: 32-bit build for ppc64 is required
 
 Hi,
 
 after building the latest ofed build package we recognized that on PPC64 only
 64-bit libaries were build.
 Because we have customers using older userpace apllications which are
 certified for 32-bit we think additional 32bit support is a requirement for 
 64bit builds.
 
 If OFED 1.2 supports 32 bit on ppc64, we have to change the install
 directory.I would suggest to install 32-bit binaries into
 /usr/local/ofed/bin32 directory. So no changes on current naming conventions
 has to be done.The libaries are installed in the /usr/local/ofed/lib 
 directory.

The standard practice is to install 64 bit libraries under prefix/lib64
and 32 bit libraries under prefix/lib. Why would PPC64 be any different?

I do not think we need 32 bit binaries at all, and there's no other package
I'm aware of that uses bin32.

Comments?


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] 32-bit build for ppc64 is required

2007-02-14 Thread Michael S. Tsirkin
 Quoting Stefan Roscher [EMAIL PROTECTED]:
 Subject: Re: 32-bit build for ppc64 is required
 
 On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
   Quoting Stefan Roscher [EMAIL PROTECTED]:
   Subject: 32-bit build for ppc64 is required
   
   Hi,
   
   after building the latest ofed build package we recognized that on PPC64 
   only
   64-bit libaries were build.
   Because we have customers using older userpace apllications which are
   certified for 32-bit we think additional 32bit support is a requirement 
   for 64bit builds.
   
   If OFED 1.2 supports 32 bit on ppc64, we have to change the install
   directory.I would suggest to install 32-bit binaries into
   /usr/local/ofed/bin32 directory. So no changes on current naming 
   conventions
   has to be done.The libaries are installed in the /usr/local/ofed/lib 
   directory.
  
  The standard practice is to install 64 bit libraries under prefix/lib64
  and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
 
 I think you missunderstand my post. The directory for 32/64bit libaries
 shouldbe prefix/lib and prefix/lib64 respectively. 
 But current ofed1.2 I saw only prefix/lib64 directory, ie 64bit libs only.  

Well, this is not by design: AFAIK on x86_64 both types of libraries
are installed.

  I do not think we need 32 bit binaries at all, and there's no other package
  I'm aware of that uses bin32.
 
 We have customers that still use 32-bit userspace applications. 
 It would be beneficial for them if they can obtain 32bit libs and execs from
 ofed1.2 in order to run their applications without recompiling them, because
 for some 32-bit applications recompiling is not an option.

32 bit libraries are needed for users to run 32 applications.

But I still do not see how installing 32 bit binaries alongside the 64
bit ones is useful, and I do not think other packages provide this option,
so maybe we shouldn't, either.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [Bug 325] RDMA_CM and address translation broken on sles9sp3

2007-02-14 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [Bug 325] RDMA_CM and address translation broken on sles9sp3
 
 
 Tziporet, 
 
 I didn't think we were going to apply this patch until Michael tested it
 with SDP/IPoIB on various distros. 
 
 Michael, did you get a chance to test it (I'm guessing not since you
 were out sick)?  

Right, I'm not at the lab. I assume Vlad tested this before applying.

 The reason I'm concerned is that it changes the behavior of
 xxx_ip_dev_find() and _all_ backports, and we needed to test it out and
 make sure it doesn't regress anything.  If it causes problems on other
 backports, the plan was to just fix the sles9sp3 backport and leave the
 others alone. 
 
 With the test build vlad published yesterday which has this patch,
 rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might
 be due to this patch.  I'm investigating this now.

In actual fact, xxx_ip_dev_find is not even *needed* on anything
except 2.6.14, 2.6.15, 2.6.16 and 2.6.17: these are the kernels
which do not export ip_dev_find.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
 
I don't see anything that ever bumps chunk-nsg if we're allocating a
coherent region and we end up needing more than one allocation to do
it.
   
   Yes but this is intentional.
 
   No, I think the code is fine and this patch will break things:
   chunk-nsg is needed only for non-coherent memory to call pci_unmap_sg:
 
 what about this code in mthca_memfree.h?
 
   static inline void mthca_icm_next(struct mthca_icm_iter *iter)
   {
   if (++iter-page_idx = iter-chunk-nsg) {

Correct. Good catch.

 the call to pci_unmap_sg you're worried about is in
 mthca_free_icm_pages(), which can't be called for coherent memory
 anyway, so I don't see a problem with that.
 
 So I think my patch is correct and needed.

Yes, I agree. I'll also put it in OFED.
Thanks!

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] 32-bit build for ppc64 is required

2007-02-14 Thread Michael S. Tsirkin
  Still, using non-standard hacks like bin32 does not sound like a good idea.

 I think the actual issue is there is no common approach for this on various
 platforms.


On platforms I've seen, there are 2 sets of libraries but only 64 bit 
executables
provided.

This is what we had for OFED 1.0, OFED 1.1, and I don't see
the reason to change that - adding more executables to install on
production systems will double QA work.


 
  Maybe an option to *only* make 32 bit userspace might make sense though.
  Something like --disable-32bit, --disable-64bit.
  This would solve your problem, would it not?

 Does that mean if I don't specify one of them, I'll get 32- and 64bit
 execs?
 If yes, that's fine.

No, by default we build 2 sets of libraries, and only 64 bit execs.

But for your development purposes (I think you mentioned testing user/kernel
context switch) I think we could have --disable-32bit flag to configure to get
only 32 bit userspace.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] 32-bit build for ppc64 is required

2007-02-14 Thread Michael S. Tsirkin
 Quoting Doug Ledford [EMAIL PROTECTED]:
 Subject: Re: [openib-general] 32-bit build for ppc64 is required
 
 On Wed, 2007-02-14 at 16:29 +0200, Michael S. Tsirkin wrote:
   Quoting Stefan Roscher [EMAIL PROTECTED]:
   Subject: Re: 32-bit build for ppc64 is required
   
   On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
 Quoting Stefan Roscher [EMAIL PROTECTED]:
 Subject: 32-bit build for ppc64 is required
 
 Hi,
 
 after building the latest ofed build package we recognized that on 
 PPC64 only
 64-bit libaries were build.
 Because we have customers using older userpace apllications which are
 certified for 32-bit we think additional 32bit support is a 
 requirement for 64bit builds.
 
 If OFED 1.2 supports 32 bit on ppc64, we have to change the install
 directory.I would suggest to install 32-bit binaries into
 /usr/local/ofed/bin32 directory. So no changes on current naming 
 conventions
 has to be done.The libaries are installed in the /usr/local/ofed/lib 
 directory.

The standard practice is to install 64 bit libraries under prefix/lib64
and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
   
   I think you missunderstand my post. The directory for 32/64bit libaries
   shouldbe prefix/lib and prefix/lib64 respectively. 
   But current ofed1.2 I saw only prefix/lib64 directory, ie 64bit libs 
   only.  
  
  Well, this is not by design: AFAIK on x86_64 both types of libraries
  are installed.
  
I do not think we need 32 bit binaries at all, and there's no other 
package
I'm aware of that uses bin32.
   
   We have customers that still use 32-bit userspace applications. 
   It would be beneficial for them if they can obtain 32bit libs and execs 
   from
   ofed1.2 in order to run their applications without recompiling them, 
   because
   for some 32-bit applications recompiling is not an option.
  
  32 bit libraries are needed for users to run 32 applications.
  
  But I still do not see how installing 32 bit binaries alongside the 64
  bit ones is useful, and I do not think other packages provide this option,
  so maybe we shouldn't, either.
 
 The choice of 32/64 bit default is done on a per arch basis.  With
 x86_64/i386, the increased number of CPU registers in 64bit mode
 outweighs the increased code bloat that goes along with 64bit mode.  On
 PPC, no such register benefit exists for 64bit mode.  As such, 32bit
 apps on PPC are faster than the equivalent 64bit apps up to the point at
 which a 4GB address space becomes a problem.  Correspondingly, the
 default binaries on PPC are 32bit, and only those that *need* to be
 64bit are.  While a customer's application may need 4GB address space,
 certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
 compile all of those utilities as 32bit by default on PPC.  We also ship
 all the libs as both 32/64bit so users can select the appropriate
 environment for their particular application (with the exception of
 dapl, which doesn't support 32bit and for which I filed a bug around the
 time of OFED 1.1).

So, what you suggest is - build 2 types of libraries, but on PPC make
binaries 32 bit? That's easy - do others agree to this approach?

Another option is to build binaries with whatever type of binary
gcc without extra flags generates by default.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IPoIB: Only allow root to change between datagram and connected mode

2007-02-14 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: [PATCH] IPoIB: Only allow root to change between datagram and 
 connected mode
 
 Change the permissions of the mode sysfs attribute to be S_IWUSR
 instead of S_IWUGO.
 
 Signed-off-by: Roland Dreier [EMAIL PROTECTED]
 ---
 FYI -- I'm planning to merge this for 2.6.21.  It doesn't seem
 appropriate to allow ordinary users to mess with this sort of config.

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] Fwd: [ANNOUNCE] GIT 1.5.0

2007-02-14 Thread Michael S. Tsirkin
FYI.
I suggest we update git on the openfabrics server to 1.5.0:
Detached HEAD feature will be useful for nightly build scripts.
Sasha?

- Forwarded message from Junio C Hamano [EMAIL PROTECTED] -

Subject: [ANNOUNCE] GIT 1.5.0
Date: Wed, 14 Feb 2007 05:14:16 +0200
From: Junio C Hamano [EMAIL PROTECTED]


The latest feature release GIT 1.5.0 is available at the usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.0.tar.{gz,bz2}(tarball)
  git-htmldocs-1.5.0.tar.{gz,bz2}   (preformatted docs)
  git-manpages-1.5.0.tar.{gz,bz2}   (preformatted docs)
  RPMS/$arch/git-*-1.5.0-1.$arch.rpm(RPM)



GIT v1.5.0 Release Notes


Old news


This section is for people who are upgrading from ancient
versions of git.  Although all of the changes in this section
happened before the current v1.4.4 release, they are summarized
here in the v1.5.0 release notes for people who skipped earlier
versions.

As of git v1.5.0 there are some optional features that changes
the repository to allow data to be stored and transferred more
efficiently.  These features are not enabled by default, as they
will make the repository unusable with older versions of git.
Specifically, the available options are:

 - There is a configuration variable core.legacyheaders that
   changes the format of loose objects so that they are more
   efficient to pack and to send out of the repository over git
   native protocol, since v1.4.2.  However, loose objects
   written in the new format cannot be read by git older than
   that version; people fetching from your repository using
   older clients over dumb transports (e.g. http) using older
   versions of git will also be affected.

 - Since v1.4.3, configuration repack.usedeltabaseoffset allows
   packfile to be created in more space efficient format, which
   cannot be read by git older than that version.

The above two are not enabled by default and you explicitly have
to ask for them, because these two features make repositories
unreadable by older versions of git, and in v1.5.0 we still do
not enable them by default for the same reason.  We will change
this default probably 1 year after 1.4.2's release, when it is
reasonable to expect everybody to have new enough version of
git.

 - 'git pack-refs' appeared in v1.4.4; this command allows tags
   to be accessed much more efficiently than the traditional
   'one-file-per-tag' format.  Older git-native clients can
   still fetch from a repository that packed and pruned refs
   (the server side needs to run the up-to-date version of git),
   but older dumb transports cannot.  Packing of refs is done by
   an explicit user action, either by use of git pack-refs
   --prune command or by use of git gc command.

 - 'git -p' to paginate anything -- many commands do pagination
   by default on a tty.  Introduced between v1.4.1 and v1.4.2;
   this may surprise old timers.

 - 'git archive' superseded 'git tar-tree' in v1.4.3;

 - 'git cvsserver' was new invention in v1.3.0;

 - 'git repo-config', 'git grep', 'git rebase' and 'gitk' were
   seriously enhanced during v1.4.0 timeperiod.

 - 'gitweb' became part of git.git during v1.4.0 timeperiod and
   seriously modified since then.

 - reflog is an v1.4.0 invention.  This allows you to name a
   revision that a branch used to be at (e.g. git diff
   [EMAIL PROTECTED] master allows you to see changes since
   yesterday's tip of the branch).


Updates in v1.5.0 since v1.4.4 series
-

* Index manipulation

 - git-add is to add contents to the index (aka staging area
   for the next commit), whether the file the contents happen to
   be is an existing one or a newly created one.

 - git-add without any argument does not add everything
   anymore.  Use 'git-add .' instead.  Also you can add
   otherwise ignored files with an -f option.

 - git-add tries to be more friendly to users by offering an
   interactive mode (git-add -i).

 - git-commit path used to refuse to commit if path was
   different between HEAD and the index (i.e. update-index was
   used on it earlier).  This check was removed.

 - git-rm is much saner and safer.  It is used to remove paths
   from both the index file and the working tree, and makes sure
   you are not losing any local modification before doing so.

 - git-reset tree paths... can be used to revert index
   entries for selected paths.

 - git-update-index is much less visible.  Many suggestions to
  use the command in git output and documentation have now been
  replaced by simpler commands such as git add or git rm.


* Repository layout and objects transfer

 - The data for origin repository is stored in the configuration
   file $GIT_DIR/config, not in $GIT_DIR/remotes/, for newly
   created clones.  The latter is still supported and there is
   no need to convert your 

Re: [openib-general] [PATCH RFC] use common cq for ipoib cm send side

2007-02-11 Thread Michael S. Tsirkin
 Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: [PATCH RFC] use common cq for ipoib cm send side
 
 The following untested patch moves all TX processing in IPoIB CM to common CQ.
 This should help reduce the number of interrupts for bi-directional traffic
 (such as TCP). Is this a good idea? What do others think?
 
 Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

FYI, this was just thinking aloud.  The version below works fine here but the
performance gain seems to be very small (about 1%).  The gain with NAPI might
be bigger but this is yet to be tested.  I'll continue looking into this.

Feedback wellcome.

 ipoib.h  |   10 +--
 ipoib_cm.c   |   78 +++
 ipoib_ib.c   |   28 -
 ipoib_main.c |2 -
 4 files changed, 45 insertions(+), 73 deletions(-)



Use common CQ for all TX QPs: keep a per-device counter out outstanding
tx WRs, and stop the interface when this counter reaches the send queue size, to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb885ee..ef703c7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -99,9 +99,9 @@ enum {
 
 #defineIPOIB_OP_RECV   (1ul  31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#defineIPOIB_CM_OP_SRQ (1ul  30)
+#defineIPOIB_OP_CM (1ul  30)
 #else
-#defineIPOIB_CM_OP_SRQ (0)
+#defineIPOIB_OP_CM (0)
 #endif
 
 /* structs */
@@ -144,7 +144,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
struct ib_cm_id *id;
-   struct ib_cq*cq;
struct ib_qp*qp;
struct list_head list;
struct net_device   *dev;
@@ -233,6 +232,7 @@ struct ipoib_dev_priv {
unsigned tx_tail;
struct ib_sgetx_sge;
struct ib_send_wrtx_wr;
+   unsigned tx_outstanding;
 
struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device 
*dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc 
*wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..af36562 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int 
id)
struct ib_recv_wr *bad_wr;
int i, ret;
 
-   priv-cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+   priv-cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
for (i = 0; i  IPOIB_CM_RX_SG; ++i)
priv-cm.rx_sge[i].addr = priv-cm.srq_ring[id].mapping[i];
@@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int 
hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   unsigned int wr_id = wc-wr_id  ~IPOIB_CM_OP_SRQ;
+   unsigned int wr_id = wc-wr_id  ~(IPOIB_OP_CM | IPOIB_OP_RECV);
struct sk_buff *skb;
struct ipoib_cm_rx *p;
unsigned long flags;
@@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
priv-tx_sge.addr = addr;
priv-tx_sge.length   = len;
 
-   priv-tx_wr.wr_id = wr_id;
+   priv-tx_wr.wr_id = wr_id | IPOIB_OP_CM;
 
return ib_post_send(tx-qp, priv-tx_wr, bad_wr);
 }
@@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff 
*skb, struct ipoib_cm_
dev-trans_start = jiffies;
++tx-tx_head;
 
-   if (tx-tx_head - tx-tx_tail == ipoib_sendq_size) {
+   if (++priv-tx_outstanding == ipoib_sendq_size) {
ipoib_dbg(priv, TX ring 0x%x full, stopping kernel net 
queue\n,
  tx-qp-qp_num);
netif_stop_queue(dev);
-   set_bit(IPOIB_FLAG_NETIF_STOPPED, tx-flags);
}
}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx 
*tx,
- struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   unsigned int wr_id = wc-wr_id;
+   struct ipoib_cm_tx *tx = wc

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-10 Thread Michael S. Tsirkin
  +
  
  It seems same goes for
  
  +   mc = kzalloc(sizeof(*mc), GFP_KERNEL);
  +   if (!mc)
  +   return NULL;
 
 We would need to set events_reported.

IMO, probably worth it to init just this one field rather than use up
initialized memory - and I think it's clearer.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-10 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
 
 Michael What about the mthca memory registration patches?  I
 Michael thought they are on their way. Should I repost?
 
 Sorry, I forgot about that.  Yes, please resend the latest state.

OK, coming up.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH 1 of 4] IB/mthca: merge MR and FMR space on 64 bit

2007-02-10 Thread Michael S. Tsirkin
For Tavor, we currently reserve separate MPT and MTT space for
FMRs to avoid abusing the vmalloc space on 32 bit kernels. No
such problem exists on 64 bit kernels so let's not do it there.

This way we have a shared pool for MR and FMR resources, used on
demand.  This will also make it possible to write MTTs for
regular regions directly from driver.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -761,7 +761,7 @@ void mthca_arbel_fmr_unmap(struct mthca_
 int mthca_init_mr_table(struct mthca_dev *dev)
 {
unsigned long addr;
-   int err, i;
+   int mpts, mtts, err, i;
 
err = mthca_alloc_init(dev-mr_table.mpt_alloc,
   dev-limits.num_mpts,
@@ -795,13 +795,21 @@ int mthca_init_mr_table(struct mthca_dev
err = -EINVAL;
goto err_fmr_mpt;
}
+   mpts = mtts = 1  i;
+   } else {
+   mpts = dev-limits.num_mtt_segs;
+   mtts = dev-limits.num_mpts;
+   }
+
+   if (!mthca_is_memfree(dev) 
+   (dev-mthca_flags  MTHCA_FLAG_FMR)) {
 
addr = pci_resource_start(dev-pdev, 4) +
((pci_resource_len(dev-pdev, 4) - 1) 
 dev-mr_table.mpt_base);
 
dev-mr_table.tavor_fmr.mpt_base =
-   ioremap(addr, (1  i) * sizeof(struct 
mthca_mpt_entry));
+   ioremap(addr, mpts * sizeof(struct mthca_mpt_entry));
 
if (!dev-mr_table.tavor_fmr.mpt_base) {
mthca_warn(dev, MPT ioremap for FMR failed.\n);
@@ -814,19 +822,21 @@ int mthca_init_mr_table(struct mthca_dev
 dev-mr_table.mtt_base);
 
dev-mr_table.tavor_fmr.mtt_base =
-   ioremap(addr, (1  i) * MTHCA_MTT_SEG_SIZE);
+   ioremap(addr, mtts * MTHCA_MTT_SEG_SIZE);
if (!dev-mr_table.tavor_fmr.mtt_base) {
mthca_warn(dev, MTT ioremap for FMR failed.\n);
err = -ENOMEM;
goto err_fmr_mtt;
}
+   }
 
-   err = mthca_buddy_init(dev-mr_table.tavor_fmr.mtt_buddy, i);
+   if (dev-limits.fmr_reserved_mtts) {
+   err = mthca_buddy_init(dev-mr_table.tavor_fmr.mtt_buddy, 
fls(mtts - 1));
if (err)
goto err_fmr_mtt_buddy;
 
/* Prevent regular MRs from using FMR keys */
-   err = mthca_buddy_alloc(dev-mr_table.mtt_buddy, i);
+   err = mthca_buddy_alloc(dev-mr_table.mtt_buddy, fls(mtts - 
1));
if (err)
goto err_reserve_fmr;
 
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_profile.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c
@@ -277,7 +277,7 @@ u64 mthca_make_profile(struct mthca_dev 
 * out of the MR pool. They don't use additional memory, but
 * we assign them as part of the HCA profile anyway.
 */
-   if (mthca_is_memfree(dev))
+   if (mthca_is_memfree(dev) || BITS_PER_LONG == 64)
dev-limits.fmr_reserved_mtts = 0;
else
dev-limits.fmr_reserved_mtts = request-fmr_reserved_mtts;

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH 2 of 4] IB/mthca: always fill MTTs from CPU

2007-02-10 Thread Michael S. Tsirkin
Speed up memory registration by filling in MTTs directly.  This
reduces the number of FW commands needed to register an MR by at
least a factor of 2.  This applies to all memfree cards, and to
tavor mode on 64 bit systems with the patch I posted earlier.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_dev.h
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -464,6 +464,8 @@ void mthca_uar_free(struct mthca_dev *de
 int mthca_pd_alloc(struct mthca_dev *dev, int privileged, struct mthca_pd *pd);
 void mthca_pd_free(struct mthca_dev *dev, struct mthca_pd *pd);
 
+int mthca_write_mtt_size(struct mthca_dev *dev);
+
 struct mthca_mtt *mthca_alloc_mtt(struct mthca_dev *dev, int size);
 void mthca_free_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt);
 int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -243,8 +243,8 @@ void mthca_free_mtt(struct mthca_dev *de
kfree(mtt);
 }
 
-int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
-   int start_index, u64 *buffer_list, int list_len)
+static int __mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
+int start_index, u64 *buffer_list, int list_len)
 {
struct mthca_mailbox *mailbox;
__be64 *mtt_entry;
@@ -295,6 +295,84 @@ out:
return err;
 }
 
+void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
+ int start_index, u64 *buffer_list, int list_len)
+{
+   u64 __iomem *mtts;
+   u32 mtt_seg;
+   int i;
+
+   mtt_seg = mtt-first_seg * MTHCA_MTT_SEG_SIZE;
+   mtts = dev-mr_table.tavor_fmr.mtt_base + mtt_seg + start_index 
* sizeof (u64);
+   for (i = 0; i  list_len; ++i) {
+   __be64 mtt_entry = cpu_to_be64(buffer_list[i] |
+  MTHCA_MTT_FLAG_PRESENT);
+   mthca_write64_raw(mtt_entry, mtts + i);
+   }
+}
+
+void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
+ int start_index, u64 *buffer_list, int list_len)
+{
+   __be64 *mtts;
+   int i;
+   int s = start_index * sizeof (u64);
+
+   /* For Arbel, all MTTs must fit in the same page. */
+   BUG_ON(s / PAGE_SIZE != (s + list_len * sizeof(u64) - 1) / PAGE_SIZE);
+   /* Require full segments */
+   BUG_ON(s % MTHCA_MTT_SEG_SIZE);
+
+   mtts = mthca_table_find(dev-mr_table.mtt_table, mtt-first_seg +
+   s / MTHCA_MTT_SEG_SIZE);
+
+   BUG_ON(!mtts);
+
+   for (i = 0; i  list_len; ++i)
+   mtts[i] = cpu_to_be64(buffer_list[i] | MTHCA_MTT_FLAG_PRESENT);
+}
+
+int mthca_write_mtt_size(struct mthca_dev *dev)
+{
+   if (dev-mr_table.fmr_mtt_buddy != dev-mr_table.mtt_buddy)
+   /*
+* Be friendly to WRITE_MTT command
+* and leave two empty slots for the
+* index and reserved fields of the
+* mailbox.
+*/
+   return PAGE_SIZE / sizeof (u64) - 2;
+
+   /* For Arbel, all MTTs must fit in the same page. */
+   return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ff;
+}
+
+int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
+   int start_index, u64 *buffer_list, int list_len)
+{
+   int size = mthca_write_mtt_size(dev);
+   int chunk;
+
+   if (dev-mr_table.fmr_mtt_buddy != dev-mr_table.mtt_buddy)
+   return __mthca_write_mtt(dev, mtt, start_index, buffer_list, 
list_len);
+
+   while (list_len  0) {
+   chunk = min(size, list_len);
+   if (mthca_is_memfree(dev))
+   mthca_arbel_write_mtt_seg(dev, mtt, start_index,
+ buffer_list, chunk);
+   else
+   mthca_tavor_write_mtt_seg(dev, mtt, start_index,
+ buffer_list, chunk);
+
+   list_len-= chunk;
+   start_index += chunk;
+   buffer_list += chunk;
+   }
+
+   return 0;
+}
+
 static inline u32 tavor_hw_index_to_key(u32 ind)
 {
return ind;
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_provider.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1015,6 +1015,7 @@ static struct ib_mr *mthca_reg_user_mr(s
int

[openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-10 Thread Michael S. Tsirkin
Fix non-cache-coherent CPUs with memfree HCAs.

We allocate the MTT table with alloc_pages() and then do
pci_map_sg(), so we must call pci_dma_sync_sg after the CPU
writes to the MTT table (this works since device never writes the
MTTs on memfree).

For MPTs, both the device and CPU might write there, so we must
allocate dma coherent memory for these.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -35,6 +35,8 @@
  */
 
 #include linux/mm.h
+#include linux/scatterlist.h
+#include asm/page.h
 
 #include mthca_memfree.h
 #include mthca_dev.h
@@ -58,22 +60,31 @@ struct mthca_user_db_table {
}page[0];
 };
 
-void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm)
+void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent)
 {
struct mthca_icm_chunk *chunk, *tmp;
+   void *buf;
int i;
 
if (!icm)
return;
 
list_for_each_entry_safe(chunk, tmp, icm-chunk_list, list) {
-   if (chunk-nsg  0)
-   pci_unmap_sg(dev-pdev, chunk-mem, chunk-npages,
-PCI_DMA_BIDIRECTIONAL);
-
-   for (i = 0; i  chunk-npages; ++i)
-   __free_pages(chunk-mem[i].page,
-get_order(chunk-mem[i].length));
+   if (coherent)
+   for (i = 0; i  chunk-npages; ++i) {
+   buf = lowmem_page_address(chunk-mem[i].page);
+   dma_free_coherent(dev-pdev-dev, 
chunk-mem[i].length,
+ buf, 
sg_dma_address(chunk-mem[i]));
+   }
+   else {
+   if (chunk-nsg  0)
+   pci_unmap_sg(dev-pdev, chunk-mem, 
chunk-npages,
+PCI_DMA_BIDIRECTIONAL);
+
+   for (i = 0; i  chunk-npages; ++i)
+   __free_pages(chunk-mem[i].page,
+get_order(chunk-mem[i].length));
+   }
 
kfree(chunk);
}
@@ -81,12 +92,41 @@ void mthca_free_icm(struct mthca_dev *de
kfree(icm);
 }
 
+static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t 
gfp_mask)
+{
+   mem-page = alloc_pages(gfp_mask, order);
+   if (!mem-page)
+   return -ENOMEM;
+
+   mem-length = PAGE_SIZE  order;
+   mem-offset = 0;
+   return 0;
+}
+
+static int mthca_alloc_icm_coherent(struct device *dev, struct scatterlist 
*mem,
+   int order, gfp_t gfp_mask)
+{
+   void *buf = dma_alloc_coherent(dev, PAGE_SIZE  order, 
sg_dma_address(mem),
+  gfp_mask);
+   if (!buf)
+   return -ENOMEM;
+
+   sg_set_buf(mem, buf, PAGE_SIZE  order);
+   BUG_ON(mem-offset);
+   sg_dma_len(mem) = PAGE_SIZE  order;
+   return 0;
+}
+
 struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
- gfp_t gfp_mask)
+ gfp_t gfp_mask, int coherent)
 {
struct mthca_icm *icm;
struct mthca_icm_chunk *chunk = NULL;
int cur_order;
+   int ret;
+
+   /* We use sg_set_buf for coherent allocs, which assumes low memory */
+   BUG_ON(coherent  (gfp_mask  __GFP_HIGHMEM));
 
icm = kmalloc(sizeof *icm, gfp_mask  ~(__GFP_HIGHMEM | __GFP_NOWARN));
if (!icm)
@@ -112,21 +152,28 @@ struct mthca_icm *mthca_alloc_icm(struct
while (1  cur_order  npages)
--cur_order;
 
-   chunk-mem[chunk-npages].page = alloc_pages(gfp_mask, 
cur_order);
-   if (chunk-mem[chunk-npages].page) {
-   chunk-mem[chunk-npages].length = PAGE_SIZE  
cur_order;
-   chunk-mem[chunk-npages].offset = 0;
+   if (coherent)
+   ret = mthca_alloc_icm_coherent(dev-pdev-dev,
+  
chunk-mem[chunk-npages],
+  cur_order, gfp_mask);
+   else
+   ret = mthca_alloc_icm_pages(chunk-mem[chunk-npages],
+   cur_order, gfp_mask);
 
-   if (++chunk-npages == MTHCA_ICM_CHUNK_LEN) {
+   if (!ret) {
+   ++chunk-npages;
+
+   if (!coherent  chunk-npages == MTHCA_ICM_CHUNK_LEN) {
chunk-nsg = pci_map_sg(dev-pdev, chunk-mem

[openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a separate cache line

2007-02-10 Thread Michael S. Tsirkin
This fixes several issues related to reserved MTTs and memory alignment.

1. MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent both device and
CPU from writing into the same cache line at the same time.

2. reserved_mtts field has different meaning in Tavor and Arbel,
so we are wasting mtt entries on memfree. Fix the Arbel case to match
Tavor semantics.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_main.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
@@ -464,6 +464,10 @@ static int mthca_init_icm(struct mthca_d
goto err_unmap_aux;
}
 
+   /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved 
mtts */
+   mdev-limits.reserved_mtts = ALIGN(mdev-limits.reserved_mtts * 
MTHCA_MTT_SEG_SIZE,
+  dma_get_cache_alignment()) / 
MTHCA_MTT_SEG_SIZE;
+
mdev-mr_table.mtt_table = mthca_alloc_icm_table(mdev, 
init_hca-mtt_base,
 MTHCA_MTT_SEG_SIZE,
 
mdev-limits.num_mtt_segs,
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1051,7 +1051,11 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_EQ_OFFSET);
dev_lim-max_eqs = 1  (field  0x7);
MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MTT_OFFSET);
-   dev_lim-reserved_mtts = 1  (field  4);
+   if (mthca_is_memfree(dev))
+   dev_lim-reserved_mtts = ALIGN((1  (field  4)) * 
sizeof(u64),
+  MTHCA_MTT_SEG_SIZE) / 
MTHCA_MTT_SEG_SIZE;
+   else
+   dev_lim-reserved_mtts = 1  (field  4);
MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_MRW_SZ_OFFSET);
dev_lim-max_mrw_sz = 1  field;
MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MRW_OFFSET);
-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] integer overflow

2007-02-10 Thread Michael S. Tsirkin
Roland, the following code in ipoib:

while ((int) priv-tx_tail - (int) priv-tx_head  0) {

seems to rely on integer overflow which seems to be
undefined behaviour.

Should we care?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] integer overflow

2007-02-10 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: integer overflow
 
  while ((int) priv-tx_tail - (int) priv-tx_head  0) {
   
   seems to rely on integer overflow which seems to be
   undefined behaviour.
 
 tx_tail and tx_head are unsigned, and overflow is defined for unsigned
 integers.

Yes but we cast them to signed int here - no?


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH RFC] use common cq for ipoib cm send side

2007-02-10 Thread Michael S. Tsirkin
The following untested patch moves all TX processing in IPoIB CM to common CQ.
This should help reduce the number of interrupts for bi-directional traffic
(such as TCP). Is this a good idea? What do others think?

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb885ee..ef703c7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -99,9 +99,9 @@ enum {
 
 #defineIPOIB_OP_RECV   (1ul  31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#defineIPOIB_CM_OP_SRQ (1ul  30)
+#defineIPOIB_OP_CM (1ul  30)
 #else
-#defineIPOIB_CM_OP_SRQ (0)
+#defineIPOIB_OP_CM (0)
 #endif
 
 /* structs */
@@ -144,7 +144,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
struct ib_cm_id *id;
-   struct ib_cq*cq;
struct ib_qp*qp;
struct list_head list;
struct net_device   *dev;
@@ -233,6 +232,7 @@ struct ipoib_dev_priv {
unsigned tx_tail;
struct ib_sgetx_sge;
struct ib_send_wrtx_wr;
+   unsigned tx_outstanding;
 
struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device 
*dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc 
*wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..47c868c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int 
id)
struct ib_recv_wr *bad_wr;
int i, ret;
 
-   priv-cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+   priv-cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
for (i = 0; i  IPOIB_CM_RX_SG; ++i)
priv-cm.rx_sge[i].addr = priv-cm.srq_ring[id].mapping[i];
@@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int 
hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   unsigned int wr_id = wc-wr_id  ~IPOIB_CM_OP_SRQ;
+   unsigned int wr_id = wc-wr_id  ~(IPOIB_OP_CM | IPOIB_OP_RECV);
struct sk_buff *skb;
struct ipoib_cm_rx *p;
unsigned long flags;
@@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
priv-tx_sge.addr = addr;
priv-tx_sge.length   = len;
 
-   priv-tx_wr.wr_id = wr_id;
+   priv-tx_wr.wr_id = wr_id | IPOIB_OP_CM;
 
return ib_post_send(tx-qp, priv-tx_wr, bad_wr);
 }
@@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff 
*skb, struct ipoib_cm_
dev-trans_start = jiffies;
++tx-tx_head;
 
-   if (tx-tx_head - tx-tx_tail == ipoib_sendq_size) {
+   if (++priv-tx_outstanding == ipoib_sendq_size) {
ipoib_dbg(priv, TX ring 0x%x full, stopping kernel net 
queue\n,
  tx-qp-qp_num);
netif_stop_queue(dev);
-   set_bit(IPOIB_FLAG_NETIF_STOPPED, tx-flags);
}
}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx 
*tx,
- struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   unsigned int wr_id = wc-wr_id;
+   struct ipoib_cm_tx *tx = wc-qp-qp_context;
+   unsigned int wr_id = wc-wr_id  ~IPOIB_OP_CM;
struct ipoib_tx_buf *tx_req;
unsigned long flags;
 
@@ -522,11 +521,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, 
struct ipoib_cm_tx *tx
 
spin_lock_irqsave(priv-tx_lock, flags);
++tx-tx_tail;
-   if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, tx-flags)) 
-   tx-tx_head - tx-tx_tail = ipoib_sendq_size  1) {
-   clear_bit(IPOIB_FLAG_NETIF_STOPPED, tx-flags);
+   if (unlikely(--priv-tx_outstanding == ipoib_sendq_size  1) 
+   netif_queue_stopped(dev) 
+   test_bit(IPOIB_FLAG_ADMIN_UP, priv-flags))
netif_wake_queue(dev);
-   }
 
if (wc-status != IB_WC_SUCCESS 
wc-status != IB_WC_WR_FLUSH_ERR) {
@@ -551,8 +549,17 @@ static

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: please pull for 2.6.21: fix + add IB multicast support
 
 I merged the increment port number and remove redundant '_wq'
 patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
 
 I plan to review to multicast stuff next week and I hope to merge it
 for 2.6.21.  Or, have you or anyone else at Voltaire read over the
 code in addition to using it?  Do you see anything that should be
 cleaned up?

I looked at the code briefly, don't have much time at the moment
unfortunately.

+static void join_group(struct mcast_group *group, struct mcast_member *member,
+  u8 join_state)
+{
+   member-state = MCAST_MEMBER;
+   adjust_membership(group, join_state, 1);
+   group-rec.join_state |= join_state;
+   member-multicast.rec = group-rec;
+   member-multicast.rec.join_state = join_state;
+   list_del(member-list);
+   list_add(member-list, group-active_list);
+}

Can be just list_move.

Patch allocates everything with kzalloc, but then goes ahead and initialize 
everything.
So just kmalloc it - no reason to waste initialized memory if non-initialized 
will do.
List of places:

+   member = kzalloc(sizeof *member, gfp_mask);
+   if (!member)
+   return ERR_PTR(-ENOMEM);


Same here:

+   group = kzalloc(sizeof *group, gfp_mask);
+   if (!group)
+   return NULL;
+

and same here:

+   iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
+   if (!iter)
+   return ERR_PTR(-ENOMEM);
+

It seems same goes for

+   mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+   if (!mc)
+   return NULL;

in ucma.c - everything gets initied by calling function - but
a bit less sure, needs checking.

By the way, it seems same goes for

+   bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
+   if (!bind_list)
+   return -ENOMEM;

in cma_alloc_any_port in the port randomization patch that was merged
and for cma_alloc_port in existing code.


-- 
MSTYou seem to be careful to do list_del_init for member-list all over,

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-09 Thread Michael S. Tsirkin
  Or, have you or anyone else at Voltaire read over the
  code in addition to using it?  Do you see anything that should be
  cleaned up?
 
 OK, I most the the review i did (and interaction with Sean to add changes) was
 on the rdma_cm: add multicast communication support patch, and i was
 less focused
 on the ib_sa: track multicast join/leave requests patch,  however i
 recall that there were some discussions between Sean and Michael and
 they reached an agreement.

Yes, we reached an agreement.
These patches have also seen some limited testing in the OFED tree.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-09 Thread Michael S. Tsirkin
 Quoting r. Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
 
 On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote:
Also I agree with MST, I would like to see the core/ subdirectory die
completely.

   
   ok ok...I'll kill the subdir...
  
  It's not just the directory BTW. Stuff like building completions in
  t3_cqe format and then reformatting to ib_wc seems to be much more confusing
  (and some of it is actually on datapath).
 
 The t3_cqe format is built BY THE HW.

I understand, I did not get that.

But for example create_read_req_cqe builds it in software.
It could build ib_wc instead.

...

  Having to wade through 3 driver-specific layers of abstractions just 
  because I want to
  for example change API in ib_verbs.h and need to update all drivers will be
  very taxing. I understand your design calls for 2 layers, but at least the 
  API exposed
  by code in drivers/net is fairly small, while cxio_wr.h declares 27 
  structures
  which seem to just duplicate ib_verbs.h.
 
 cxio_wr.h is hw format.  You want me to change ib_verbs.h to make WRs
 and CQEs align with Chelsio hardware?

No, but it need not be part of interface.  The reason I was confused is because
you seem to create an extra copy e.g.  for t3_cqe.  cxio_poll_cq currently
creates an intermediate copy of the completion on the stack, I think it could
format ib_wc directly instead.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Open MPI rpmbuild fails in OFED-1.2

2007-02-08 Thread Michael S. Tsirkin
 Quoting Jeff Squyres [EMAIL PROTECTED]:
 Subject: Re: Open MPI rpmbuild fails in OFED-1.2
 
 On Feb 7, 2007, at 2:49 PM, Michael S. Tsirkin wrote:
 
  My $0.02: This is another in a growing list of issues reflecting the
  whole build everything in DESTDIR is a problematic approach.
 
  I don't know much about RPM, and I am not exactly sure why are
  our source RPMs so complicated.
 
 It's a combination of two things:
 
 1) similar to what you said below, we have lots of software packages  
 that are all dependent upon each other
 -- this leads to a conglomeration of rpath's and shared library  
 dependencies that are incorrect
 
 2) we're trying to *use* the software when it is installed in the  
 DESTDIR
 -- this means that you have to put special-case in the software so  
 that they look for support files in both the DESTDIR *and* the final  
 installation directory

How do you mean, use?

Hmm. I guess my question is - this works fine when I run OFED's
configure script, why is SRPM so much more difficult?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review

2007-02-08 Thread Michael S. Tsirkin
The following lightly tested patch addresses Roland's comments on IPoIB CM.
Applies on top of PATCHv6:

- Randomise RQ PSN
- Fix for modular IPv6
- MTU endian-ness fix for ICMPs
- Cosmetics

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Roland, do you want me to report the full fixed-up patch instead?

Pls let me know when IPoIB CM is in for-2.6.21,
I'll switch to that for my testing.

diff --git a/drivers/infiniband/ulp/ipoib/Kconfig 
b/drivers/infiniband/ulp/ipoib/Kconfig
index 0ffca11..af78ccc 100644
--- a/drivers/infiniband/ulp/ipoib/Kconfig
+++ b/drivers/infiniband/ulp/ipoib/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_IPOIB
tristate IP-over-InfiniBand
-   depends on INFINIBAND  NETDEVICES  INET
+   depends on INFINIBAND  NETDEVICES  INET  (IPV6 || IPV6=n)
---help---
  Support for the IP-over-InfiniBand protocol (IPoIB). This
  transports IP packets over InfiniBand so you can use your IB
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 8082d50..eb885ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -127,7 +127,6 @@ struct ipoib_tx_buf {
u64 mapping;
 };
 
-#ifdef CONFIG_INFINIBAND_IPOIB_CM
 struct ib_cm_id;
 
 struct ipoib_cm_data {
@@ -181,7 +180,6 @@ struct ipoib_cm_dev_priv {
struct ib_recv_wr   rx_wr;
 };
 
-#endif
 /*
  * Device private locking: tx_lock protects members used in TX fast
  * path (and we use LLTX so upper layers don't do extra locking).
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index e7e7cc0..8ee6f06 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -37,7 +37,7 @@
 #include net/dst.h
 #include net/icmp.h
 
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 #include linux/icmpv6.h
 #endif
 
@@ -170,7 +170,8 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct 
net_device *dev,
 }
 
 static int ipoib_cm_modify_rx_qp(struct net_device *dev,
- struct ib_cm_id *cm_id, struct ib_qp *qp)
+ struct ib_cm_id *cm_id, struct ib_qp *qp,
+ unsigned psn)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ib_qp_attr qp_attr;
@@ -193,7 +194,7 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev,
ipoib_warn(priv, failed to init QP attr for RTR: %d\n, ret);
return ret;
}
-   qp_attr.rq_psn = 0 /* FIXME */;
+   qp_attr.rq_psn = psn;
ret = ib_modify_qp(qp, qp_attr, qp_attr_mask);
if (ret) {
ipoib_warn(priv, failed to modify QP to RTR: %d\n, ret);
@@ -203,7 +204,8 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev,
 }
 
 static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
-struct ib_qp *qp, struct ib_cm_req_event_param 
*req)
+struct ib_qp *qp, struct ib_cm_req_event_param 
*req,
+unsigned psn)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_cm_data data = {};
@@ -219,7 +221,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct 
ib_cm_id *cm_id,
rep.target_ack_delay = 20; /* FIXME */
rep.srq = 1;
rep.qp_num = qp-qp_num;
-   rep.starting_psn = 0 /* FIXME */;
+   rep.starting_psn = psn;
return ib_send_cm_rep(cm_id, rep);
 }
 
@@ -229,6 +231,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, 
struct ib_cm_event *even
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_cm_rx *p;
unsigned long flags;
+   unsigned psn;
int ret;
 
ipoib_dbg(priv, REQ arrived\n);
@@ -243,11 +246,12 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, 
struct ib_cm_event *even
goto err_qp;
}
 
-   ret = ipoib_cm_modify_rx_qp(dev, cm_id, p-qp);
+   psn = random32()  0xff;
+   ret = ipoib_cm_modify_rx_qp(dev, cm_id, p-qp, psn);
if (ret)
goto err_modify;
 
-   ret = ipoib_cm_send_rep(dev, cm_id, p-qp, event-param.req_rcvd);
+   ret = ipoib_cm_send_rep(dev, cm_id, p-qp, event-param.req_rcvd, psn);
if (ret) {
ipoib_warn(priv, failed to send REP: %d\n, ret);
goto err_rep;
@@ -742,7 +746,7 @@ static int ipoib_cm_send_req(struct net_device *dev,
req.retry_count   = 0; /* RFC draft warns against retries */
req.rnr_retry_count   = 0; /* RFC draft warns against retries */
req.max_cm_retries= 15;
-   req.srq   = 15;
+   req.srq   = 1;
return ib_send_cm_req(id, req);
 }
 
@@ -1041,7 +1045,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work

Re: [openib-general] more comments on cxgb3

2007-02-08 Thread Michael S. Tsirkin
  - It seems that by passing in huge resource sizes, userspace will be able to
drink up unlimited amounts of kernel memory.
mthca handles this by using the mlock rlimit, should something be done 
  here
as well?
 
 Hmm.  That's a good point.  I'll put this on the todo as well.  So mthca
 adds to process's rlimit value as things are allocated out of kernel
 memory (or maybe even anything that gets pinned)?

Yes. The code is actually in uverbs core, mthca uses that.

  - I wonder about the names like get_mhp - what does mhp mean?
  static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
  {
  return idr_find(rhp-mmidr, mmid);
  }
  
 
 Memory Handle Pointer.

hmm, what's a Handle? Maybe a better name can be found.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]

2007-02-08 Thread Michael S. Tsirkin
 Well, the way I see it one can take code from the Linux part under the BSD
 licance and use it in The windows part. The otherway around seems fine to me 
 but
 some say that since the windows BSD liscance Reqires that some text will 
 always
 remain there, the other way around is not possibale. As I'm not an Expert in
 that erea I don't know who is right.

Interesting. Where does this idea come from? AFAIK BSD license is well known to 
be
GPL-compatible, so there should be no problem moving code in either direction.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]

2007-02-08 Thread Michael S. Tsirkin
 Quoting r. Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
 syslog() fixes]]
 
  Well, the way I see it one can take code from the Linux part under the BSD
  licance and use it in The windows part. The otherway around seems fine to 
  me but
  some say that since the windows BSD liscance Reqires that some text will 
  always
  remain there, the other way around is not possibale. As I'm not an Expert in
  that erea I don't know who is right.
 
 Interesting. Where does this idea come from?

Maybe this?
http://www.gnu.org/philosophy/bsd.html
Note that openib license does not include the advertising clause.

 AFAIK BSD license is well known to be
 GPL-compatible, so there should be no problem moving code in either direction.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review

2007-02-08 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH] IB/ipoib_cm: fix up issues from code review
 
 OK, I pulled this in and fixed it to build with the netdevice
 class_device-ectomy that just went upstream, and pushed it out on my
 for-2.6.21 branch like this.

Thanks!

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-08 Thread Michael S. Tsirkin
  Also I agree with MST, I would like to see the core/ subdirectory die
  completely.
  
 
 ok ok...I'll kill the subdir...

It's not just the directory BTW. Stuff like building completions in
t3_cqe format and then reformatting to ib_wc seems to be much more confusing
(and some of it is actually on datapath).
Same goes for t3_wq and I suspect everything else defined in cxio_wr.h -
please, use the native types from include/rdma/.

Having to wade through 3 driver-specific layers of abstractions just because I 
want to
for example change API in ib_verbs.h and need to update all drivers will be
very taxing. I understand your design calls for 2 layers, but at least the API 
exposed
by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures
which seem to just duplicate ib_verbs.h.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport-IWCM workaroundfor ip_dev_find() bug.

2007-02-08 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport-IWCM 
 workaroundfor ip_dev_find() bug.
 
 Michael, 
 
 From your email, it sounded like you would regression test this.

Not yet, we had lab restructuring - hopefully next week.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-08 Thread Michael S. Tsirkin
 And here's what I have pending in for-2.6.21 so far:

What about the mthca memory registration patches?
I thought they are on their way. Should I repost?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-07 Thread Michael S. Tsirkin
 Quoting Moni Shoua [EMAIL PROTECTED]:
 Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of 
 linux neighbour
 
 
  Another concern: assume that one device goes away (e.g. hotplug).
  It seems that neighbours whose dev field point to another device, will not 
  be destroyed.
  Correct?

 I agree.

  Therefore in your design, it seems that to_ipoib_neigh()-dev
  will get us a pointer to device that has been removed already.
  
 I agree that this is a problem.

I think we can solve this if we track all ipoib neighbours, like we do for old 
kernels,
and then flush ipoib neighbours on any hotplug event.
Roland, does this sound too awful?

 It think it would be best to prevent an IPoIB device
 from disappearing or from ib_ipoib from being unloaded as long as IPoIB
 device is a slave. Unfortunately, I don't see how this can be done just
 by fixing something in bonding or IPoIB. 

So hotplug is blocked potentially forever?
This does not sound good.

 However, any slave knows he has a master (dev-master). 
 What do you think about a solution where IPoIB first tries to clean up the
 neighbours that belong to it's master before deleting the IPoIB device?

How?

  Furthermore, bond_setup_by_slave is called only for non
  Ethernet devices (we consider to change the logic to called only for
  IPoIB devices just for safety).
  
  Why is this necessary, BTW?
  
 If we don't do that, we get a memory leak because the neigh destructor will
 never be called for non IPoIB devices although they carry ipoib_neigh
 with them.

How can this happen? If it does, I think we are back to where we started:
to_ipoib_neigh is broken for non-IPoIB device.
I thought you said only devices of the same type can be paired?


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support

2007-02-07 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
 
   Well, randomness is a resource after all, and since we don't have the 
 additional
   security provided by PSNs in IPoIB UD, it seemed we do not need it for
   IPoIB CM either. So maybe the right thing is just to remove the FIXME 
 comment.
 
 random32() doesn't use up any entropy. Random PSNs help avoid problems
 with stale connections, so I think we should do it.

Well, stale connections don't pose any real problems for IPoIB CM - worst case a
connnection is torn down and recreated.  But I don't have a strong opinion
anyway - that's why I put the FIXME there. So I'm OK with random32, too.

 I noticed some funny code in ipoib_cm_skb_reap():
 
   __be32 mtu = cpu_to_be32(priv-mcast_mtu);
 
 // htonl(__be32)??
   icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 
 htonl(mtu));
 // no htonl() here -- is this correct?
   icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
 
 what is the right thing?

Both are right I think.
These two functions seem to accept parameters in different format:

include/net/icmp.h:extern void  icmp_send(struct sk_buff *skb_in,  int type, int
  code, __be32 info);


include/linux/icmpv6.h:extern voidicmpv6_send(struct sk_buff 
*skb,
include/linux/icmpv6.h-   int type, int 
code,
include/linux/icmpv6.h-   __u32 info,
include/linux/icmpv6.h-   struct net_device 
*dev);

BTW, I just looked at ip_gre.c and it has the same code.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Open MPI rpmbuild fails in OFED-1.2

2007-02-07 Thread Michael S. Tsirkin
 Quoting Jeff Squyres [EMAIL PROTECTED]:
 Subject: Re: Open MPI rpmbuild fails in OFED-1.2
 
 My $0.02: This is another in a growing list of issues reflecting the  
 whole build everything in DESTDIR is a problematic approach.

I don't know much about RPM, and I am not exactly sure why are
our source RPMs so complicated.

However, with the plan configure/make we are able to
build all openfabrics components within build directory,
without any chroot tricks.

So let's not give up yet, IMO it is very nice to be able to build in
standard environment, without being root.

Note that what is biting us here is mostly the large number of modules:
simple single-module packages don't have this problem - and this
is really a design decision we took.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support

2007-02-07 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
 
I noticed some funny code in ipoib_cm_skb_reap():

 __be32 mtu = cpu_to_be32(priv-mcast_mtu);

// htonl(__be32)??
 icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 
 htonl(mtu));
// no htonl() here -- is this correct?
 icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);

what is the right thing?
   
   Both are right I think.
 
 You're right -- the mistake is making mtu __be32 and preswapping it.
 I'll fix it up in my tree.

Let me know when you push it out, I'll start testing it.

   These two functions seem to accept parameters in different format:
   
   include/net/icmp.h:extern void  icmp_send(struct sk_buff *skb_in,  int 
 type, int
code, __be32 info);
   
   
   include/linux/icmpv6.h:extern voidicmpv6_send(struct 
 sk_buff *skb,
   include/linux/icmpv6.h-   int type, 
 int code,
   include/linux/icmpv6.h-   __u32 info,
   include/linux/icmpv6.h-   struct 
 net_device *dev);
   
   BTW, I just looked at ip_gre.c and it has the same code.
 
 no, it leaves mtu as an int rather than swapping it.

You are right of course. sparse would have found it.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] RFC ofed 1 2 kernel file structure

2007-02-07 Thread Michael S. Tsirkin
Repost. Could everyone please look at
git://git.openfabrics.org/~mst/newofed.git
and tell me whether this looks acceptable?

Thanks,
MST

Quoting r. Michael S. Tsirkin [EMAIL PROTECTED]:
Subject: Re: idea for ofed 1 2 kernel file structure

 Quoting  Michael S. Tsirkin [EMAIL PROTECTED]:
 It would easy to split OFED specific files In separate directory and have OFED
 
 All out of tree modules we distribute would go there too.
 
 What do others think about this?

OK, I didn't quite get whether the majority likes this or not,
so I created such a repository, extracted the ofed specific history
and imported it there.

Take a look here:
git://git.openfabrics.org/~mst/newofed.git

Build scripts will have to be adjusted to add
necessary kernel components that we use.

Another nice thing about this layout, is that users (if they so wish)
will be able to use just linux kernel source tarball instead of full linux
kernel git.

OFED maintainers, you are the primary users of the OFED git.
Please comment which layout is better for you.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] RFC ofed 1 2 kernel file structure

2007-02-07 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure
 
 Michael S. Tsirkin wrote:
  Repost. Could everyone please look at
  git://git.openfabrics.org/~mst/newofed.git
  and tell me whether this looks acceptable?
 
 I don't see anything listed for this off of the web site, and cloning it 
 produces an empty tree.

Pls try again now.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] RFC ofed 1 2 kernel file structure

2007-02-07 Thread Michael S. Tsirkin
 Quoting r. Hoang-Nam Nguyen [EMAIL PROTECTED]:
 Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure
 
 Hi Michael,
  Repost. Could everyone please look at
  git://git.openfabrics.org/~mst/newofed.git
  and tell me whether this looks acceptable?
 I could clone it:
 $git clone git://git.openfabrics.org/~mst/newofed.git
 fatal: Unable to look up git.openfabrics.org (Temporary failure in name
 resolution)
 fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.
 $git clone git://git.openfabrics.org/~mst/newofed.git
 fatal: Unable to look up git.openfabrics.org (Temporary failure in name
 resolution)
 fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.
 
 I tried to use web git pointing to
 http://www.openfabrics.org/git/?p=~mst/newofed.git;a=tree
 and got this:
 403 Forbidden - Reading tree failed
 
 Is there something else I need to pay attention of?

Pls try again.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] RFC ofed 1 2 kernel file structure

2007-02-07 Thread Michael S. Tsirkin
 Quoting Bryan O'Sullivan [EMAIL PROTECTED]:
 Subject: Re: RFC ofed 1 2 kernel file structure
 
 Michael S. Tsirkin wrote:
 
  All, pls try now.
 
 This is similar in layout to the sort of tree we've used internally all 
 along, so it's fine by me.  One small problem: I don't like the 
 combination of lower and upper case names of makefile and Makefile in 
 the top-level directory.

ofed_1_2 has the same.

 Also, it's no longer obvious to me to tell what kernel version the 
 sources are pulled from.  I used to be able to check the top-level 
 Makefile or git history, but I no longer know what to look at.

This will be part of BUILD_ID.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] more comments on cxgb3

2007-02-07 Thread Michael S. Tsirkin
OK, so I looked at cxgb3 some more.
To summarise my previous comments, I think the cxio hal layer needs to go to
make the code readable - if I understand correctly it is there for historical
reasons only.

I started looking at userspace/kernel interaction, and then
went over to other code under cxgb3 (but not core/).

- Consider a user that does e.g. create QP, but never calls mmap.
  Is there some code that will clean out the unclamed mmap object?
  I couldn't find it, and iwch_dealloc_ucontext does not seem to
  do anything with it.

- Passing physical address to userspace and back looks suspicios.
  Especially this:
uresp.physaddr = virt_to_phys(chp-cq.queue);
  Could you elaborate on the design here? What are these phy addresses
  and how come userspace needs to know the phy address?
  You are not doing DMA by this address, by any chance?

- It seems that by passing in huge resource sizes, userspace will be able to
  drink up unlimited amounts of kernel memory.
  mthca handles this by using the mlock rlimit, should something be done here
  as well?
 
A couple of comments on PDBG macro.
- I'd like to suggest following the practice of prefixing macro names with 
module name
  (same goes for functions like get_mhp really) - unless they are local to file.

- You are using __FUNCTION__ a lot - it might be to just to kill it,
  messages are unique so you'll be able to locate the msg source anyway,
  save some kernel text and logs will be shorter. In any case I think
  __func__ is the recommended gcc way to get the name currently.

- comment near pr_debug definition in include/linux/kernel.h says:
/* If you are writing a driver, please use dev_dbg instead */
  so it might be a good idea for PDBG to follow this rule.

- log messages do not look very informative to me.
  I also think they are a bit too many of them currently.
  For example, I do not think it is a good idea to print
  the kernel pointers out.

  For example, in code like the following:
mhp = get_mhp(rhp, (sg_list[i].lkey)  8);
if (!mhp) {
PDBG(%s %d\n, __FUNCTION__, __LINE__);
return -EIO;
}

  might be better to say 
  MR key XXX does not exist. Exiting..
  -EIO also looks like a strange error code to return here, does it not?
  Maybe something like EINVAL would be more appropriate?

- I wonder about the names like get_mhp - what does mhp mean?
static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
{
return idr_find(rhp-mmidr, mmid);
}

Looks like it looks up an mr. Is that right? Maybe the name shouldbe changed
to convey this meaning.

- In the following code, what does missing pdid check mean?
/*
 * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
 */
This sounds interesting.
Does this mean the code does not validate the PD currently?

I have the same question for:
/* TBD: check perms */
in iwch_bind_mw.

BTW, does TBD stand for To Be Done here?
google says:
Definitions of TBD on the Web:

* To Be Determined, Defined, Decided.
  www.csr.com/ptot.htm

* to be determined
  
www.liberalsagainstterrorism.com/wiki/index.php/Counterinsurgency_Operations/Glossary

* Treasury Board (Secretariat)
  www.psc-cfp.gc.ca/centres/definitions_and_notes_e.htm

* The three letter abbreviation TBD may be/mean, depending on context: * an 
 acronym for To Be Determined (...at a later point in time., typically)* 
the Douglas Devastator, a US Navy torpedo bomber of World War II
  en.wikipedia.org/wiki/TBD

What is to be determined here?
Do you mean TODO really?

- iwch_sgl2pbl_map is used in several places, and seems a bit too big to be 
inline

Well, it's time to go do my day job now :)

Hope this helps,

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Backport and fix patches for ipath driver

2007-02-06 Thread Michael S. Tsirkin
 Quoting  Bryan O'Sullivan [EMAIL PROTECTED]:
 Subject: Backport and fix patches for ipath driver
 
 Hi, Vlad and Tziporet -
 
 Here's a round of fix and backport patches for the ipath driver, for 
 dropping into the OFED 1.2 tree.  The way in which they're organised 
 should, I hope, be clear.

Looks good, fixes look much cleaner than what we had for OFED 1.1.
I think fixes can be applied already.
However, I'm not sure the backports are ready to be applied as is yet.

Just taking a look at random:

./backport/2.6.18/ipath-50-mad-kmem_cache-2.6.19.patch
BACKPORT - kmem_cache_t disappeared after 2.6.19
diff -r a290ff6e9ae7 drivers/infiniband/core/mad.c
--- a/drivers/infiniband/core/mad.c Wed Jan 31 14:47:02 2007 -0800
+++ b/drivers/infiniband/core/mad.c Wed Jan 31 14:48:00 2007 -0800
@@ -46,7 +46,7 @@ MODULE_AUTHOR(Hal Rosenstock);
 MODULE_AUTHOR(Hal Rosenstock);
 MODULE_AUTHOR(Sean Hefty);

-static struct kmem_cache *ib_mad_cache;
+static kmem_cache_t *ib_mad_cache;

 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;

This changes a core file, and does not seem to be related to ipath at all.
What problem does this solve?  I note that mad.c already seems to build fine on 
2.6.18
for us - this is part of daily build.

Another example that looks strange:

BACKPORT - workqueues changed in 2.6.20

diff -r 8b94fcef1edd drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.cThu Feb 01 08:54:29 
2007 -0800
+++ b/drivers/infiniband/hw/ipath/ipath_driver.cThu Feb 01 08:57:19 
2007 -0800
@@ -241,7 +241,7 @@ static struct ipath_devdata *ipath_alloc
dd-pcidev = pdev;
pci_set_drvdata(pdev, dd);

-   INIT_DELAYED_WORK(dd-link_work, check_link_status);
+   INIT_WORK(dd-link_work, check_link_status);

list_add(dd-ipath_list, ipath_dev_list);


INIT_DELAYED_WORK is implemented in kernel_addons, so this should
not be necessary.

@@ -725,6 +725,7 @@ static void __devexit ipath_remove_one(s
 */
ipath_shutdown_device(dd);

+#undef cancel_delayed_work
cancel_delayed_work(dd-link_work);
flush_scheduled_work();

This undef looks quite ugly. What does it do?

Please go over the backport patches and check whether they are really necessary.
I think you will mostly discover that the kernel_addons mechanism makes
the backport patches unnecessary. If not, you should try adding
things under kernel_addons as first choice so that everyone benefits.


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] QoS in opensm will not be part of OFED 1.2

2007-02-06 Thread Michael S. Tsirkin
   Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
   Subject: Re: QoS in opensm will not be part of OFED 1.2
   
   I had an AI to check the QoS status with OSM.
   Conclusions are that QoS support in OpenSM will not be part of 
   OFED 1.2 
   (I updated the plan on the Wiki)
   
   The reasons for this are:
   1. Code not ready at code freeze.
   2. There are technical discussion in the list regarding some 
  implementation details (e.g. XML or text syntax).
   3. SPEC is not published by IBTA yet.
  
  I think this last reason also applies to the end client QoS changes 
  as
  well.
 
 Yes. But the other 2 don't.

Right but I think that precludes it from being included in OFED right
now.
   
   Since the code is already included in OFED, moving it out would violate 
   the feature
   freeze rules, unless there's an actual bug this would fix.
  
  OTOH, you are right in that without SM support we can't claim to have this
  feature at all. So, to avoid controversy, I have just removed the QoS 
  patches
  from IB core and pushed the code out.
 
 I think that the mthca patch to encode SL in sched_queue field to
 improve hardware QoS guarantees for connected QPs is useful as this can
 be exercised by IPoIB-CM. If so, should/can this be included ?

OK. Note this is still untested, and off by default.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] idea for ofed 1 2 kernel file structure

2007-02-06 Thread Michael S. Tsirkin
 Quoting  Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: idea for ofed 1 2 kernel file structure
 
 Hi!
 
 I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
 
 It is hard to see changes that are specific to OFED since we have whole kernel
 history mixed in.
 
  
 
 It would easy to split OFED specific files In separate directory and have OFED
 scripts combine that with upstream kernel.
 
  
 
 All out of tree modules we distribute would go there too.
 
 What do others think about this?

OK, I didn't quite get whether the majority likes this or not,
so I created such a repository, extracted the ofed specific history
and imported it there.

Take a look here:
git://git.openfabrics.org/~mst/newofed.git

Build scripts will have to be adjusted to add
necessary kernel components that we use.

Another nice thing about this layout, is that users (if they so wish)
will be able to use just linux kernel source tarball instead of full linux
kernel git.

OFED maintainers, you are the primary users of the OFED git.
Please comment which layout is better for you.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-06 Thread Michael S. Tsirkin
 --
 IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
 whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
 created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour)
 call.
 
 When using the bonding driver, neighbours are created by the net stack on 
 behalf
 of the bonding (master) device. On the tx flow the bonding code gets an skb 
 such
 that skb-dev points to the master device, it changes this skb to point on the
 slave device and calls the slave hard_start_xmit function.
 
 Combing these two flows, there is a hole if some code at ipoib
 (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, 
 n-dev
 is an ipoib device so for example netdev_priv(n-dev) would be of type struct
 ipoib_dev_priv.

Could you plese elaborate how ipoib_neigh_destructor comes to be called at all?
At what point does ipoib_neigh_setup_dev get called?

 To fix it, this patch adds a dev field to struct ipoib_neigh which is used
 instead of the struct neighbour dev one.

What I am concerned with is - if the master is not an IPoIB device,
what guarantee do we have that to_ipoib_neigh will return 0
and not part of an actual hardware address?

Without bonding, the reason is that dev points to an ipoib device,
so we know hw address is 20 bytes.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.

2007-02-06 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for 
 ip_dev_find() bug.
 
  Actually, yes it does.  Here's one case (that I just tested :):
  
  If you rdma_bind() to an explicit address local address, it will fail.
  
  Foo!
  
  I guess I'll need to address the uses of ip_dev_find() in addr.c as well
  before we commit this.
 
 Can we just backport our own version of ip_dev_find()?  We had this once 
 before 
 in svn when they removed it from being exported from the kernel.

Yes, this is in kernel_addons for 2.6.19 or something like that.
Just copy from there, much cleaner than the patch.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.

2007-02-06 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport 
 -IWCM workaround for ip_dev_find() bug.
 

Can we just backport our own version of ip_dev_find()?  We had this 
once before 
in svn when they removed it from being exported from the kernel.
   
   Yes, this is in kernel_addons for 2.6.19 or something like that.
   Just copy from there, much cleaner than the patch.
 
  
  I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
  for sles9sp3.  So maybe this function is causing the error.  Stay tuned.
 
 xxx_ip_dev_find() is returning the wrong interface (sometimes).  I added
 printks to xxx_ip_dev_find().  Then I ran rping -s -a local ip addr
 and it failed because xxx_ip_dev_find() returned loopback instead of my
 eth device.  
 
 Here is the function with printks:
 
 static inline struct net_device *xxx_ip_dev_find(u32 addr)
 {
 struct net_device *dev;
 u32 ip;
 
 read_lock(dev_base_lock);
 printk(%s looking for dev with addr %x\n, __FUNCTION__, addr);
 for (dev = dev_base; dev; dev = dev-next) {
 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
 printk(%s dev %p name %s ipaddr %x\n, __FUNCTION__,
 dev, dev-name, ip);
 if (ip == addr) {
 dev_hold(dev);
 break;
 }
 }
 read_unlock(dev_base_lock);
 
 return dev;
 }
 
 
 Here is the printk log showing loopback being returned:
 
 xxx_ip_dev_find looking for dev with addr 8846a8c0
 xxx_ip_dev_find dev 804000e0 name lo ipaddr 8846a8c0
 
 The address bound to eth3 is 192.168.70.136 (0xc0a84688).  For some
 reason, this line:
 
 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
 
 Returns the 192.168.70.136 address for device-name == lo.
 
 Riddle me that!
 
 Also, sometimes it works ok because the loopback interface gets some
 other ip address that is assigned to the local system as opposed to my
 rdma address.  For example, I booted up the sles9sp3 system with a
 rebuilt kernel and no ofed modules installed.  The system gets
 10.10.0.136 via DHCP for its public interface.  I then built the ofed
 modules and installed them.  I then loaded them and configured my rnic
 interface with 192.168.70.136.  I ran rping and bound to the local
 ipaddr and it worked.  The log showed that inet_select_addr() returned
 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking
 the list and found the correct ethernet interface.  I then rebooted and
 ran the test again and it failed.  So somehow module load order affects
 this, I think.
 
 g.


Try copying inet_select_addr source in from some upstream kernel,
look at that.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.

2007-02-06 Thread Michael S. Tsirkin
 How shall I fix this?  

Patch?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.

2007-02-06 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor 
 ip_dev_find() bug.
 
 On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote:
   How shall I fix this?  
  
  Patch?
  
 
 Riiight.  I'm afraid if I use HOST instead of LINK that I'll break some
 strange SDP loopback feature or some such thing.  And I'm not in a
 position to test that.
 
 But I can post a patch.  Shall I just change sles9sp3 since we don't see
 (yet) any problems with the other distros?

If you post one that updates all kernels it will be easier to test.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.

2007-02-06 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor 
 ip_dev_find() bug.
 
 Here it is (only tested with rping over iWARP on sles9sp3):
 
 
 
 
 xxx_ip_dev_find() must use scope HOST.
 
 From: Steve Wise [EMAIL PROTECTED]
 
 Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on
 some kernels.  The correct scope is RT_SCOPE_HOST.
 
 Signed-off-by: Steve Wise [EMAIL PROTECTED]

OK. I don't have access to the lab at the moment, but hope to test this
by next week.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



  1   2   3   4   5   6   7   8   9   10   >