Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Roland Dreier
  I just gave this a cursory glance.

I haven't really read it except to think why is this so complicated?

  A suggestion: would it not be much simpler to modify the QP from RTS to RTS
  on pkey change?

Changing the P_Key index is not allowed for RTS->RTS.  You would have
to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
P_Key index with SQD->SQD, and finally go SQD->RTS.
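The transition sequence described above can be written down as a small toy state model. This is a sketch under stated assumptions, not IPoIB code: the toy_* names are hypothetical, and real kernel code would instead call ib_modify_qp() with the IB_QP_STATE and IB_QP_PKEY_INDEX attribute masks and wait for an IB_EVENT_SQ_DRAINED asynchronous event before the SQD->SQD step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two QP states involved (hypothetical, not the verbs enum). */
enum toy_qp_state { TOY_RTS, TOY_SQD };

/* The rule from the mail: the P_Key index may be changed on SQD->SQD,
 * but not as part of an RTS->RTS transition. */
static bool pkey_change_allowed(enum toy_qp_state from, enum toy_qp_state to)
{
	return from == TOY_SQD && to == TOY_SQD;
}

struct toy_step {
	enum toy_qp_state from, to;
	bool changes_pkey;     /* does this step carry the new P_Key index? */
	bool wait_sq_drained;  /* wait for the SQ-drained event before the next step */
};

/* Fill 'plan' with the steps needed to change the P_Key index of a QP
 * that is currently in RTS.  Returns the number of steps. */
static size_t plan_pkey_change(struct toy_step plan[3])
{
	plan[0] = (struct toy_step){ TOY_RTS, TOY_SQD, false, true  };
	plan[1] = (struct toy_step){ TOY_SQD, TOY_SQD, true,  false };
	plan[2] = (struct toy_step){ TOY_SQD, TOY_RTS, false, false };

	/* Sanity check: the P_Key change only happens on an allowed transition. */
	for (size_t i = 0; i < 3; ++i)
		if (plan[i].changes_pkey)
			assert(pkey_change_allowed(plan[i].from, plan[i].to));
	return 3;
}
```

The three round trips plus the drain wait are what make this path unattractive compared with simply tearing down and recreating the QP state.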

 - R.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Roland Dreier
 I did a short code review of the ipoib code concentrating on
  partitioning support and I mentioned that the asynchronous events
  handler in the ipoib code does not take the port number reported in
  the event record into consideration. The effect of that is that all of
  the ib# devices related to that specific HCA are flushed when it seems
  to me that only the device for the relevant port should be. Is that done on
  purpose, or am I missing something?

I don't think there's any particular reason the code is that way
except for the oversight never being corrected.  But it looks trivial
to fix, like the patch below.  Does that look right to you?

  p.s. I'm working on a patch that should solve another issue caused by
  PKEY reordering and ipoib behavior, and the above issue further
  complicates things for me.

Why not fix the issue first then?

commit a27cbe878203076247c1b5287f5ab59ed143b560
Author: Roland Dreier [EMAIL PROTECTED]
Date:   Tue Feb 27 07:37:49 2007 -0800

IPoIB: Only handle async events for one port

An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.

Signed-off-by: Roland Dreier [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 3cb551b..7f3ec20 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
 	struct ipoib_dev_priv *priv =
 		container_of(handler, struct ipoib_dev_priv, event_handler);
 
-	if (record->event == IB_EVENT_PORT_ERR    ||
-	    record->event == IB_EVENT_PKEY_CHANGE ||
-	    record->event == IB_EVENT_PORT_ACTIVE ||
-	    record->event == IB_EVENT_LID_CHANGE  ||
-	    record->event == IB_EVENT_SM_CHANGE   ||
-	    record->event == IB_EVENT_CLIENT_REREGISTER) {
+	if ((record->event == IB_EVENT_PORT_ERR    ||
+	     record->event == IB_EVENT_PKEY_CHANGE ||
+	     record->event == IB_EVENT_PORT_ACTIVE ||
+	     record->event == IB_EVENT_LID_CHANGE  ||
+	     record->event == IB_EVENT_SM_CHANGE   ||
+	     record->event == IB_EVENT_CLIENT_REREGISTER) &&
+	    record->element.port_num == priv->port) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
 	}




Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering

2007-02-27 Thread Roland Dreier
   I haven't really read it except to think why is this so complicated?
  
  Do you refer to that complication of the patch of the issue ?

the patch.

   Changing the P_Key index is not allowed for RTS->RTS.  You would have
   to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
   P_Key index with SQD->SQD, and finally go SQD->RTS.
  
  Do you think that using that way to solve it will be a significant
  simplification ? We'll still have to reuse that handling for missed
  completion that is currently implemented in ipoib_ib_dev_stop and
  still have additional work element.

no, I don't think SQD is really useful in practice.




Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.

2007-02-27 Thread Roland Dreier
  On a second thought based on the fact that on a two port HCA we'll
  have a 50% miss on the events being delivered, I would move the new
  condition to be evaluated first. I apologize if this is too much of
  micro optimization. What do you think ?

That wouldn't really be correct since element.port_num isn't valid
unless we already know it's a port-related event.

And it's not worth worrying about this since it's not remotely a hot path.
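The ordering point can be illustrated with a simplified model of the event record. In the kernel, struct ib_event keeps the CQ, QP, SRQ or port number in a union named element, so port_num only means something for port events. The toy_* types below are hypothetical stand-ins that keep just enough of that shape to show why the event-type test has to come first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the ib_event fields used by ipoib_event();
 * the real struct has more event types and more union members. */
enum toy_event_type {
	TOY_EVENT_CQ_ERR,        /* element holds a CQ pointer */
	TOY_EVENT_PORT_ERR,      /* element holds a port number */
	TOY_EVENT_PKEY_CHANGE,
	TOY_EVENT_PORT_ACTIVE,
};

struct toy_event {
	enum toy_event_type event;
	union {
		void   *cq;
		uint8_t port_num;   /* only meaningful for port events */
	} element;
};

static bool is_port_event(enum toy_event_type t)
{
	return t == TOY_EVENT_PORT_ERR || t == TOY_EVENT_PKEY_CHANGE ||
	       t == TOY_EVENT_PORT_ACTIVE;
}

/* Check the event type first: && short-circuits, so element.port_num is
 * read only once we know the union actually holds a port number. */
static bool event_is_for_port(const struct toy_event *ev, uint8_t my_port)
{
	return is_port_event(ev->event) && ev->element.port_num == my_port;
}
```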

 - R.




Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-27 Thread Roland Dreier
On our cell blade + PCI-e Mellanox.
  
  I don't see anything in arch/powerpc that looks like
  dma_alloc_coherent() will do anything other than allocate some memory
  and map it with DMA_BIDIRECTIONAL.  So how does this altix fix help in
  your situation?  Am I misreading the Cell IOMMU code?

Shirley, can you clarify why doing dma_alloc_coherent() in the kernel
helps on your Cell blade?  It really seems that dma_alloc_coherent()
just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL),
which would be exactly the same as allocating the CQ buffer in
userspace and using ib_umem_get() to map it into the kernel.

I'm looking at a possibly cleaner solution to the Altix issue, so I
would like to make sure it fixes whatever the bug on Cell is as well.
So any details you can provide about the problem you see on Cell would
help a lot.

Thanks...




Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish

2007-02-27 Thread Roland Dreier
I don't think this applies any more since Sean's multicast stuff was
merged.  I didn't realize you wanted to get this merged upstream --
anyway, can you please regenerate the patch against the latest kernel?

Thanks




Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Roland Dreier
  So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
  generate the patch for all ULPs to use this for review. Do you need me to
  do that?

No, it's not in OFED 1.2 or the upstream kernel.  And no one has
implemented it for userspace (and I'm somewhat reluctant to break the
ABI at this point without some performance numbers to motivate making
this API change).

Have the NAPI performance problems with ehca been resolved?  We could
probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
kernel changes at least.

 - R.




Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-26 Thread Roland Dreier
nope, doesn't seem to make a difference.




[openib-general] [GIT PULL] please pull infiniband.git

2007-02-26 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get various post-rc1 cleanups and fixes:

Adrian Bunk (2):
  IB/mthca: Make 2 functions static
  RDMA/cxgb3: cleanups

Michael S. Tsirkin (1):
  IPoIB/cm: Improve small message bandwidth

Roland Dreier (3):
  IPoIB: Remove unused local_rate tracking
  IB/uverbs: Return correct error for invalid PD in register MR
  IPoIB: Correct debugging output when path record lookup fails

Sean Hefty (4):
  IB/core: Set hop limit in ib_init_ah_from_wc correctly
  RDMA/cma: Request reversible paths only
  IB/cm: Remove ca_guid from cm_device structure
  RDMA/cma: Remove unused node_guid from cma_device structure

Steve Wise (1):
  RDMA/cxgb3: Stop the EP Timer on BAD CLOSE

 drivers/infiniband/core/cm.c                   |   10 ++---
 drivers/infiniband/core/cma.c                  |    6 ++--
 drivers/infiniband/core/uverbs_cmd.c           |    4 ++-
 drivers/infiniband/core/verbs.c                |    2 +-
 drivers/infiniband/hw/cxgb3/Makefile           |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |   31 +---
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    5 ---
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |   14 +--
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    6 ++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |   29 +++
 drivers/infiniband/hw/mthca/mthca_mr.c         |   10 +++--
 drivers/infiniband/ulp/ipoib/ipoib.h           |    1 -
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |   46 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    8 ++---
 17 files changed, 76 insertions(+), 102 deletions(-)


diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..842cd0b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -88,7 +88,6 @@ struct cm_port {
 struct cm_device {
 	struct list_head list;
 	struct ib_device *device;
-	__be64 ca_guid;
 	struct cm_port port[0];
 };
 
@@ -739,8 +738,8 @@ retest:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-			       &cm_id_priv->av.port->cm_dev->ca_guid,
-			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
+			       &cm_id_priv->id.device->node_guid,
+			       sizeof cm_id_priv->id.device->node_guid,
 			       NULL, 0);
 		break;
 	case IB_CM_REQ_RCVD:
@@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 
 	req_msg->local_comm_id = cm_id_priv->id.local_id;
 	req_msg->service_id = param->service_id;
-	req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 	cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
 	cm_req_set_resp_res(req_msg, param->responder_resources);
 	cm_req_set_init_depth(req_msg, param->initiator_depth);
@@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
 	cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
 	cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
 	cm_rep_set_srq(rep_msg, param->srq);
-	rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 
 	if (param->private_data && param->private_data_len)
 		memcpy(rep_msg->private_data, param->private_data,
@@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
 		return;
 
 	cm_dev->device = device;
-	cm_dev->ca_guid = device->node_guid;
 
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= device->phys_port_cnt; i++) {
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f8d69b3..d441815 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -77,7 +77,6 @@ static int next_port;
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
-	__be64			node_guid;
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
@@ -1492,11 +1491,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	ib_addr_get_dgid(addr, &path_rec.dgid);
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey

Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Roland Dreier
  That would be great. We hit a similar problem in our cluster test -- data
  corruption because of this race.

On what platform?

 - R.




Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-26 Thread Roland Dreier
  On our cell blade + PCI-e Mellanox.

I don't see anything in arch/powerpc that looks like
dma_alloc_coherent() will do anything other than allocate some memory
and map it with DMA_BIDIRECTIONAL.  So how does this altix fix help in
your situation?  Am I misreading the Cell IOMMU code?

 - R.




Re: [openib-general] IPOIB NAPI

2007-02-26 Thread Roland Dreier
  Yes. It would be good to reduce number of interrupts by changing all upper
  layer protocols to use:
  
  poll CQ
  notify CQ, rotting packet notification
  poll again
  
  instead of
  notify CQ
  poll CQ
  
  If possible this can be in OFED-1.2?

No way, it's way too late at this point to change the kernel-user ABI,
let alone change all ULPs.
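The poll / notify / poll pattern quoted above can be sketched with a toy completion queue. Nothing here is the verbs API: toy_req_notify() is a hypothetical stand-in whose return value plays the role of the proposed "report missed events" flag, and late_arrivals simulates completions that land in the window between the last poll and arming the CQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CQ: 'pending' counts completions written by the hardware
 * but not yet polled by software. */
struct toy_cq {
	int  pending;
	bool armed;
	int  late_arrivals;   /* completions that land right after the last poll */
};

static int toy_poll(struct toy_cq *cq, int budget)
{
	int n = cq->pending < budget ? cq->pending : budget;
	cq->pending -= n;
	return n;
}

/* Arm the CQ and return nonzero if completions may have been missed. */
static int toy_req_notify(struct toy_cq *cq)
{
	cq->armed = true;
	cq->pending += cq->late_arrivals;   /* simulate the race window */
	cq->late_arrivals = 0;
	return cq->pending > 0;
}

/* Poll until empty, arm, and poll again only if arming reported a miss.
 * Returns the total number of completions processed. */
static int drain_cq(struct toy_cq *cq)
{
	int total = 0;

	for (;;) {
		int n;
		while ((n = toy_poll(cq, 16)) > 0)
			total += n;
		if (!toy_req_notify(cq))
			break;   /* armed and empty: safe to wait for an interrupt */
	}
	return total;
}
```

Without such a return value, a consumer using this ordering has to poll once more unconditionally after arming, or risk sleeping on a CQ that already holds completions.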

 - R.




Re: [openib-general] failure to create an FMR mapping 1K pages on memfree

2007-02-26 Thread Roland Dreier
  I have got a report on failure to create FMR mapping 1K pages (that is
  4MB) on memfree.
 
  I don't have the exact details (ie if Arbel/Sinai / what FW  / etc)
  nor which exact check fails in
  mthca_fmr_alloc, but what's clear is that the latter function returns
  -ENOMEM when attr.max_pages is 1024 and it works fine when
  attr.max_pages is 256.
 
  Is this failure clear to you? If yes, is a HW or FW limit being
  hit, or is it a driver design issue?

Is it really returning -ENOMEM?  It seems much more likely that you
are hitting the code

	/* For Arbel, all MTTs must fit in the same page. */
	if (mthca_is_memfree(dev) &&
	    mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
		return -EINVAL;

I guess you could call this limit a driver design issue.
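The numbers in the report follow directly from the check quoted above. Assuming 4 KB pages and 8-byte MTT entries (both assumptions here, not taken from the report), one page holds 4096 / 8 = 512 MTTs, so max_pages = 256 passes, 1024 (4 MB of 4 KB pages) fails, and the failure is -EINVAL rather than -ENOMEM. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KB pages and 64-bit MTT entries. */
#define TOY_PAGE_SIZE 4096u
typedef uint64_t toy_mtt_t;

/* Same shape as the mem-free check: all MTTs must fit in one page. */
static int toy_check_fmr_pages(size_t max_pages)
{
	if (max_pages * sizeof(toy_mtt_t) > TOY_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```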

 - R.




Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-23 Thread Roland Dreier
  Newer gccs have the -fwhole-program --combine options that address
  this and more. One of the things that happens is that all internal
  functions are made 'static' and all compilation units are optimized in
  one go.

Good point... but is there any sane way to use that feature with
automake and libtool?  I know that the autotools are a pain but I
really don't want to reimplement the useful stuff they give us, and I
don't know of any really practical replacement...

 - R.




Re: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups

2007-02-23 Thread Roland Dreier
thanks, queued for 2.6.21




Re: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on BAD CLOSE.

2007-02-23 Thread Roland Dreier
thanks, queued for 2.6.21




Re: [openib-general] [PATCH v2] libibverbs: can't compile more than once due to man3 symbolic links

2007-02-22 Thread Roland Dreier
Thanks, I applied this and pushed it out.




Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-22 Thread Roland Dreier
  A first-cut at a patch was sent out, some very reasonable
  objections were raised, and the thread fizzled out.

Sorry, I meant to respond again, but I never got around to it.

  The biggest concern with the earlier patch seemed to be
  backward compatibility. There was a stab at addressing
  that in http://tinyurl.com/2x3s52, but no commentary.
  (Too ugly for words?)

I think you went off into the weeds there, but I'll respond to that
earlier email in detail.

  Any suggestions as to how to proceed? Should I just code
  something up in order to have a concrete target to discuss?
  Or are there any new thoughts based on the previous emails?

I actually have a vague plan for a somewhat cleaner way to get this
fix.  For a variety of reasons, I am planning on changing the way the
kernel handles memory registration so that low-level drivers have more
control over what happens.  This would allow us to follow Gleb's
suggestion to use register MR to create and map the kernel's buffer
and avoid some of the error path ugliness.  So I would prefer to map
the coherent memory that way.

However this will take a while to come to fruition, since it is kind
of a background task for me.  How severe is this issue?  In other
words, when you produced the problem, was it a synthetic test, or a
workload that someone might actually want to run?

 - R.




Re: [openib-general] mthca adjust_key()

2007-02-22 Thread Roland Dreier
  Could anyone tell me why this routine in mthca is necessary?  There 
  aren't any comments to explain it; I'm wondering if this is a workaround 
  for Sinai of some kind?
  
  static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
  {
  	if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
  		return ((key << 20) & 0x800000) | (key & 0x7fffff);
  	else
  		return key;
  }

It's a performance optimization for Sinai.

 - R.




Re: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race

2007-02-22 Thread Roland Dreier
  Assuming that something along the lines of the previous patch
  is used, we need to address userspace/kernel compatibility.
  
  The existing abi versioning doesn't seem to be exactly what
  we want to use, though, because we want to change a verb's
  semantics to work around a bug. (Changing the abi_version
  may be an inevitable result, though.)
  
  How about adding semantic flags to the mthca_* commands
  (mthca_create_cq, etc.)? Userspace could read the contents of
  a new sysfs file which, if found, would indicate the flags
  that the kernel understands. Then it could pass the flags, if
  it chooses, to get the kernel to use the desired semantics.

This is not really the design philosophy that we've used in defining
the user-kernel interfaces for IB verbs.  Rather than having
complexity in the kernel to handle both old and new ways of doing
things, the way we've used to handle cases like this is the following:

 - specify new fixed ABI (in this case, mthca abi_version 2)
 - update library to handle old and new ABI (in this case, update
   libmthca to use mthca kernel abi 1 or 2 depending on what it
   detects at runtime)
 - update kernel to implement new ABI, and remove old ABI from kernel
   (in this case, update kernel mthca driver to abi_version 2)

The net effect of this is that updated userspace works fine with any
kernel, but updating the kernel will require updating userspace
libraries too.  However the important point is that once userspace is
updated, it's still possible to boot into old kernels and have things
work without downgrading userspace.
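The library side of that scheme can be sketched as a runtime version switch. Everything below is hypothetical (the toy_* names, the version bounds, the v2_semantics flag), and how the library learns the kernel's version is left out; it only shows the shape: the library accepts every kernel ABI it knows about, picks its behavior from the version the kernel reports, and refuses versions newer than it understands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range of kernel ABI versions this library build supports. */
enum { TOY_MIN_ABI = 1, TOY_MAX_ABI = 2 };

struct toy_ctx {
	int  kernel_abi;
	bool v2_semantics;   /* hypothetical: true when the kernel speaks ABI 2 */
};

/* Returns 0 and fills *ctx for a supported ABI, -1 otherwise. */
static int toy_open_context(int kernel_abi, struct toy_ctx *ctx)
{
	if (kernel_abi < TOY_MIN_ABI || kernel_abi > TOY_MAX_ABI)
		return -1;   /* unknown ABI: refuse rather than guess */

	ctx->kernel_abi   = kernel_abi;
	ctx->v2_semantics = (kernel_abi >= 2);
	return 0;
}
```

The complexity of handling both versions lives only in the library, which is what lets an updated userspace still boot into an old kernel.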

If we really wanted to export some flags from mthca back to libmthca,
I guess it would be possible to bump the abi version and add a flags
field to the response to the alloc_ucontext command, but in this case
I don't see a reason to worry about it.

 - R.




Re: [openib-general] [RFC/BUG] DMA vs. CQ race

2007-02-22 Thread Roland Dreier
  We found this accidentally, running a normal MPI job, on a 
  normally sized machine (i.e., tens, not hundreds of 
  processors.) It appears to be more easily produced that 
  we'd expected, and we consider it to be a severe problem.

Hmm, OK.  Then I will do my best to make sure we get a fix for this
into 2.6.22.

 - R.




Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-22 Thread Roland Dreier
  1. Is there something special you do when you run the benchmark (msi, 
  taskset, ...)?

Yes, I am using MSI-X, and I pin the interrupt handler to one CPU
(CPU#0 in my particular case).  Then I use taskset to pin the NPtcp
process to a CPU in a different package (CPU#2 in my system).
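For reference, pinning from code rather than with taskset can be done with the Linux sched_setaffinity() call. This is a minimal sketch, not part of the benchmark setup described above; pinning the interrupt itself is usually done separately, by writing a CPU mask to /proc/irq/<irq>/smp_affinity, which is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU, like `taskset -c <cpu>`.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}
```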

BTW with these same systems, I am getting up to ~1150 MB/sec of
throughput with DDR mem-free Arbel, as measured with NPtcp.

  2. On a wild guess that the issue here is higher interrupt rate with CM,
 is there a chance you could test the following patch posted by me earlier?
 http://www.mail-archive.com/openib-general@openib.org/msg29290.html

OK, I'll try that when I get a chance.

 - R.




Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-22 Thread Roland Dreier
OK, I applied the following patch (I had to change one line of your
patch to get it to apply, because the small-message bandwidth patch
changed the context so one hunk didn't apply).

Anyway, I don't see any difference in small-message latency or
large-message throughput.  (Actually, latency seems slightly worse, but
I think the change is within my normal variability, so I don't think
the difference is significant.)
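The patch below appears to route CM send completions through the device's main completion handling: it drops the per-connection cq from struct ipoib_cm_tx, recovers tx from wc->qp->qp_context, and marks work request IDs with flag bits so each completion can be dispatched. A minimal sketch of that tagging, assuming ring indices stay below bit 30; OP_RECV and OP_CM are illustrative stand-ins for IPOIB_OP_RECV and IPOIB_OP_CM, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in the top of the work request ID, as in the patch. */
#define OP_RECV (1ul << 31)
#define OP_CM   (1ul << 30)

enum toy_handler { TOY_UD_SEND, TOY_UD_RECV, TOY_CM_SEND, TOY_CM_RECV };

static uint64_t tag_wr_id(unsigned int ring_index, int is_recv, int is_cm)
{
	uint64_t id = ring_index;

	if (is_recv)
		id |= OP_RECV;
	if (is_cm)
		id |= OP_CM;
	return id;
}

/* Pick the handler from the flag bits and strip them to recover the index. */
static enum toy_handler classify_wc(uint64_t wr_id, unsigned int *ring_index)
{
	*ring_index = (unsigned int)(wr_id & ~(OP_CM | OP_RECV));
	if (wr_id & OP_CM)
		return (wr_id & OP_RECV) ? TOY_CM_RECV : TOY_CM_SEND;
	return (wr_id & OP_RECV) ? TOY_UD_RECV : TOY_UD_SEND;
}
```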

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..20d7ad4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -98,9 +98,9 @@ enum {
 
 #define	IPOIB_OP_RECV   (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_OP_CM     (1ul << 30)
 #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_OP_CM     (0)
 #endif
 
 /* structs */
@@ -143,7 +143,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
 	struct ib_cm_id     *id;
-	struct ib_cq        *cq;
 	struct ib_qp        *qp;
 	struct list_head     list;
 	struct net_device   *dev;
@@ -232,6 +231,7 @@ struct ipoib_dev_priv {
 	unsigned             tx_tail;
 	struct ib_sge        tx_sge;
 	struct ib_send_wr    tx_wr;
+	unsigned             tx_outstanding;
 
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -438,6 +438,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
 			   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -526,6 +527,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 3484e8b..9515ef6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -82,7 +82,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
 		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
@@ -344,7 +344,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
 	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
@@ -436,7 +436,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	priv->tx_sge.addr   = addr;
 	priv->tx_sge.length = len;
 
-	priv->tx_wr.wr_id = wr_id;
+	priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM;
 
 	return ib_post_send(tx->qp, &priv->tx_wr, bad_wr);
 }
@@ -487,20 +487,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		dev->trans_start = jiffies;
 		++tx->tx_head;
 
-		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
 		}
 	}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
-				  struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id;
+	struct ipoib_cm_tx *tx = wc->qp->qp_context;
+	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
 	struct ipoib_tx_buf *tx_req;
 	unsigned long flags;
 
@@ -525,11 +524,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++tx->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
-	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
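The send-side flow control in this patch keeps one tx_outstanding counter per device: the net queue is stopped when the counter reaches ipoib_sendq_size and, judging from the condition shown, woken once it drops back to half (the excerpt is cut off before the wake-up call itself). A toy model of that hysteresis, with hypothetical toy_* names standing in for the kernel calls:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-device send flow control. */
struct toy_tx {
	unsigned int outstanding;    /* sends posted but not yet completed */
	unsigned int sendq_size;     /* stands in for ipoib_sendq_size */
	bool         queue_stopped;  /* stands in for netif_queue_stopped() */
	bool         admin_up;       /* stands in for IPOIB_FLAG_ADMIN_UP */
};

static void toy_on_send(struct toy_tx *tx)
{
	if (++tx->outstanding == tx->sendq_size)
		tx->queue_stopped = true;    /* like netif_stop_queue() */
}

/* Wake only when the count falls to exactly half, so the queue does not
 * flap between stopped and running on every completion. */
static void toy_on_send_completion(struct toy_tx *tx)
{
	if (--tx->outstanding == tx->sendq_size >> 1 &&
	    tx->queue_stopped && tx->admin_up)
		tx->queue_stopped = false;   /* wake the queue */
}
```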

Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Roland Dreier
  By the way, how about extending the userspace API in a similar
  fashion?
  
  missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
                                    IBV_CQ_REPORT_MISSED_EVENTS)

It would require a kernel-user ABI bump.  Is it worth it?

 - R.




Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland

2007-02-22 Thread Roland Dreier
These all look fine, I'll queue them up.

 Signed-off-by: Sean Hefty [EMAIL PROTECTED]

I notice that the actual patches you committed don't have your
sign-off in the git changelog.  I assume this is a mistake so I'll add
it back in...




Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland

2007-02-22 Thread Roland Dreier
  I notice that the actual patches you committed don't have your
  sign-off in the git changelog.  I assume this is a mistake so I'll add
  it back in...

which means I can't just pull your branch.  But that's OK, still doing
git format-patch, edit patches, git am is pretty easy.




Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland

2007-02-22 Thread Roland Dreier
  The patches are in git.openfabrics.org/~shefty/rdma-dev.git,
  for-roland branch, which is based on 2.6.21-rc1.

One other request: please include a URL that I can just copy and
paste, so I don't actually have to read and parse complete sentences.
Something like:

the patches are in

git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

 - R.




Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland

2007-02-22 Thread Roland Dreier
Anyway, all 4 queued up in my for-2.6.21 branch




Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter

2007-02-22 Thread Roland Dreier
  GCC seems to be unable to propagate constants across calls to htonl.
  So it turns out to be worthwhile to replace htonl with
  a hand-written macro in case of constant parameter.

I'm wondering why this helps you.  On my system (which has Debian's
old glibc 2.3.6, certainly nothing particularly fancy), I see in
my <netinet/in.h>:

/* Get machine dependent optimized versions of byte swapping functions.  */
#include <bits/byteswap.h>

#ifdef __OPTIMIZE__
/* We can optimize calls to the conversion functions.  Either nothing has
   to be done or we are using directly the byte-swapping functions which
   often can be inlined.  */
# if __BYTE_ORDER == __BIG_ENDIAN
//...
# else
#  if __BYTE_ORDER == __LITTLE_ENDIAN
#   define ntohl(x)	__bswap_32 (x)

and so on (and gcc defines __OPTIMIZE__ if you pass it any -O flag
including -Os).  And in bits/byteswap.h I have

/* Swap bytes in 32 bit value.  */
#define __bswap_constant_32(x) \
     ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >>  8) | \
      (((x) & 0x0000ff00) <<  8) | (((x) & 0x000000ff) << 24))

and variations of __bswap_32() that look roughly like

#  define __bswap_32(x) \
     (__extension__ \
      ({ register unsigned int __v, __x = (x); \
	 if (__builtin_constant_p (__x)) \
	   __v = __bswap_constant_32 (__x); \
	 else \

and so on.  (The point of all this being that for constants, htonl()
should expand to roughly the same thing as your CONSTANT_HTONL() --
the only difference is that you don't have the & masks for the >> 24
and << 24 parts, which I guess just has the potential to bite us if
someone did something like CONSTANT_HTONL(1L) on a 64-bit system).
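To make the masking point concrete: the glibc form masks every byte before shifting, so the result stays within 32 bits whatever the argument type. SWAP32_UNMASKED below is a hypothetical variant without the masks on the >> 24 and << 24 terms, written only to show how a 64-bit unsigned long argument leaks bits above bit 31; it is not the CONSTANT_HTONL() from the patch, which isn't shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as glibc's __bswap_constant_32: every byte is masked
 * before it is shifted, so the result always fits in 32 bits. */
#define SWAP32_MASKED(x) \
	((((x) & 0xff000000u) >> 24) | (((x) & 0x00ff0000u) >>  8) | \
	 (((x) & 0x0000ff00u) <<  8) | (((x) & 0x000000ffu) << 24))

/* Hypothetical variant without masks on the >> 24 and << 24 terms. */
#define SWAP32_UNMASKED(x) \
	(((x) >> 24) | (((x) & 0x00ff0000u) >>  8) | \
	 (((x) & 0x0000ff00u) <<  8) | ((x) << 24))
```

With a 32-bit unsigned argument both macros agree, because the << 24 wraps modulo 2^32; with a 64-bit unsigned long, the unmasked << 24 keeps the high bytes and the result is no longer a 32-bit value.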

As a quick test I compiled the code

#include <netinet/in.h>

enum {
Y = 5
};

uint32_t foo(uint32_t x)
{
return x | htonl(Y);
}

with gcc -c -O and the disassembly of foo() looks like

 foo:
   0:   89 f8   mov%edi,%eax
   2:   0d 00 00 00 05  or $0x5000000,%eax
   7:   c3  retq   

and so everything works exactly the way we would want.  (32-bit i386
also just does or with a constant too).

In fact for libmthca I just checked that the preprocessor output of
places like the following (which your patch converts)

((wr->send_flags & IBV_SEND_SIGNALED) ?
 htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |

is

   ((wr->send_flags & IBV_SEND_SIGNALED) ?
    (__extension__ ({ register unsigned int __v, __x = (MTHCA_NEXT_CQ_UPDATE);
    if (__builtin_constant_p (__x)) __v = ((((__x) & 0xff000000) >> 24) |
    (((__x) & 0x00ff0000) >> 8) | (((__x) & 0x0000ff00) << 8) |
    (((__x) & 0x000000ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) :
    "0" (__x)); __v; })) : 0) |

And if I compare the generated assembly for libmthca with and without
your patch (on both x86-64 and i386), I don't see any significant
difference (the size is exactly the same, I just see things like the
compiler using eax and edx in the opposite order and trivial things
like that).

So what is different in your setup that causes this patch to make a
difference for you?

(BTW, one thing I did notice while looking at the i386 assembly is
that one micro-optimization that might make sense is to use something
like __attribute__((regparm(3))) for internal function calls within
libibverbs and libmthca on i386, since otherwise we waste instructions
pushing stuff on the stack for no reason other than compliance with
the crufty old i386 ABI.  Something like a FASTCALL macro in
infiniband/arch.h perhaps... if anyone really cares about 32-bit
i386 performance any more)
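For illustration, a FASTCALL macro along those lines might look like the sketch below. The macro body and internal_sum3() are assumptions, not existing libibverbs code; regparm(3) is a GCC/Clang attribute for 32-bit x86 that passes the first three integer arguments in EAX, EDX and ECX.

```c
/*
 * Hypothetical FASTCALL macro, as it might appear in a shared header such
 * as infiniband/arch.h: register arguments on 32-bit x86, nothing elsewhere.
 */
#if defined(__i386__)
#define FASTCALL __attribute__((regparm(3)))
#else
#define FASTCALL
#endif

/*
 * The attribute changes the calling convention, so the prototype every
 * caller sees must carry it as well.  Internal (non-exported) use only.
 */
FASTCALL int internal_sum3(int a, int b, int c);

FASTCALL int internal_sum3(int a, int b, int c)
{
	return a + b + c;
}
```

Because caller and callee must agree on the convention, this only works for functions whose every caller is compiled against the same header, which is why it would be limited to internal calls and kept out of the public ABI.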

 - R.




Re: [openib-general] I created a git tree for the libibverbs man pages

2007-02-21 Thread Roland Dreier
   What is the max # of cards the OFED driver/library can support on a
  single node?

The lowest limit I know of is the # of device minors available for
/dev/infiniband/uverbs files, which is 32.  How many devices are you
interested in supporting?

This limit could probably be increased without too much trouble, but I
doubt any realistic system will run into it anyway.
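As a rough sketch of where a fixed limit like this comes from: if each device claims one minor out of a fixed-size bitmap, the 33rd device simply finds no free slot. The names below (MAX_UVERBS_MINORS, alloc_minor(), free_minor()) are hypothetical userspace stand-ins, not the kernel's uverbs code.

```c
#include <stdint.h>

#define MAX_UVERBS_MINORS 32	/* hypothetical limit, matching the 32 above */

static uint32_t minor_map;	/* bit n set => minor n is in use */

/* Return the lowest free minor, or -1 if all 32 are taken. */
int alloc_minor(void)
{
	for (int i = 0; i < MAX_UVERBS_MINORS; i++) {
		if (!(minor_map & (UINT32_C(1) << i))) {
			minor_map |= UINT32_C(1) << i;
			return i;
		}
	}
	return -1;
}

/* Give a minor back so a later device can reuse it. */
void free_minor(int minor)
{
	if (minor >= 0 && minor < MAX_UVERBS_MINORS)
		minor_map &= ~(UINT32_C(1) << minor);
}
```

Raising the limit in a scheme like this is mostly a matter of a bigger bitmap and a larger reserved minor range.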

 - R.




Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Roland Dreier
  As replied before - if you want full fork support you need to change the 
  application. Look at the verbs header for details.

Or you could try setting the IBV_FORK_SAFE environment variable before
running your application.  I guess for MPI jobs you need to make sure
that environment variable is propagated to every process.




Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Roland Dreier
  If you can send me the details (since you implemented it) I will add
  it to the Wiki

An application that wants fork() to work with libibverbs should
either call ibv_fork_init() before doing anything else with
libibverbs, or else a user can set the IBV_FORK_SAFE or
RDMAV_FORK_SAFE environment variable to get the same effect.  There is
some overhead to making fork() work so it is not enabled by default.
This is described in the ibv_fork_init manpage in the latest
libibverbs git tree.
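As a sketch of the environment-variable half of that rule, assuming only the presence of either variable matters (the value is ignored), a check might look like this. fork_safety_requested() is a hypothetical helper, not a libibverbs function; an application that wants fork() support would simply call ibv_fork_init() before any other libibverbs call.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: report whether the user asked for fork() safety
 * via IBV_FORK_SAFE or RDMAV_FORK_SAFE (presence only, value ignored).
 */
bool fork_safety_requested(void)
{
	return getenv("IBV_FORK_SAFE") != NULL ||
	       getenv("RDMAV_FORK_SAFE") != NULL;
}
```

For MPI jobs the same caveat applies as for any environment variable: the launcher has to propagate it to every rank.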

 - R.




Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Roland Dreier
  Does this require 2.6.16 or better kernel support?

The kernel must support the MADV_DONTFORK flag to madvise(), not sure
when exactly that was merged but 2.6.16 or so sounds right.

ibv_fork_init() will return an error if the kernel support is missing
and fork safety won't actually work.  And if you use the environment
variable a warning will be printed if ibv_fork_init() fails.
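The kernel feature underneath is madvise(MADV_DONTFORK). A minimal sketch, assuming Linux and an mmap()ed (page-aligned) buffer; alloc_dontfork() and free_dontfork() are hypothetical names. The child created by fork() does not get this range at all, so fork() never turns the parent's pages (which an HCA may be DMAing to) into copy-on-write pages. On a kernel without MADV_DONTFORK support madvise() fails with EINVAL, which is one plausible way to detect missing kernel support.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Allocate len bytes and mark them MADV_DONTFORK.  Returns NULL if mmap()
 * fails or the kernel rejects the advice.
 */
void *alloc_dontfork(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);	/* no kernel support: give up cleanly */
		return NULL;
	}
	return buf;
}

void free_dontfork(void *buf, size_t len)
{
	if (buf)
		munmap(buf, len);
}
```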

 - R.




Re: [openib-general] I created a git tree for the libibverbs man pages

2007-02-20 Thread Roland Dreier
I merged all these manpages into my libibverbs tree and pushed the
result out to kernel.org.

Please send any future updates as diffs against the libibverbs tree.

Thanks,
  Roland




Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth

2007-02-20 Thread Roland Dreier
Thanks, queued for 2.6.21.  With this patch I see small-packet latency
down almost all the way back to what datagram mode gives -- on a pair
of fast woodcrest systems I see latencies for netpipe tcp 1 byte
messages like

  datagram 13.xx
  original CM  17.xx
  patched CM   14.xx

so there is still a measurable difference but it is much less now.

 - R.




Re: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible cleanups

2007-02-20 Thread Roland Dreier
  You could just remove the code instead of #if 0...

Steve, can you decide what the right thing to do with these changes is
and send me the result (or just tell me to apply Adrian's patch
as-is)?

Thanks,
  Roland




Re: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c: make 2 functions static

2007-02-19 Thread Roland Dreier
Queued for my next merge, thanks.




Re: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path()

2007-02-18 Thread Roland Dreier
  In issue number 296, which I opened several months ago in Bugzilla, I
  reported two missing attributes: the first one is the static_rate, and
  the second one is the src_path_bits, which is not being filled in right.

The patch I posted fixes the static rate, right?

You'll need to explain what you mean about src_path_bits, because at
first glance the code looks OK to me.

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  We have a customer issue regarding IPv6oIB. The subnet supports only a
  limited number of MCGs. When multiple IPv6 addresses are assigned to one
  interface, each IPv6 address will have one unique solicited-node address
  (depending on its group ID), so in a large subnet we will have tons of
  MCGs. If the IPv6 solicited-node addresses exceed the number of MCGs in
  the subnet, IPv6 neighbour discovery will be broken. This won't happen
  on Ethernet, since a send-only sender isn't required to join any MCG.

  I have done an initial patch to address the MCG overflow problem and
  redirect the solicited-node address to the all-hosts address, so IPv6
  neighbour discovery will work no matter how many IPv6 addresses are in
  the subnet. The patch is only triggered when IPv6 is enabled and the
  MCGs overflow, so there is almost no performance penalty.

I really don't like this approach, since it can break things in very
subtle ways (e.g. suppose one node fails to join its solicited node
group, but then a later node wants to talk to it and succeeds in
joining the solicited node group as a send-only member -- since the
first node is not a member, it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number
of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
requirement on the SM.
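For reference, the solicited-node group is derived from the low 24 bits of each unicast address (RFC 4291: ff02::1:ffXX:XXXX), which is where the one-group-per-address growth comes from. A minimal sketch; solicited_node_addr() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Build ff02::1:ffXX:XXXX from the low 24 bits of a unicast address. */
void solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
	memset(out, 0, 16);
	out[0]  = 0xff;		/* ff02: link-local scope multicast */
	out[1]  = 0x02;
	out[11] = 0x01;		/* ...:0001:ff..:.... */
	out[12] = 0xff;
	memcpy(&out[13], &unicast[13], 3);	/* low-order 24 bits */
}
```

Addresses that share their low 24 bits, such as a link-local and a global address built from the same interface ID, map to the same group, so the number of groups tracks distinct low-24-bit values rather than every configured address.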

 - R.




Re: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko fix

2007-02-16 Thread Roland Dreier
Thanks, queued for 2.6.21.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  Even if the SM supports 1000 MCGs, it's still not sufficient for a
  250-node cluster where each node has 4 links for IPv6 without any
  scope/global IPv6 address configured (250*4 + a few default MCGs). There
  will be an MCG overflow problem in IPv6oIB anyway.

But what's the problem with supporting 1000 or even 10000 MCGs?

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
   I much prefer to fix the SM not to impose too-low limits on the number
   of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
   requirement on the SM.
  
  Is this an MFT size issue or an SM issue or both?

Well as we discussed before, the size of the MFT is really independent
of the # of MCGs supported.  It's up to the SM how to allocate MLIDs,
and as long as all the switches in the fabric support at least one
MLID, then any number of MCGs can be managed by the SM.  So I would
say this is entirely an SM issue.
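To make the "any number of MCGs with even one MLID" point concrete, here is a sketch of a hypothetical SM policy that hashes MGIDs onto a fixed pool of multicast LIDs (0xC000 is the start of the IB multicast LID range and 0xFFFE the top usable value). mlid_for_mgid() and the FNV-1a hash are illustrative choices, not what any particular SM does.

```c
#include <stdint.h>

#define MLID_BASE 0xC000u	/* first multicast LID */

/*
 * Fold any number of MGIDs onto n_mlids multicast LIDs
 * (1 <= n_mlids <= 0x3FFF keeps the result within 0xC000..0xFFFE).
 */
uint16_t mlid_for_mgid(const uint8_t mgid[16], unsigned n_mlids)
{
	uint32_t h = 2166136261u;		/* FNV-1a offset basis */

	for (int i = 0; i < 16; i++) {
		h ^= mgid[i];
		h *= 16777619u;			/* FNV-1a prime */
	}
	return (uint16_t)(MLID_BASE + h % n_mlids);
}
```

With n_mlids = 1 every group shares MLID 0xC000. The cost of sharing is that ports may receive packets for groups they did not join and have to discard them by MGID; the SM never runs out of groups.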

 - R.




Re: [openib-general] SA multicast patches

2007-02-16 Thread Roland Dreier
  The pkey is the default partition, full membership pkey.  I believe
  all nodes will have either 0xffff or 0x7fff as their pkey.  We could
  probably call ib_get_cached_pkey() instead and just use the first
  entry in the table.

Well the consumer has to know what P_Key to use since it must match
the QP that will be used to send/receive.  So I would suggest not
trying to guess in the low-level multicast.c code, and rely on the
consumer to set it properly.

  We don't want to set the privileged bit of the q_key, so that's
  wrong.  Good catch.

OK, I'll replace the code with something like random32() & 0x7fffffff
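In isolation the fix is just masking off the top bit, since a Q_Key with its high-order bit set is a controlled (privileged) Q_Key. A sketch with hypothetical helper names; the random bits are passed in by the caller rather than tied to any particular kernel random API.

```c
#include <stdbool.h>
#include <stdint.h>

/* High-order bit of a 32-bit Q_Key: set means controlled/privileged. */
#define QKEY_PRIVILEGED_BIT 0x80000000u

/* Turn 32 random bits into a Q_Key that never has the privileged bit set. */
uint32_t nonprivileged_qkey(uint32_t random_bits)
{
	return random_bits & 0x7fffffffu;
}

bool qkey_is_privileged(uint32_t qkey)
{
	return (qkey & QKEY_PRIVILEGED_BIT) != 0;
}
```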

One other question about the PS_IPOIB stuff:

  +static int cma_set_qkey(struct ib_device *device, u8 port_num,
  +			enum rdma_port_space ps,
  +			struct rdma_dev_addr *dev_addr, u32 *qkey)
  +{
  +	struct ib_sa_mcmember_rec rec;
  +	int ret = 0;
  +
  +	switch (ps) {
  +	case RDMA_PS_UDP:
  +		*qkey = RDMA_UDP_QKEY;
  +		break;
  +	case RDMA_PS_IPOIB:
  +		ib_addr_get_mgid(dev_addr, &rec.mgid);
  +		ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec);
  +		*qkey = be32_to_cpu(rec.qkey);
  +		break;

Does this work if userspace tries to join a new IPoIB MCG that the
kernel driver hasn't joined yet?  From reading the code it seems that
ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the
whole join request would fail.

Am I reading this correctly?  Is it supposed to work?  I would think
that it would be nice to be able to receive on IPoIB MCGs not also
being received by the kernel.

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  I thought that mapping multiple MCGs to the same MLID requires that a
  set of the (group) parameters are the same. Is that the case for these
  IPv6 groups ? Is the only variable in those parameters the PKey ?

I don't see why any group parameters need to be the same -- I'm
probably missing something, but which parameters in particular did you
have in mind?

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  For a successful join, ND sends to the node directly; for a failed
  join, ND sends to the all-hosts address. So ND will work whether or not
  the join succeeds; that's what the patch does.

But what if the full-member join fails on node A for node A's
solicited node group, but then node B succeeds in joining that group
as a send-only member (perhaps because some other nodes have dropped
off the fabric in the meantime).  Then node B will send the ND message
on a MCG that A is not a member of.

 - R.




Re: [openib-general] SA multicast patches

2007-02-16 Thread Roland Dreier
OK, another question about the multicast.c code:

  +static struct mcast_group *mcast_find(struct mcast_port *port,
  +				      union ib_gid *mgid)
  +{
  +	struct rb_node *node = port->table.rb_node;
  +	struct mcast_group *group;
  +	int ret;
  +
  +	while (node) {
  +		group = rb_entry(node, struct mcast_group, node);
  +		ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid);
  +		if (!ret)
  +			return group;
  +
  +		if (ret < 0)
  +			node = node->rb_left;
  +		else
  +			node = node->rb_right;
  +	}
  +	return NULL;
  +}
  +
  +static struct mcast_group *mcast_insert(struct mcast_port *port,
  +					struct mcast_group *group,
  +					int allow_duplicates)
  +{
  +	struct rb_node **link = &port->table.rb_node;
  +	struct rb_node *parent = NULL;
  +	struct mcast_group *cur_group;
  +	int ret;
  +
  +	while (*link) {
  +		parent = *link;
  +		cur_group = rb_entry(parent, struct mcast_group, node);
  +
  +		ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw,
  +			     sizeof group->rec.mgid);
  +		if (ret < 0)
  +			link = &(*link)->rb_left;
  +		else if (ret > 0)
  +			link = &(*link)->rb_right;
  +		else if (allow_duplicates)
  +			link = &(*link)->rb_left;
  +		else
  +			return cur_group;
  +	}
  +	rb_link_node(&group->node, parent, link);
  +	rb_insert_color(&group->node, &port->table);
  +	return NULL;
  +}

How does it work to put duplicates into the RB tree?  It seems
especially strange that the lookup code does:

  +		if (ret < 0)
  +			node = node->rb_left;
  +		else
  +			node = node->rb_right;

so if ret == 0 (ie the two GIDs being tested are the same) then it
continues to traverse to the right, while the insert code does:

  +		else if (allow_duplicates)
  +			link = &(*link)->rb_left;

which seems to put duplicates to the left always.

Also I'd be really worried that the rebalancing code freaks out when
duplicate keys are inserted in the tree.
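To illustrate the concern with a plain (unbalanced) binary search tree rather than the kernel rbtree API: duplicates sent to the left on insert all stay in the tree and are reachable by a full walk, but a lookup that stops at the first equal key returns an arbitrary one of them. The names are hypothetical. Rotations in a balanced tree preserve in-order order, so the same reasoning should carry over, though that is an argument, not a test of the rbtree code.

```c
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Insert, sending equal keys to the left like the quoted mcast_insert(). */
void bst_insert(struct node **link, struct node *n)
{
	while (*link) {
		if (n->key <= (*link)->key)	/* duplicates go left */
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	n->left = n->right = NULL;
	*link = n;
}

/* Return some node with this key; which duplicate is unspecified. */
struct node *bst_find(struct node *root, int key)
{
	while (root) {
		if (key == root->key)
			return root;
		root = key < root->key ? root->left : root->right;
	}
	return NULL;
}

/* Full traversal: every duplicate is still reachable this way. */
size_t bst_count_key(const struct node *root, int key)
{
	if (!root)
		return 0;
	return bst_count_key(root->left, key) + (root->key == key) +
	       bst_count_key(root->right, key);
}
```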

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  For starters, I think that rate, MTU, and SL (and maybe PKey too) need
  to be the same. There may be others too if I stare at the spec for a
  while...

Can you expand on why?  For example I definitely can send to the same
MLID with different SLs.  Of course MTU and rate need to match up but
I don't see that as a real restriction -- the SM needs to allow for
least-common-denominator values anyway, since the least-capable node
on the fabric might join an existing group.

I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of
1024 can't share the same MLID, as long as the underlying fabric is
capable of supporting an MTU of 2048.  Actually, I wonder what the
spec says about what switches should do if they're asked to forward
packets with too-big MTUs?  Maybe it all works out anyway.

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
   But what if the full-member join fails on node A for node A's
   solicited node group, but then node B succeeds in joining that group
   as a send-only member (perhaps because some other nodes have dropped
   off the fabric in the meantime).  Then node B will send the ND message
   on a MCG that A is not a member of.

  Yes. B can send ND to A, and A responds without being a member, so IPv6
  ND works. Are there any security or other problems here?

Node A is not a member of the group B is sending on, so the SM does not
have to set up any routes for the messages to even reach node A.  So
node A doesn't see the messages and doesn't respond to ND.

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  Sure, but I think this complicates the SL2VL tables in the subnet to
  accommodate this. I think a similar thing is true for PKeys. So to me
  this is an SM complexity issue when mapping multiple MGRPs to the same MLID.

I'm still confused.  Aren't SL2VL and P_Key tables completely
orthogonal from forwarding tables?  Obviously there's no problem using
multiple different SLs or P_Keys to reach the same endport using the
same LID, so I don't understand why MLIDs would be different.

 - R.




Re: [openib-general] SA multicast patches

2007-02-16 Thread Roland Dreier
  All multicast groups need to be tracked, which is why even groups with
  MGID 0 are inserted into the tree.

OK...

  Immediately above this code, the group is returned if ret == 0.

Right, I missed that.  But...

  Calling mcast_find() for MGID 0 isn't useful, so the code avoids doing
  this, but I think that it would work.  The caller would just get an
  arbitrary group.

Now this is confusing -- you say the code avoids looking up MGID 0 in
the rbtree.  So why do you have to insert those groups in the tree and
have the allow_duplicates flag etc?  If you're never going to look
up the group, I assume you have some other way of finding it and so
you don't actually have to insert MGID 0 groups after all... right?

Or is it that you want to be able to iterate through the whole rbtree
and get the MGID 0 groups too?

 - R.




Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow

2007-02-16 Thread Roland Dreier
  Two MCGs must be established before the IPoIB link comes up: one is
  broadcast for IPv4, one is the all-hosts multicast group for IPv6. So
  node A is a member of the all-hosts group; the patch directs ND sends to
  all hosts, so node A responds.

I'm still confused.  How do you interoperate with other RFC-compliant
nodes (they might not have your patch or might not even be running
Linux) that send ND messages to the solicited node group?  If node A
has your patch and doesn't try to join its own solicited node group,
then another node that doesn't know to send ND messages to the all
nodes group will not be able to find it.

 - R.




Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-16 Thread Roland Dreier
OK, I pulled this into my for-2.6.21 branch and I will ask Linus to
pull later today.

Thanks.

 - R.




[openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path()

2007-02-16 Thread Roland Dreier
Guys, any reason not to merge this?  It's step one of the cleanups
from Jason's patch to make IPoIB work with global routes...

The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.
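A hedged sketch of what an absolute-to-relative conversion can look like: map the path record's rate encoding to a multiple of 2.5 Gb/s, then compute the inter-packet delay (IPD) that slows the port's current rate down to the requested one. rate_to_mult() and ipd_for() are hypothetical names and the IPD rule is a generic illustration, not any particular low-level driver's code; the encodings are the IBA rate values (2 = 2.5 Gb/s, 3 = 10 Gb/s, and so on).

```c
#include <stdint.h>

/* IBA static-rate encoding -> multiple of 2.5 Gb/s (0 if unknown). */
int rate_to_mult(uint8_t rate)
{
	switch (rate) {
	case 2:  return 1;	/* 2.5 Gb/s */
	case 5:  return 2;	/* 5 Gb/s */
	case 3:  return 4;	/* 10 Gb/s */
	case 6:  return 8;	/* 20 Gb/s */
	case 4:  return 12;	/* 30 Gb/s */
	case 7:  return 16;	/* 40 Gb/s */
	case 8:  return 24;	/* 60 Gb/s */
	case 9:  return 32;	/* 80 Gb/s */
	case 10: return 48;	/* 120 Gb/s */
	default: return 0;
	}
}

/*
 * Smallest IPD with cur / (ipd + 1) <= req, i.e. (cur - 1) / req.
 * No delay if the port is not faster than requested (or req is unknown).
 */
int ipd_for(int cur_mult, int req_mult)
{
	if (req_mult <= 0 || cur_mult <= req_mult)
		return 0;
	return (cur_mult - 1) / req_mult;
}
```

For example, a 10 Gb/s port asked for 2.5 Gb/s needs an IPD of 3, since 4 / (3 + 1) = 1.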

Cc: Jason Gunthorpe [EMAIL PROTECTED]
Cc: Sean Hefty [EMAIL PROTECTED]
Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
 drivers/infiniband/core/sa_query.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index d7d4a53..68db633 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -471,6 +471,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->sl = rec->sl;
 	ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;
 	ah_attr->port_num = port_num;
+	ah_attr->static_rate = rec->rate;
 
 	if (rec->hop_limit > 1) {
 		ah_attr->ah_flags = IB_AH_GRH;
-- 
1.4.4.4




[openib-general] [GIT PULL] please pull infiniband.git

2007-02-16 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This adds IB multicast tracking, to allow userspace to use multicast
groups in a sane way, an ehca interrupt handling fixup, and a few
other minor things.  I don't think there is anything major left, so we
should be good for 2.6.21-rc1 after this pull.

Dotan Barak (1):
  IB/mthca: Allow the QP state transition RESET->RESET

Hoang-Nam Nguyen (4):
  IB/ehca: Rework irq handler
  IB/ehca: Fix race condition/locking issues in scaling code
  IB/ehca: Allow en/disabling scaling code via module parameter
  IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN

Michael S. Tsirkin (1):
  IPoIB: CM error handling thinko fix

Roland Dreier (5):
  IB/mthca: Fix allocation of ICM chunks in coherent memory
  IPoIB: Only allow root to change between datagram and connected mode
  IB/core: Fix sparse warnings about shadowed declarations
  IB/ipath: Make ipath_map_sg() static
  IB/core: Set static rate in ib_init_ah_from_path()

Sean Hefty (2):
  IB/sa: Track multicast join/leave requests
  RDMA/cma: Add multicast communication support

Steve Wise (3):
  RDMA/iwcm: iw_cm_id destruction race fixes
  RDMA/cxgb3: Fail posts synchronously when in TERMINATE state
  RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver

 drivers/infiniband/core/Makefile               |    2 +-
 drivers/infiniband/core/cma.c                  |  359 +--
 drivers/infiniband/core/fmr_pool.c             |    4 +-
 drivers/infiniband/core/iwcm.c                 |   47 +-
 drivers/infiniband/core/multicast.c            |  837
 drivers/infiniband/core/sa.h                   |   66 ++
 drivers/infiniband/core/sa_query.c             |   30 +-
 drivers/infiniband/core/sysfs.c                |    2 -
 drivers/infiniband/core/ucma.c                 |  204 ++-
 drivers/infiniband/hw/cxgb3/cxio_dbg.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.h    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c             |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h             |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c         |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |    3 +-
 drivers/infiniband/hw/cxgb3/iwch_user.h        |    1 -
 drivers/infiniband/hw/ehca/Kconfig             |    8 -
 drivers/infiniband/hw/ehca/ehca_classes.h      |   19 +-
 drivers/infiniband/hw/ehca/ehca_eq.c           |    1 +
 drivers/infiniband/hw/ehca/ehca_hca.c          |    3 +
 drivers/infiniband/hw/ehca/ehca_irq.c          |  307 +
 drivers/infiniband/hw/ehca/ehca_irq.h          |    1 +
 drivers/infiniband/hw/ehca/ehca_main.c         |   32 +-
 drivers/infiniband/hw/ehca/ipz_pt_fn.h         |   11 +-
 drivers/infiniband/hw/ipath/ipath_dma.c        |    4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c    |    4 +-
 drivers/infiniband/hw/mthca/mthca_qp.c         |    5 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  195 ++
 include/rdma/ib_addr.h                         |    6 +
 include/rdma/ib_sa.h                           |  159 ++---
 include/rdma/rdma_cm.h                         |   21 +-
 include/rdma/rdma_cm_ib.h                      |    4 +-
 include/rdma/rdma_user_cm.h                    |   13 +-
 44 files changed, 1889 insertions(+), 478 deletions(-)
 create mode 100644 drivers/infiniband/core/multicast.c
 create mode 100644 drivers/infiniband/core/sa.h




Re: [openib-general] 32-bit build for ppc64 is required

2007-02-15 Thread Roland Dreier
  Usually this should work, but I don't rely on that since we also support
  s390/s390x (although not with Infiniband, but the OpenMPI alternative
  that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that
  pair is a bit of an odd mix and I don't have one setting here at my
  house where I work, so it's hard for me to confirm that just leaving
  things to happen by default works as anticipated.  If they would ever
  make an s390 that uses less than a gigawatt of power and heats less than
  a large sized convention center, that could change... ;-)

http://www.conmicro.cx/hercules/




Re: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN

2007-02-15 Thread Roland Dreier
Thanks, queued 1, 2, 3 and 5 for 2.6.21.




Re: [openib-general] How heavy to resize a CQ ?

2007-02-15 Thread Roland Dreier
  In a dynamic process application, we don't know how many
  connections a process will make when we create the CQ, so we don't know
  the CQ size. What we do is increase the CQ size when a new connection
  is made, and decrease it when a connection is destroyed. My
  question is: is ibv_resize_cq() a lightweight function call?  Do we
  have to drain the CQ before we resize it?

I would say that resizing a CQ is not lightweight -- I've never
benchmarked it but it's probably comparable to creating a CQ or
something like that.  There is no requirement to drain the CQ or
anything like that before resizing it -- you can resize it any time,
even if it is currently getting completions or being polled.
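Given that answer, one way to keep resizes rare in a dynamic-process application (an assumption, not something libibverbs does for you) is to grow the CQ geometrically instead of by one connection's worth each time. next_cq_size() is a hypothetical helper; its result would be handed to ibv_resize_cq() only when it differs from the current size.

```c
/*
 * Hypothetical sizing policy: double until the CQ can hold needed_cqes,
 * so adding N connections triggers only O(log N) resizes.  This sketch
 * never shrinks; it assumes needed_cqes stays well below INT_MAX.
 */
int next_cq_size(int cur_size, int needed_cqes)
{
	int size = cur_size > 0 ? cur_size : 1;

	while (size < needed_cqes)
		size *= 2;
	return size;
}
```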

 - R.




Re: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()

2007-02-15 Thread Roland Dreier
Looking at this one more time, I think it actually may be buggy:

  @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
   	spin_lock_init(&my_cq->spinlock);
   	spin_lock_init(&my_cq->cb_lock);
   	spin_lock_init(&my_cq->task_lock);
  +	init_completion(&my_cq->zero_callbacks);

So you initialize the zero_callbacks completion once, at
ehca_create_cq().

But then 

  @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
   
   	spin_lock(&cq->task_lock);
   	cq->nr_callbacks--;
  -	if (cq->nr_callbacks == 0) {
  +	is_complete = (cq->nr_callbacks == 0);
  +	if (is_complete) {
   		list_del_init(cct->cq_list.next);
   		cct->cq_jobs--;
   	}
   	spin_unlock(&cq->task_lock);
  +	if (is_complete) /* wake up waiting destroy_cq() */
  +		complete(&cq->zero_callbacks);
   }

every time nr_callbacks drops to 0, you complete the zero_callbacks
completion.  So the first time a callback runs, you will complete
zero_callbacks, which will let wait_for_completion() finish even if
you later increment nr_callbacks again.

Also this

  -	while (my_cq->nr_callbacks) {
  +	if (my_cq->nr_callbacks) {
   		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
  -		yield();
  +		wait_for_completion(&my_cq->zero_callbacks);
   		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
   	}

looks rather unsafe -- I don't see any common locking protecting both
this test of nr_callbacks and the setting of nr_callbacks in the ehca
irq handling... so I don't see anything protecting you from seeing
nr_callbacks==0 and not going into the if() (or while() -- the old
code has the same problem I think) but then doing ++nr_callbacks
somewhere else.  In fact since you do the idr_remove() and
hipz_h_destroy_cq() *after* you make sure no callbacks are running,
this seems like it could happen easily.
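A userspace sketch of the pattern this points toward, using pthreads in place of the kernel's spinlocks and completions: the counter, a "dying" flag and the wait all sit under one lock, new callbacks are refused once destruction starts, and the waiter re-checks the count in a loop rather than trusting a one-shot completion. struct cq_sketch and its functions are hypothetical; in the driver the equivalent would more likely be a wait queue plus wait_event(), but that is an assumption, not something the patch does.

```c
#include <pthread.h>
#include <stdbool.h>

struct cq_sketch {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when nr_callbacks drops to 0 */
	int nr_callbacks;
	bool dying;
};

void cq_sketch_init(struct cq_sketch *cq)
{
	pthread_mutex_init(&cq->lock, NULL);
	pthread_cond_init(&cq->idle, NULL);
	cq->nr_callbacks = 0;
	cq->dying = false;
}

/* Called before running a callback; fails once destruction has begun. */
bool cq_callback_get(struct cq_sketch *cq)
{
	bool ok;

	pthread_mutex_lock(&cq->lock);
	ok = !cq->dying;
	if (ok)
		cq->nr_callbacks++;
	pthread_mutex_unlock(&cq->lock);
	return ok;
}

/* Called after a callback finishes. */
void cq_callback_put(struct cq_sketch *cq)
{
	pthread_mutex_lock(&cq->lock);
	if (--cq->nr_callbacks == 0)
		pthread_cond_broadcast(&cq->idle);
	pthread_mutex_unlock(&cq->lock);
}

/* Stop new callbacks first, then wait for the running ones to drain. */
void cq_sketch_quiesce(struct cq_sketch *cq)
{
	pthread_mutex_lock(&cq->lock);
	cq->dying = true;
	while (cq->nr_callbacks > 0)
		pthread_cond_wait(&cq->idle, &cq->lock);
	pthread_mutex_unlock(&cq->lock);
}
```

Setting the flag before waiting plays the role of removing the CQ from the lookup table before checking for running callbacks, so no new callback can slip in after the count has been seen to reach zero.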

So I'm holding off on applying this for now.  Please think it over and
either tell me the current patch is OK, or fix it up.  There's not
really too much urgency because a change like this is something I
would be comfortable merging between 2.6.21-rc1 and -rc2.

 - R.




Re: [openib-general] remap_page_range() in older kernels

2007-02-15 Thread Roland Dreier
  Do you remember any issues with using remap_page_range() in older
  kernels for mapping memory allocated in the kernel back to a user
  process?  

No, I would have thought it should work just like remap_pfn_range() in
later kernels.

 - R.




[openib-general] SA multicast patches

2007-02-15 Thread Roland Dreier
So I'm reading this over, and the following code looks kind of odd to me:

  +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
  +			   union ib_gid *mgid, struct ib_sa_mcmember_rec *rec)
  
  ...
  
  +	} else {
  +		memset(rec, 0, sizeof *rec);
  +		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
  +		rec->pkey = 0xffff;
  +		get_random_bytes(&rec->qkey, sizeof rec->qkey);
  +		rec->join_state = 1;
  +	}

Where is this particular hard-coded P_Key value coming from?  And how
about the Q_Key -- why is a random one being chosen?  Does it matter
that this is setting the privileged bit of the Q_Key at random?

The only place this code seems to be used is in
cma_join_ib_multicast(), which overwrites all the values that get set
here anyway.  (Except it leaves the Q_Key if the portspace is not UDP??)
Would it be more sensible to leave the P_Key and Q_Key initialized to
0 here, and let the caller handle it?  I don't see how the multicast
tracking module can pick a sensible default here.

Also, should we check the return value of ib_get_cached_gid()?

 - R.




Re: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction race condition fixes.

2007-02-15 Thread Roland Dreier
thanks, applied




Re: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts synchronously when in TERMINATE state.

2007-02-15 Thread Roland Dreier
thanks, applied.




Re: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the iw_cxgb3 driver.

2007-02-15 Thread Roland Dreier
thanks, applied




Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Roland Dreier
  How do you mean, again? Does sg_set_buf set dma_length?

No, you're right, sorry.

 - R.




Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-14 Thread Roland Dreier
   I don't see anything that ever bumps chunk->nsg if we're allocating a
   coherent region and we end up needing more than one allocation to do
   it.
  
  Yes but this is intentional.

  No, I think the code is fine and this patch will break things:
  chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg:

what about this code in mthca_memfree.h?

static inline void mthca_icm_next(struct mthca_icm_iter *iter)
{
if (++iter->page_idx >= iter->chunk->nsg) {

the call to pci_unmap_sg you're worried about is in
mthca_free_icm_pages(), which can't be called for coherent memory
anyway, so I don't see a problem with that.

So I think my patch is correct and needed.
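To make the bookkeeping argument concrete: mthca_icm_next() moves on once page_idx reaches chunk->nsg, so a coherent chunk whose nsg is never bumped looks empty to the iterator. This is a simplified userspace model, not driver code; struct icm_chunk_model, chunk_add_page() and the with_fix switch are hypothetical, and the non-coherent branch only models the in-loop pci_map_sg() call for a full chunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one ICM chunk: only the two counters that matter. */
struct icm_chunk_model {
    int npages; /* allocations stored in this chunk */
    int nsg;    /* entries an mthca_icm_next()-style walk will visit */
};

/* Bookkeeping after one successful allocation.  with_fix selects whether
 * coherent chunks get the proposed "++chunk->nsg". */
static void chunk_add_page(struct icm_chunk_model *chunk, bool coherent,
                           int chunk_len, bool with_fix)
{
    ++chunk->npages;
    if (coherent) {
        if (with_fix)
            ++chunk->nsg;
    } else if (chunk->npages == chunk_len) {
        chunk->nsg = chunk->npages; /* stand-in for pci_map_sg()'s result */
    }
}
```

Without the fix, three coherent allocations leave npages == 3 but nsg == 0, so the walk bounded by nsg visits nothing.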

 - R.




Re: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events

2007-02-14 Thread Roland Dreier
Looks fine but this patch at least has serious whitespace
damage... please resend a fixed version.

 - R.




[openib-general] [PATCH] IPoIB: Only allow root to change between datagram and connected mode

2007-02-14 Thread Roland Dreier
Change the permissions of the mode sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
FYI -- I'm planning to merge this for 2.6.21.  It doesn't seem
appropriate to allow ordinary users to mess with this sort of config.

 drivers/infiniband/ulp/ipoib/ipoib_cm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 2d48387..8881a71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1138,7 +1138,7 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
return -EINVAL;
 }
 
-static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
 
 int ipoib_cm_add_mode_attr(struct net_device *dev)
 {
-- 
1.4.4.4




Re: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()

2007-02-14 Thread Roland Dreier
I agree with Christoph -- the use of wait_for_completion() in a loop
makes no sense.  When you send a new copy of this patch without
whitespace damage, please fix that up too...




Re: [openib-general] mvapich2 ofed 1.2 problem

2007-02-13 Thread Roland Dreier
  Does this stack indicate that libibverbs is accessing a 1.0 provider?
  cxgb3 shouldn't be 1.0 right?

  #1  0x2b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830)
  at src/compat-1_0.c:572
  #2  0x2b832cfef04e in rdma_cm_init_pd_cq ()
 from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so

This means that the app (or maybe the RDMA CM library?) is linked
against the 1.0 API -- which should work even with cxgb3 actually.
But maybe mvapich is built against the 1.1 API and the RDMA CM is
built against 1.0 or something?

 - R.




Re: [openib-general] mvapich2 ofed 1.2 problem

2007-02-13 Thread Roland Dreier
  How do I tell?  Can I tell from the .so files?

ldd on the .so and the app would probably give you good info.

I'm pretty sure that mpicc must be linking against a libibverbs 1.0
from somewhere.

 - R.




Re: [openib-general] mvapich2 ofed 1.2 problem

2007-02-13 Thread Roland Dreier
  When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
  built, at least by looking at the .so file result:
  
  [EMAIL PROTECTED] ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
  libibverbs.a
  libibverbs.so
  libibverbs.so.1
  libibverbs.so.1.0.0

The soname hasn't changed because the library is still compatible.
But (I hope at least) OFED has libibverbs 1.1.




Re: [openib-general] [GIT PULL] please pull infiniband.git

2007-02-13 Thread Roland Dreier
  What about the patch that I sent on "Allow the following QP state
  transition: reset -> reset"?

OK, I'll merge that in the next patch.  It's the kind of patch I'm not
happy about merging, since it bloats the code to handle a corner case
no one is likely to hit in practice, but it is technically correct so
I guess we're forced to merge it.

 - R.




Re: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a separate cache line

2007-02-12 Thread Roland Dreier
Thanks, applied as 2 separate patches.




Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.

2007-02-12 Thread Roland Dreier
Looks mostly sane (assuming it works on 32-bit userspace on 64-bit
kernel now), but:

  -context = kmalloc(sizeof(*context), GFP_KERNEL);
  +context = kzalloc(sizeof(*context), GFP_KERNEL);

Why do you need this?  Is this an unrelated change?

 - R.




Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.

2007-02-12 Thread Roland Dreier
  Because the key generator u32 is in the context now, and the kzalloc()
  initializes it.  I could have done:  
  
  context->key = 0;
  
  But km -> kz was less typing. ;-)

OK, got it.  Anyway as I said, from a quick read the changes look
sane, with the assumption that they work.




Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.

2007-02-12 Thread Roland Dreier
Steve> I tested and it works.  Do you want to pull this in before
Steve> you push the driver upstream?  Do I need to repost it?

I'll grab it and merge it in.  I expect to ask Linus to pull later
today.

 - R.




Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.

2007-02-12 Thread Roland Dreier
Actually, that patch doesn't apply because of the %llx warning fixes
I pushed out.  And git-apply also complains about trailing
whitespace.  Can you resend a version that applies to my
for-2.6.21 branch?

Thanks




Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-12 Thread Roland Dreier
  +sg_set_buf(mem, buf, PAGE_SIZE << order);
  +BUG_ON(mem->offset);
  +sg_dma_len(mem) = PAGE_SIZE << order;

What am I missing?  Any reason to set sg_dma_len() again after sg_set_buf()?




Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-12 Thread Roland Dreier
Queued for 2.6.21, although I think a further cleanup would be:

   mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base,
                                                    dev_lim->mpt_entry_sz,
                                                    mdev->limits.num_mpts,
  -                                                 mdev->limits.reserved_mrws, 1);
  +                                                 mdev->limits.reserved_mrws,
  +                                                 1, 1);

Instead of having use_lowmem and use_coherent be separate parameters,
we should probably convert it to a type parameter, and have
MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT.  That would make these
calls a lot easier to read and get correct.
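A minimal sketch of what that type parameter might look like. The three enum names come from the suggestion above; the enum itself, icm_type_to_flags(), and the assumption that coherent memory also counts as lowmem (matching the "1, 1" at the call site) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum mthca_icm_table_type {
    MTHCA_ICM_TABLE_HIGHMEM,
    MTHCA_ICM_TABLE_LOWMEM,
    MTHCA_ICM_TABLE_COHERENT,
};

/* Translate the single type back into the two flags the current code takes.
 * With one enum, the contradictory combination use_lowmem = 0,
 * use_coherent = 1 simply cannot be requested. */
static void icm_type_to_flags(enum mthca_icm_table_type type,
                              bool *use_lowmem, bool *use_coherent)
{
    *use_lowmem   = type != MTHCA_ICM_TABLE_HIGHMEM;
    *use_coherent = type == MTHCA_ICM_TABLE_COHERENT;
}
```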

 - R.




[openib-general] [GIT PULL] please pull infiniband.git

2007-02-12 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well
as IPoIB connected mode and various other smaller changes:

Ahmed S. Darwish (1):
  IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
  IB/ehca: Fix memleak on module unloading

David Howells (1):
  IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (6):
  IPoIB: Connected mode experimental support
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Always fill MTTs from CPU

Roland Dreier (1):
  IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
  RDMA/cma: Increment port number after close to avoid re-use
  IB: Remove redundant _wq from workqueue names

Steve Wise (1):
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC

 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/Makefile|1 +
 drivers/infiniband/core/addr.c |2 +-
 drivers/infiniband/core/cma.c  |   68 +-
 drivers/infiniband/core/device.c   |3 +-
 drivers/infiniband/hw/cxgb3/Kconfig|   27 +
 drivers/infiniband/hw/cxgb3/Makefile   |   12 +
 drivers/infiniband/hw/cxgb3/cxio_dbg.c |  207 +++
 drivers/infiniband/hw/cxgb3/cxio_hal.c | 1280 +++
 drivers/infiniband/hw/cxgb3/cxio_hal.h |  201 +++
 drivers/infiniband/hw/cxgb3/cxio_resource.c|  331 
 drivers/infiniband/hw/cxgb3/cxio_resource.h|   70 +
 drivers/infiniband/hw/cxgb3/cxio_wr.h  |  685 
 drivers/infiniband/hw/cxgb3/iwch.c |  189 +++
 drivers/infiniband/hw/cxgb3/iwch.h |  177 ++
 drivers/infiniband/hw/cxgb3/iwch_cm.c  | 2081 
 drivers/infiniband/hw/cxgb3/iwch_cm.h  |  223 +++
 drivers/infiniband/hw/cxgb3/iwch_cq.c  |  225 +++
 drivers/infiniband/hw/cxgb3/iwch_ev.c  |  231 +++
 drivers/infiniband/hw/cxgb3/iwch_mem.c |  172 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.c| 1203 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.h|  367 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c  | 1007 
 drivers/infiniband/hw/cxgb3/iwch_user.h|   67 +
 drivers/infiniband/hw/cxgb3/tcb.h  |  632 +++
 drivers/infiniband/hw/ehca/ehca_irq.c  |2 +
 drivers/infiniband/hw/mthca/mthca_cmd.c|6 +-
 drivers/infiniband/hw/mthca/mthca_dev.h|2 +
 drivers/infiniband/hw/mthca/mthca_main.c   |   40 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c|  127 ++-
 drivers/infiniband/hw/mthca/mthca_memfree.h|9 +-
 drivers/infiniband/hw/mthca/mthca_mr.c |  110 ++-
 drivers/infiniband/hw/mthca/mthca_profile.c|2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c   |   14 +-
 drivers/infiniband/hw/mthca/mthca_provider.h   |1 +
 drivers/infiniband/hw/mthca/mthca_qp.c |2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c|9 +-
 drivers/infiniband/ulp/ipoib/Kconfig   |   16 +-
 drivers/infiniband/ulp/ipoib/Makefile  |1 +
 drivers/infiniband/ulp/ipoib/ipoib.h   |  215 +++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c| 1237 ++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|   29 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   63 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |4 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |   40 +-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c  |2 +
 46 files changed, 11279 insertions(+), 114 deletions(-)
 create mode 100644 drivers/infiniband/hw/cxgb3/Kconfig
 create mode 100644 drivers/infiniband/hw/cxgb3/Makefile
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_dbg.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_wr.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cq.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_ev.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_mem.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.c
 create

Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree

2007-02-12 Thread Roland Dreier
OK, I already merged this but now I'm thinking it's somewhat buggy:

  +if (coherent)
  +ret = mthca_alloc_icm_coherent(&dev->pdev->dev,
  +   &chunk->mem[chunk->npages],
  +   cur_order, gfp_mask);
  +else
  +ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages],
  +cur_order, gfp_mask);
   
  -if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
  +if (!ret) {
  +++chunk->npages;
  +
  +if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
   chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,

I don't see anything that ever bumps chunk->nsg if we're allocating a
coherent region and we end up needing more than one allocation to do
it.  Maybe something like this on top of the patch?

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 0b9d053..48f7c65 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
if (!ret) {
++chunk->npages;
 
-   if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
+   if (coherent)
+   ++chunk->nsg;
+   else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) {
chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
chunk->npages,
PCI_DMA_BIDIRECTIONAL);




Re: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to mutex.

2007-02-11 Thread Roland Dreier
Thanks, applied along with the following warning cleanup for archs
where u64 is unsigned long instead of unsigned long long:

diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
index dfaa704..5a7306f 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -62,7 +62,7 @@ void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
 
data = (u64 *)m->buf;
while (size > 0) {
-   PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
+   PDBG("TPT %08x: %016llx\n", m->addr, (unsigned long long) *data);
size -= 8;
data++;
m->addr += 8;
@@ -100,7 +100,7 @@ void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
 
data = (u64 *)m->buf;
while (size > 0) {
-   PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
+   PDBG("PBL %08x: %016llx\n", m->addr, (unsigned long long) *data);
size -= 8;
data++;
m->addr += 8;
@@ -116,7 +116,8 @@ void cxio_dump_wqe(union t3_wr *wqe)
if (size == 0)
size = 8;
while (size > 0) {
-   PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
+   PDBG("WQE %p: %016llx\n", data,
+(unsigned long long) be64_to_cpu(*data));
size--;
data++;
}
@@ -128,7 +129,8 @@ void cxio_dump_wce(struct t3_cqe *wce)
int size = sizeof(*wce);
 
while (size > 0) {
-   PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
+   PDBG("WCE %p: %016llx\n", data,
+(unsigned long long) be64_to_cpu(*data));
size -= 8;
data++;
}
@@ -159,7 +161,7 @@ void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
 
data = (u64 *)m->buf;
while (size > 0) {
-   PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
+   PDBG("RQT %08x: %016llx\n", m->addr, (unsigned long long) *data);
size -= 8;
data++;
m->addr += 8;
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 19553b3..0531b94 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -298,7 +298,7 @@ int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
(wq->qpid << rdev_p->qpshift);
PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
-wq->qpid, wq->doorbell, wq->udb);
+wq->qpid, wq->doorbell, (unsigned long long) wq->udb);
return 0;
 err4:
kfree(wq->sq);
@@ -553,8 +553,8 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
wqe->ctx1 = cpu_to_be64(ctx1);
wqe->ctx0 = cpu_to_be64(ctx0);
PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
-(u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
-1 << T3_CTRL_QP_SIZE_LOG2);
+(unsigned long long) rdev_p->ctrl_qp.dma_addr,
+rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2);
skb->priority = CPL_PRIORITY_CONTROL;
return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index 3d7c96f..98b3bdb 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -87,7 +87,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
 "lo 0x%x cookie 0x%llx\n", __FUNCTION__,
 CQE_QPID(cqe), CQE_TYPE(cqe),
 CQE_OPCODE(cqe), CQE_STATUS(cqe), CQE_WRID_HI(cqe),
-CQE_WRID_LOW(cqe), cookie);
+CQE_WRID_LOW(cqe), (unsigned long long) cookie);
 
if (CQE_TYPE(cqe) == 0) {
if (!CQE_STATUS(cqe))
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5909ec5..2b6cd53 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -163,7 +163,9 @@ int build_phys_page_list(struct ib_phys_buf *buffer_list,
((u64) j << *shift));
 
PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-__FUNCTION__, *iova_start, mask, *shift, *total_size, *npages);
+__FUNCTION__, (unsigned long long) *iova_start,
+(unsigned long long) mask, *shift, (unsigned long long) *total_size,
+*npages);
 
return 0;
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index d02cd72..549de0a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -213,7 +213,7 @@ static struct ib_cq *iwch_create_cq(struct ib_device 
*ibdev, int entries,
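
The casts follow a general rule: %llx expects an unsigned long long, but a 64-bit typedef can be plain unsigned long on some 64-bit architectures, and then the compiler warns about a format mismatch. A minimal userspace sketch of the same pattern, using uint64_t (which is unsigned long on LP64 glibc) in place of the kernel's u64; format_u64_hex() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Cast explicitly so the argument matches %llx on every architecture,
 * whatever underlying type the 64-bit typedef uses. */
static int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%016llx", (unsigned long long) value);
}
```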

Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-10 Thread Roland Dreier
  IMO, probably worth it to init just this one field rather than use up
  initialized memory - and I think it's clearer.

What do you mean by using up initialized memory?  kzalloc() just does
a memset(0), and it's not like there's a limit on the number of times
we're allowed to call memset().
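In userspace terms, the equivalence is just allocate-then-zero (calloc() does the same in one call). A minimal sketch; zalloc() and struct ucontext_model are hypothetical stand-ins, the latter for a context whose key generator must start at 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kzalloc(): allocate, then memset() the object to 0. */
static void *zalloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}

/* Hypothetical stand-in for a driver user context. */
struct ucontext_model {
    unsigned int key;           /* key generator that must start at 0 */
    unsigned int other_state[4];
};
```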

 - R.




Re: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core subdirectory.

2007-02-10 Thread Roland Dreier
Thanks, applied this and the previous patch, and pushed out my
for-2.6.21 branch.  I also rebased so the cxgb3 net driver builds now.




Re: [openib-general] integer overflow

2007-02-10 Thread Roland Dreier
   while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
  
  seems to rely on integer overflow which seems to be
  undefined behaviour.

tx_tail and tx_head are unsigned, and overflow is defined for unsigned
integers.

 - R.




Re: [openib-general] integer overflow

2007-02-10 Thread Roland Dreier
  Yes but we cast them to signed int here - no?

That's true, I guess it is technically undefined.  But time_after() is
relying on the same thing working, so I would say we don't care.
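For comparison, a variant of the same wraparound-tolerant test that avoids signed overflow altogether: subtract in unsigned arithmetic first (well defined, modulo 2^32), then convert the result to a signed type. That conversion is implementation-defined rather than undefined, and GCC documents it as wrapping. This is a sketch assuming 32-bit unsigned ring indices like tx_head and tx_tail, not the code IPoIB uses; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True while tail still trails head, i.e. there are outstanding entries,
 * even after the free-running indices have wrapped past 2^32. */
static bool ring_has_pending(uint32_t tail, uint32_t head)
{
    return (int32_t)(tail - head) < 0;
}

/* Number of outstanding entries; unsigned subtraction handles the wrap. */
static uint32_t ring_pending(uint32_t tail, uint32_t head)
{
    return head - tail;
}
```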

 - R.




Re: [openib-general] more comments on cxgb3

2007-02-08 Thread Roland Dreier
  diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
  index db2b0a8..98568ee 100644
  --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
  +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
  @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct 
   struct iwch_dev *rhp = to_iwch_dev(context->device);
   struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
   PDBG("%s context %p\n", __FUNCTION__, context);
  +free_mmaps(ucontext);
   cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
   kfree(ucontext);
   return 0;
  diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
  index 1ede8a7..c8c07ee 100644
  --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
  +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
  @@ -199,6 +199,21 @@ struct iwch_mm_entry {
   unsigned len;
   };
   
  +static inline void free_mmaps(struct iwch_ucontext *ucontext)
  +{
  +struct list_head *pos, *nxt;
  +struct iwch_mm_entry *mm;
  +
  +spin_lock(&ucontext->mmap_lock);
  +list_for_each_safe(pos, nxt, &ucontext->mmaps) {
  +mm = list_entry(pos, struct iwch_mm_entry, entry);
  +list_del(&mm->entry);
  +kfree(mm);
  +}
  +spin_unlock(&ucontext->mmap_lock);
  +return;
  +}

Since you only have one caller, I would suggest just open-coding the
deletion at the call-site (since that function is really too big to
inline if it ever grows another caller).  And I don't think you need
the locking either, since there better be no one else looking at the
context structure while you're in the process of freeing it.

Something like:

struct iwch_dev *rhp = to_iwch_dev(context->device);
struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
struct iwch_mm_entry *mm, *tmp;

PDBG("%s context %p\n", __FUNCTION__, context);
list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry)
kfree(mm);
cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
kfree(ucontext);
return 0;
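The key property of the _safe iterator is that it reads the next pointer before the loop body frees the current entry. A self-contained userspace sketch of the same teardown; struct list_head, container_of and list_add_tail are minimal re-implementations, and struct mm_entry is a hypothetical stand-in for struct iwch_mm_entry. As suggested above, nothing is unlinked and no lock is taken because the whole list is being destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical stand-in for struct iwch_mm_entry. */
struct mm_entry {
    struct list_head entry;
    unsigned len;
};

/* Free every entry, saving the successor before freeing the current node
 * (what list_for_each_entry_safe() does).  Returns the number freed. */
static int free_all_entries(struct list_head *head)
{
    struct list_head *pos = head->next;
    int freed = 0;

    while (pos != head) {
        struct list_head *next = pos->next;

        free(container_of(pos, struct mm_entry, entry));
        pos = next;
        ++freed;
    }
    list_init(head); /* leave an empty list rather than dangling pointers */
    return freed;
}
```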

 - R.




Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support

2007-02-08 Thread Roland Dreier
I merged the increment port number and remove redundant '_wq'
patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland

I plan to review the multicast stuff next week and I hope to merge it
for 2.6.21.  Or, have you or anyone else at Voltaire read over the
code in addition to using it?  Do you see anything that should be
cleaned up?

 - R.




Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-08 Thread Roland Dreier
OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21
branch.  I took the liberty of cleaning up some sparse warnings, etc.
There are still a few other obvious things to fix up:

drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?

  Rather than putting an extern in iwch.c, please put a proper
  declaration in an appropriate header file included from iwch.c.

Also I agree with MST, I would like to see the core/ subdirectory die
completely.

 - R.




Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes

2007-02-08 Thread Roland Dreier
Oh yeah -- Steve, please keep sending cleanup patches based on my tree
now.  I'm planning on asking Linus to merge what's in for-2.6.21 in
the next couple of days, but there's still more than a week before the
merge window closes, and even after the merge window closes I'll still
accept fixes/cleanups for stuff already upstream.

And here's what I have pending in for-2.6.21 so far:

Ahmed S. Darwish (1):
  IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
  IB/ehca: Fix memleak on module unloading

David Howells (1):
  IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (1):
  IPoIB: Connected mode experimental support

Roland Dreier (1):
  IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
  RDMA/cma: Increment port number after close to avoid re-use
  IB: Remove redundant _wq from workqueue names

Steve Wise (1):
  RDMA/cxgb3: Add driver for Chelsio T3 Rnic




Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()

2007-02-08 Thread Roland Dreier
BTW, while looking at iwcm.c, I noticed the following highly dubious
code for the first time:

static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
{
	int ret = 0;

	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
		BUG_ON(!list_empty(&cm_id_priv->work_list));
		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
			BUG_ON(cm_id_priv->state !=
			       IW_CM_STATE_DESTROYING);
			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
					&cm_id_priv->flags));
			ret = 1;
		}
		complete(&cm_id_priv->destroy_comp);
	}

	return ret;
}

The test of waitqueue_active on destroy_comp.wait looks really bad for
two reasons: first, it is relying on an internal implementation detail
of struct completion that really shouldn't be used by generic code.
And second, it seems to me that this doesn't even work right, since
there is a race something like the following:

iw_destroy_cm_id():
        destroy_cm_id(cm_id); // still 1 ref left

                cm_work_handler():
                        if (iwcm_deref_id()) // drop last ref
                                return;
                        // no one waiting yet, doesn't
                        // return, but destroy_comp is
                        // signaled

iw_destroy_cm_id():
        wait_for_completion(&cm_id_priv->destroy_comp);
        // destroy_comp is signaled, proceed
        kfree(cm_id_priv);

                cm_work_handler():
                        // continue using cm_id_priv
                        // OOPS

I don't understand this code well enough for the fix to be obvious.
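The interleaving above can be replayed deterministically in a small single-threaded model. Everything here is hypothetical: struct cm_id_model stands in for iwcm_id_private, a bool stands in for waitqueue_active(), and no fix is implied. The model only shows that iwcm_deref_id()'s return value depends on whether the destroyer happens to be waiting yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the cm_id lifetime; no real locking is modelled. */
struct cm_id_model {
    int  refcount;
    bool waiter;    /* stands in for waitqueue_active(&destroy_comp.wait) */
    bool completed; /* the completion has been signalled */
    bool freed;     /* the destroyer has called kfree() */
};

/* Same decision logic as iwcm_deref_id(): returns 1 only if someone was
 * already waiting when the last reference was dropped. */
static int model_deref(struct cm_id_model *id)
{
    int ret = 0;

    if (--id->refcount == 0) {
        if (id->waiter)
            ret = 1;
        id->completed = true;
    }
    return ret;
}

/* Replays the interleaving; returns true if the work handler would go on
 * to use the object after it has been freed. */
static bool replay_race(void)
{
    struct cm_id_model id = { .refcount = 2 };

    model_deref(&id);        /* iw_destroy_cm_id(): 1 ref left          */
    if (model_deref(&id))    /* cm_work_handler(): drops the last ref   */
        return false;        /* not taken: nobody is waiting yet        */

    id.waiter = true;        /* destroyer reaches wait_for_completion() */
    if (id.completed)        /* already signalled, so it proceeds       */
        id.freed = true;     /* kfree(cm_id_priv)                       */

    return id.freed;         /* handler "continues using cm_id_priv"    */
}
```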

 - R.




Re: [openib-general] Problem with SRP with 512 byte sector size with 2 TB LUNs

2007-02-07 Thread Roland Dreier
  Is it possible to add LUNs with > 2 TB and 512 byte sectors?
  Why does the READ CAPACITY(16) comand fail ?

It seems that the DDN target is not reporting good information -- I
don't see anything obviously wrong in what the kernel is doing (now
that SRP sends a READ CAPACITY command).  Do you know if the same type
of config works over fibre channel?
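Background on why READ CAPACITY(16) is involved at all: READ CAPACITY(10) returns the last LBA as a 32-bit big-endian value followed by the block length, so with 512-byte blocks it can describe at most 2^32 x 512 bytes = 2 TiB. A larger device reports 0xFFFFFFFF there, and the host must then use READ CAPACITY(16). A minimal parsing sketch; read_capacity10_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Parse the 8-byte READ CAPACITY(10) response.  Returns false when the
 * device says the capacity does not fit (last LBA == 0xFFFFFFFF). */
static bool read_capacity10_bytes(const uint8_t resp[8], uint64_t *bytes)
{
    uint32_t last_lba = (uint32_t)resp[0] << 24 | (uint32_t)resp[1] << 16 |
                        (uint32_t)resp[2] << 8  | resp[3];
    uint32_t blk_len  = (uint32_t)resp[4] << 24 | (uint32_t)resp[5] << 16 |
                        (uint32_t)resp[6] << 8  | resp[7];

    if (last_lba == 0xffffffffu)
        return false; /* caller must issue READ CAPACITY(16) */
    *bytes = ((uint64_t)last_lba + 1) * blk_len;
    return true;
}
```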

 - R.




Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support

2007-02-07 Thread Roland Dreier
  Well, randomness is a resource after all, and since we don't have the
  additional security provided by PSNs in IPoIB UD, it seemed we do not
  need it for IPoIB CM either. So maybe the right thing is just to remove
  the FIXME comment.

random32() doesn't use up any entropy. Random PSNs help avoid problems
with stale connections, so I think we should do it.
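Since IB packet sequence numbers are 24 bits wide, a random starting PSN is just the low 24 bits of a 32-bit random value such as random32() returns. A minimal sketch; starting_psn() is hypothetical and takes the random bits as input so it stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* InfiniBand packet sequence numbers are 24-bit quantities. */
#define IB_PSN_MASK 0xffffffu

/* Hypothetical helper: derive a starting PSN from 32 random bits. */
static uint32_t starting_psn(uint32_t random_bits)
{
    return random_bits & IB_PSN_MASK;
}
```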

I noticed some funny code in ipoib_cm_skb_reap():

__be32 mtu = cpu_to_be32(priv->mcast_mtu);

// htonl(__be32)??
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
// no htonl() here -- is this correct?
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);

what is the right thing?

 - R.




Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support

2007-02-07 Thread Roland Dreier
   I noticed some funny code in ipoib_cm_skb_reap():
   
  __be32 mtu = cpu_to_be32(priv->mcast_mtu);
   
   // htonl(__be32)??
  icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 
   htonl(mtu));
   // no htonl() here -- is this correct?
  icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
   
   what is the right thing?
  
  Both are right I think.

You're right -- the mistake is making mtu __be32 and preswapping it.
I'll fix it up in my tree.

  These two functions seem to accept parameters in different formats:

  include/net/icmp.h:extern void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);

  include/linux/icmpv6.h:extern void icmpv6_send(struct sk_buff *skb,
  include/linux/icmpv6.h-                        int type, int code,
  include/linux/icmpv6.h-                        __u32 info,
  include/linux/icmpv6.h-                        struct net_device *dev);
  
  BTW, I just looked at ip_gre.c and it has the same code.

no, it leaves mtu as an int rather than swapping it.

 - R.




Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets

2007-02-07 Thread Roland Dreier
  I was going to resend it after Roland's earlier patch to clean up
  ib_init_ah_from_path() was accepted.

Sorry, I started having second thoughts about the part about changing
it to return void (it seems more sensible to check it in the other places
it's called).  But I'll look at that again soon.

 - R.




Re: [openib-general] Immediate data question

2007-02-07 Thread Roland Dreier
Changqing Does this pending SEND_WITH_IMM message affect the
Changqing performance of the receiver process? Is this message
Changqing buffered in the receiver's HCA, or does the sender retry and
Changqing get RNR NAKs until the receiver posts a receive?

If no receive is pending, then the responder sends an RNR NAK and the
sender will wait for the RNR timeout and retry, etc.

 - R.




Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets

2007-02-07 Thread Roland Dreier
  I've started thinking about what it would take to get the rdma cm to
  work across a router.  I think the rdma cm may need to treat IPv6
  addresses as a GID for this to work across subnets, versus trying to
  map an ipoib IP address to a GID based on ARP.

Hmm, why is that?  Shouldn't IPoIB work through a router, and
correctly get the GID of the final destination via ARP just fine?

If the RDMA CM treats IPv6 addresses as GIDs, then this breaks things
on a normal subnet with IPoIB interfaces configured with IPv6 addresses.

 - R.




Re: [openib-general] Immediate data question

2007-02-07 Thread Roland Dreier
Changqing What I mean is: is there any penalty to the receiver's
Changqing overall performance if RNR happens continuously on one of
Changqing the QPs?

Not for the receiver, but the sender will be severely slowed down by
having to wait for the RNR timeouts.




Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets

2007-02-07 Thread Roland Dreier
Jason Basically, if IB routers are used, and the IPoIB feature of
Jason *not* spanning a subnet is used (for scalability?) then
Jason you need an alternate way to specify addresses to rdma cm.

You mean if the IB router is also an IP router for IPoIB?

Then I think there are some serious semantic problems to solve for the
RDMA CM -- because you are using an IP address to define a
destination, but since that address is on the other side of an IP
router, there's no way to know it even belongs to an IB port.

 - R.




Re: [openib-general] sharing qp between user and kernel

2007-02-07 Thread Roland Dreier
Pete Before I dig into this anymore, do you expect this to work?
Pete Are there fundamental problems with QP sharing between user
Pete and kernel?  It would sure be nice not to have to stick the
Pete connection management aspects into the kernel.

No, I wouldn't expect this to work.  At first glance at least, yes,
there are fundamental problems.  Sharing a QP between user and
kernelspace, where userspace is doing full kernel bypass (as e.g. mthca
does -- there are NO system calls when doing post work request, poll
CQ and request CQ notification operations), seems like a huge
problem.  I don't see any way that the kernel can keep a consistent
view of the QP state unless userspace has to call into the kernel for
every operation, which would kill performance.

 - R.



