Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
> > I just gave this a cursory glance. I haven't really read it except to think "why is this so complicated?"
>
> A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey change?

Changing the P_Key index is not allowed for RTS->RTS. You would have to modify the QP RTS->SQD, wait for the SQ to drain, then modify the P_Key index with SQD->SQD, and finally go SQD->RTS.

- R.
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
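The transition rule in the reply above can be written down as a small toy model. This is only a sketch: the enum, the struct and the function names are invented for illustration and are not the kernel's QP state table or any verbs API. It encodes just the one rule stated in the message, that the P_Key index may change on SQD->SQD but not on RTS->RTS.

```c
#include <stdbool.h>

/* Hypothetical toy model; NOT the kernel's qp_state_table. */
enum qp_state { QP_RTS, QP_SQD };

/* One modify-QP step: the state change and whether it touches the P_Key index. */
struct qp_step {
    enum qp_state from;
    enum qp_state to;
    bool changes_pkey;
};

/* Per the message above, only SQD->SQD may carry a P_Key index change. */
static bool pkey_index_change_allowed(enum qp_state from, enum qp_state to)
{
    return from == QP_SQD && to == QP_SQD;
}

/* The three-step sequence described in the reply. */
static const struct qp_step pkey_update_sequence[] = {
    { QP_RTS, QP_SQD, false },  /* afterwards: wait for the send queue to drain */
    { QP_SQD, QP_SQD, true  },  /* the only step that changes the P_Key index */
    { QP_SQD, QP_RTS, false },
};

/* A sequence is valid if its steps chain together and obey the P_Key rule. */
static bool sequence_is_valid(const struct qp_step *steps, int n)
{
    for (int i = 0; i < n; i++) {
        if (i > 0 && steps[i].from != steps[i - 1].to)
            return false;
        if (steps[i].changes_pkey &&
            !pkey_index_change_allowed(steps[i].from, steps[i].to))
            return false;
    }
    return true;
}
```

A single RTS->RTS step with `changes_pkey` set is rejected by `sequence_is_valid()`, which is the objection raised to the original suggestion.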
Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
> I did a short code review of the ipoib code concentrating on partitioning support and I noticed that the asynchronous events handler in the ipoib code does not take the port number reported in the event record into consideration. The effect of that is that all of the ib# devices related to that specific HCA are flushed when it seems to me that only the one on the relevant port should be. Is that done on purpose, or am I missing something?

I don't think there's any particular reason the code is that way except for the oversight never being corrected. But it looks trivial to fix, like the patch below. Does that look right to you?

> p.s. I'm working on a patch that should solve another issue caused by PKEY reordering ipoib behavior and the above issue further complicates things for me.

Why not fix the issue first then?

commit a27cbe878203076247c1b5287f5ab59ed143b560
Author: Roland Dreier [EMAIL PROTECTED]
Date:   Tue Feb 27 07:37:49 2007 -0800

    IPoIB: Only handle async events for one port

    An asynchronous event carries the port number that the event occurred on, so there's no reason for an IPoIB interface to process an event associated with a different local HCA port.

    Signed-off-by: Roland Dreier [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 3cb551b..7f3ec20 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
         struct ipoib_dev_priv *priv =
                 container_of(handler, struct ipoib_dev_priv, event_handler);
 
-        if (record->event == IB_EVENT_PORT_ERR    ||
-            record->event == IB_EVENT_PKEY_CHANGE ||
-            record->event == IB_EVENT_PORT_ACTIVE ||
-            record->event == IB_EVENT_LID_CHANGE  ||
-            record->event == IB_EVENT_SM_CHANGE   ||
-            record->event == IB_EVENT_CLIENT_REREGISTER) {
+        if ((record->event == IB_EVENT_PORT_ERR    ||
+             record->event == IB_EVENT_PKEY_CHANGE ||
+             record->event == IB_EVENT_PORT_ACTIVE ||
+             record->event == IB_EVENT_LID_CHANGE  ||
+             record->event == IB_EVENT_SM_CHANGE   ||
+             record->event == IB_EVENT_CLIENT_REREGISTER) &&
+            record->element.port_num == priv->port) {
                 ipoib_dbg(priv, "Port state change event\n");
                 queue_work(ipoib_workqueue, &priv->flush_task);
         }
Re: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
> > I haven't really read it except to think "why is this so complicated?"
>
> Do you refer to the complication of the patch or of the issue?

the patch.

> > Changing the P_Key index is not allowed for RTS->RTS. You would have to modify the QP RTS->SQD, wait for the SQ to drain, then modify the P_Key index with SQD->SQD, and finally go SQD->RTS.
>
> Do you think that solving it that way would be a significant simplification? We'll still have to reuse the handling for missed completions that is currently implemented in ipoib_ib_dev_stop and still have an additional work element.

no, I don't think SQD is really useful in practice.
Re: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
> On second thought, based on the fact that on a two port HCA we'll have a 50% miss on the events being delivered, I would move the new condition to be evaluated first. I apologize if this is too much of a micro optimization. What do you think?

That wouldn't really be correct since element.port_num isn't valid unless we already know it's a port-related event. And it's not worth worrying about this since it's not remotely a hot path.

- R.
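The ordering argument above can be illustrated with a stand-alone sketch. The types below are hypothetical stand-ins, not the kernel's `struct ib_event`; they only assume, as the reply says, that the port number lives in a union that is meaningful for port-related events alone.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's event types. */
enum ev_type { EV_PORT_ERR, EV_PKEY_CHANGE, EV_CQ_ERR };

struct ev_record {
    enum ev_type event;
    union {
        void *cq;               /* meaningful only for CQ events */
        unsigned char port_num; /* meaningful only for port events */
    } element;
};

static bool is_port_event(enum ev_type t)
{
    return t == EV_PORT_ERR || t == EV_PKEY_CHANGE;
}

/*
 * Check the event type first. Because && short-circuits, element.port_num
 * is read only once the event is known to be port-related.
 */
static bool should_flush(const struct ev_record *rec, unsigned char my_port)
{
    return is_port_event(rec->event) && rec->element.port_num == my_port;
}
```

Swapping the two operands of `&&` would read `port_num` out of a record whose union actually holds a CQ pointer, which is the reason given for keeping the type check first.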
Re: [openib-general] [RFC/BUG] DMA vs. CQ race
> > On our cell blade + PCI-e Mellanox.
>
> I don't see anything in arch/powerpc that looks like dma_alloc_coherent() will do anything other than allocate some memory and map it with DMA_BIDIRECTIONAL. So how does this Altix fix help in your situation? Am I misreading the Cell IOMMU code?

Shirley, can you clarify why doing dma_alloc_coherent() in the kernel helps on your Cell blade? It really seems that dma_alloc_coherent() just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL), which would be exactly the same as allocating the CQ buffer in userspace and using ib_umem_get() to map it into the kernel.

I'm looking at a possibly cleaner solution to the Altix issue, so I would like to make sure it fixes whatever the bug on Cell is as well. So any details you can provide about the problem you see on Cell would help a lot.

Thanks...
Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish
I don't think this applies any more since Sean's multicast stuff was merged. I didn't realize you wanted to get this merged upstream -- anyway, can you please regenerate the patch against the latest kernel?

Thanks
Re: [openib-general] IPOIB NAPI
> So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can generate the patch for all ULPs to use this for review. Do you need me to do that?

No, it's not in OFED 1.2 or the upstream kernel. And no one has implemented it for userspace (and I'm somewhat reluctant to break the ABI at this point without some performance numbers to motivate making this API change).

Have the NAPI performance problems with ehca been resolved? We could probably merge IPoIB NAPI for 2.6.22 then, which would pull in the kernel changes at least.

- R.
Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
nope, doesn't seem to make a difference.
[openib-general] [GIT PULL] please pull infiniband.git
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get various post-rc1 cleanups and fixes:

Adrian Bunk (2):
      IB/mthca: Make 2 functions static
      RDMA/cxgb3: cleanups

Michael S. Tsirkin (1):
      IPoIB/cm: Improve small message bandwidth

Roland Dreier (3):
      IPoIB: Remove unused local_rate tracking
      IB/uverbs: Return correct error for invalid PD in register MR
      IPoIB: Correct debugging output when path record lookup fails

Sean Hefty (4):
      IB/core: Set hop limit in ib_init_ah_from_wc correctly
      RDMA/cma: Request reversible paths only
      IB/cm: Remove ca_guid from cm_device structure
      RDMA/cma: Remove unused node_guid from cma_device structure

Steve Wise (1):
      RDMA/cxgb3: Stop the EP Timer on BAD CLOSE

 drivers/infiniband/core/cm.c                   |   10 ++---
 drivers/infiniband/core/cma.c                  |    6 ++--
 drivers/infiniband/core/uverbs_cmd.c           |    4 ++-
 drivers/infiniband/core/verbs.c                |    2 +-
 drivers/infiniband/hw/cxgb3/Makefile           |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |   31 +---
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    5 --
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |   14 +--
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    6 ++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |   29 +++
 drivers/infiniband/hw/mthca/mthca_mr.c         |   10 +++--
 drivers/infiniband/ulp/ipoib/ipoib.h           |    1 -
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |   46 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    8 ++---
 17 files changed, 76 insertions(+), 102 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..842cd0b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -88,7 +88,6 @@ struct cm_port {
 struct cm_device {
         struct list_head list;
         struct ib_device *device;
-        __be64 ca_guid;
         struct cm_port port[0];
 };
 
@@ -739,8 +738,8 @@ retest:
                 ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
                 spin_unlock_irqrestore(&cm_id_priv->lock, flags);
                 ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-                               &cm_id_priv->av.port->cm_dev->ca_guid,
-                               sizeof cm_id_priv->av.port->cm_dev->ca_guid,
+                               &cm_id_priv->id.device->node_guid,
+                               sizeof cm_id_priv->id.device->node_guid,
                                NULL, 0);
                 break;
         case IB_CM_REQ_RCVD:
@@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 
         req_msg->local_comm_id = cm_id_priv->id.local_id;
         req_msg->service_id = param->service_id;
-        req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+        req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
         cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
         cm_req_set_resp_res(req_msg, param->responder_resources);
         cm_req_set_init_depth(req_msg, param->initiator_depth);
@@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
         cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
         cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
         cm_rep_set_srq(rep_msg, param->srq);
-        rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+        rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 
         if (param->private_data && param->private_data_len)
                 memcpy(rep_msg->private_data, param->private_data,
@@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
                 return;
 
         cm_dev->device = device;
-        cm_dev->ca_guid = device->node_guid;
 
         set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
         for (i = 1; i <= device->phys_port_cnt; i++) {
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f8d69b3..d441815 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -77,7 +77,6 @@ static int next_port;
 struct cma_device {
         struct list_head        list;
         struct ib_device        *device;
-        __be64                  node_guid;
         struct completion       comp;
         atomic_t                refcount;
         struct list_head        id_list;
@@ -1492,11 +1491,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
         ib_addr_get_dgid(addr, &path_rec.dgid);
         path_rec.pkey = cpu_to_be16(ib_addr_get_pkey
Re: [openib-general] [RFC/BUG] DMA vs. CQ race
> That would be great. We hit a similar problem in our cluster test -- data corruption because of this race.

On what platform?

- R.
Re: [openib-general] [RFC/BUG] DMA vs. CQ race
> On our cell blade + PCI-e Mellanox.

I don't see anything in arch/powerpc that looks like dma_alloc_coherent() will do anything other than allocate some memory and map it with DMA_BIDIRECTIONAL. So how does this Altix fix help in your situation? Am I misreading the Cell IOMMU code?

- R.
Re: [openib-general] IPOIB NAPI
> Yes. It would be good to reduce the number of interrupts by changing all upper layer protocols to use:
>
>     poll CQ
>     notify CQ, rotting packet notification
>     poll again
>
> instead of
>
>     notify CQ
>     poll CQ
>
> If possible, can this be in OFED-1.2?

No way, it's way too late at this point to change the kernel-user ABI, let alone change all ULPs.

- R.
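The "poll, notify, poll again" ordering quoted above can be sketched against a mock completion queue. Everything here is hypothetical: `struct mock_cq` and its helpers only imitate the idea of an arm call that reports completions which arrived before the CQ was armed, and are not the verbs API.

```c
#include <stdbool.h>

/* Hypothetical mock of a completion queue; not a verbs API. */
struct mock_cq {
    int pending; /* completions waiting to be polled */
    bool armed;  /* true once a notification has been requested */
};

/* Drain everything currently queued; returns the number handled. */
static int mock_poll(struct mock_cq *cq)
{
    int n = cq->pending;
    cq->pending = 0;
    return n;
}

/* Arm the CQ and report whether completions slipped in before arming. */
static bool mock_req_notify(struct mock_cq *cq)
{
    cq->armed = true;
    return cq->pending > 0;
}

/* Simulated race: two completions arrive between the poll and the arm. */
static void late_arrival(struct mock_cq *cq)
{
    cq->pending += 2;
}

/* poll CQ, notify CQ, and poll again for as long as something was missed. */
static int drain_and_arm(struct mock_cq *cq, void (*race)(struct mock_cq *))
{
    int handled = mock_poll(cq);

    if (race)
        race(cq);
    while (mock_req_notify(cq))
        handled += mock_poll(cq);
    return handled;
}
```

With three completions queued and the simulated race adding two more, `drain_and_arm()` handles all five without waiting for another interrupt, which is the saving the quoted proposal is after.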
Re: [openib-general] failure to create an FMR mapping 1K pages on memfree
> I have got a report on failure to create an FMR mapping 1K pages (that is 4MB) on memfree. I don't have the exact details (i.e. if Arbel/Sinai / what FW / etc) nor which exact check fails in mthca_fmr_alloc, but what's clear is that the latter function returns -ENOMEM when attr.max_pages is 1024 and it works fine when attr.max_pages is 256.
>
> Is this failure clear to you? If yes, is a HW or FW limit being hit or is it a driver design issue?

Is it really returning -ENOMEM? It seems much more likely that you are hitting the code

        /* For Arbel, all MTTs must fit in the same page. */
        if (mthca_is_memfree(dev) &&
            mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
                return -EINVAL;

I guess you could call this limit a driver design issue.

- R.
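The arithmetic behind that check can be worked through in a few lines. This sketch assumes a 4096-byte page and an 8-byte MTT entry; both are assumptions made for illustration, and `fmr_mtt_check()` is a made-up name rather than a driver function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/*
 * Mirrors the shape of the quoted check: all MTT entries for the FMR
 * must fit in one page. Assumes each MTT entry is one 64-bit word.
 */
static int fmr_mtt_check(size_t max_pages)
{
    if (max_pages * sizeof(uint64_t) > SKETCH_PAGE_SIZE)
        return -EINVAL;
    return 0;
}
```

Under those assumptions 256 pages need 2048 bytes of MTTs and pass, 512 pages need exactly one page and still pass, and 1024 pages need 8192 bytes and fail with -EINVAL, which matches the reported behaviour apart from the error code.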
Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter
> Newer gccs have the -fwhole-program --combine options that address this and more. One of the things that happens is that all internal functions are made 'static' and all compilation units are optimized in one go.

Good point... but is there any sane way to use that feature with automake and libtool? I know that the autotools are a pain but I really don't want to reimplement the useful stuff they give us, and I don't know of any really practical replacement...

- R.
Re: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups
thanks, queued for 2.6.21
Re: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on BAD CLOSE.
thanks, queued for 2.6.21
Re: [openib-general] [PATCH v2] libibverbs: can't compile more than once due to man3 symbolic links
Thanks, I applied this and pushed it out.
Re: [openib-general] [RFC/BUG] DMA vs. CQ race
> A first-cut at a patch was sent out, some very reasonable objections were raised, and the thread fizzled out.

Sorry, I meant to respond again, but I never got around to it.

> The biggest concern with the earlier patch seemed to be backward compatibility. There was a stab at addressing that in http://tinyurl.com/2x3s52, but no commentary. (Too ugly for words?)

I think you went off into the weeds there, but I'll respond to that earlier email in detail.

> Any suggestions as to how to proceed? Should I just code something up in order to have a concrete target to discuss? Or are there any new thoughts based on the previous emails?

I actually have a vague plan for a somewhat cleaner way to get this fix. For a variety of reasons, I am planning on changing the way the kernel handles memory registration so that low-level drivers have more control over what happens. This would allow us to follow Gleb's suggestion to use register MR to create and map the kernel's buffer and avoid some of the error path ugliness. So I would prefer to map the coherent memory that way.

However this will take a while to come to fruition, since it is kind of a background task for me. How severe is this issue? In other words, when you produced the problem, was it a synthetic test, or a workload that someone might actually want to run?

- R.
Re: [openib-general] mthca adjust_key()
> Could anyone tell me why this routine in mthca is necessary? There aren't any comments to explain it; I'm wondering if this is a workaround for Sinai of some kind?
>
> static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
> {
>         if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
>                 return ((key << 20) & 0x800000) | (key & 0x7fffff);
>         else
>                 return key;
> }

It's a performance optimization for Sinai.

- R.
Re: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race
> Assuming that something along the lines of the previous patch is used, we need to address userspace/kernel compatibility. The existing abi versioning doesn't seem to be exactly what we want to use, though, because we want to change a verb's semantics to work around a bug. (Changing the abi_version may be an inevitable result, though.)
>
> How about adding semantic flags to the mthca_* commands (mthca_create_cq, etc.)? Userspace could read the contents of a new sysfs file which, if found, would indicate the flags that the kernel understands. Then it could pass the flags, if it chooses, to get the kernel to use the desired semantics.

This is not really the design philosophy that we've used in defining the user-kernel interfaces for IB verbs. Rather than having complexity in the kernel to handle both old and new ways of doing things, the way we've handled cases like this is the following:

 - specify new fixed ABI (in this case, mthca abi_version 2)
 - update library to handle old and new ABI (in this case, update libmthca to use mthca kernel abi 1 or 2 depending on what it detects at runtime)
 - update kernel to implement new ABI, and remove old ABI from kernel (in this case, update kernel mthca driver to abi_version 2)

The net effect of this is that updated userspace works fine with any kernel, but updating the kernel will require updating userspace libraries too. However the important point is that once userspace is updated, it's still possible to boot into old kernels and have things work without downgrading userspace.

If we really wanted to export some flags from mthca back to libmthca, I guess it would be possible to bump the abi version and add a flags field to the response to the alloc_ucontext command, but in this case I don't see a reason to worry about it.

- R.
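The library side of that scheme, picking ABI 1 or 2 depending on what the kernel reports, might look roughly like the sketch below. The constants and `select_abi()` are hypothetical names for illustration; how libmthca actually learns the kernel's ABI version is not shown in the message and is not modelled here.

```c
/* Hypothetical range of kernel ABI versions the library can speak. */
enum {
    SKETCH_ABI_MIN = 1,
    SKETCH_ABI_MAX = 2
};

/*
 * Returns the ABI version to use for this kernel, or -1 if the kernel
 * reports a version the library does not understand.
 */
static int select_abi(int kernel_abi_version)
{
    if (kernel_abi_version < SKETCH_ABI_MIN ||
        kernel_abi_version > SKETCH_ABI_MAX)
        return -1;
    return kernel_abi_version;
}
```

The point of keeping the range in the library rather than in the kernel is the one made above: an updated library keeps working on an old kernel, while the kernel only ever has to implement one ABI.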
Re: [openib-general] [RFC/BUG] DMA vs. CQ race
> We found this accidentally, running a normal MPI job, on a normally sized machine (i.e., tens, not hundreds of processors.) It appears to be more easily produced than we'd expected, and we consider it to be a severe problem.

Hmm, OK. Then I will do my best to make sure we get a fix for this into 2.6.22.

- R.
Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 1. Is there something special you do when you run the benchmark (msi, taskset, ...)?

Yes, I am using MSI-X, and I pin the interrupt handler to one CPU (CPU#0 in my particular case). Then I use taskset to pin the NPtcp process to a CPU in a different package (CPU#2 in my system).

BTW with these same systems, I am getting up to ~1150 MB/sec of throughput with DDR mem-free Arbel, as measured with NPtcp.

> 2. On a wild guess that the issue here is higher interrupt rate with CM, is there a chance you could test the following patch posted by me earlier?
> http://www.mail-archive.com/openib-general@openib.org/msg29290.html

OK, I'll try that when I get a chance.

- R.
Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
OK, I applied the following patch (I had to change one line of your patch to get it to apply because the small-message patch changed the context so one chunk didn't apply).

Anyway I don't see any difference in small message latency or large message throughput. (Actually latency seems slightly worse but I think the change is within my normal variability so I don't think the difference is significant)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..20d7ad4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -98,9 +98,9 @@ enum {
 
 #define IPOIB_OP_RECV   (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define IPOIB_CM_OP_SRQ (1ul << 30)
+#define IPOIB_OP_CM     (1ul << 30)
 #else
-#define IPOIB_CM_OP_SRQ (0)
+#define IPOIB_OP_CM     (0)
 #endif
 
 /* structs */
@@ -143,7 +143,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
         struct ib_cm_id     *id;
-        struct ib_cq        *cq;
         struct ib_qp        *qp;
         struct list_head     list;
         struct net_device   *dev;
@@ -232,6 +231,7 @@ struct ipoib_dev_priv {
         unsigned             tx_tail;
         struct ib_sge        tx_sge;
         struct ib_send_wr    tx_wr;
+        unsigned             tx_outstanding;
 
         struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -438,6 +438,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
                            unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -526,6 +527,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 3484e8b..9515ef6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -82,7 +82,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
         struct ib_recv_wr *bad_wr;
         int i, ret;
 
-        priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+        priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
         for (i = 0; i < IPOIB_CM_RX_SG; ++i)
                 priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
@@ -344,7 +344,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
         struct ipoib_dev_priv *priv = netdev_priv(dev);
-        unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+        unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
         struct sk_buff *skb, *newskb;
         struct ipoib_cm_rx *p;
         unsigned long flags;
@@ -436,7 +436,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
         priv->tx_sge.addr   = addr;
         priv->tx_sge.length = len;
 
-        priv->tx_wr.wr_id = wr_id;
+        priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM;
 
         return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
 }
@@ -487,20 +487,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
                 dev->trans_start = jiffies;
                 ++tx->tx_head;
 
-                if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+                if (++priv->tx_outstanding == ipoib_sendq_size) {
                         ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
                                   tx->qp->qp_num);
                         netif_stop_queue(dev);
-                        set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
                 }
         }
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
-                                  struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
         struct ipoib_dev_priv *priv = netdev_priv(dev);
-        unsigned int wr_id = wc->wr_id;
+        struct ipoib_cm_tx *tx = wc->qp->qp_context;
+        unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
         struct ipoib_tx_buf *tx_req;
         unsigned long flags;
 
@@ -525,11 +524,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
         spin_lock_irqsave(&priv->tx_lock, flags);
         ++tx->tx_tail;
-        if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
-            tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
-                clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+        if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+            netif_queue_stopped(dev) &&
+            test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
Re: [openib-general] IPOIB NAPI
> By the way, how about extending the userspace API in a similar fashion?
>
>         missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP | IBV_CQ_REPORT_MISSED_EVENTS)

It would require a kernel-user ABI bump. Is it worth it?

- R.
Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland
These all look fine, I'll queue them up.

> Signed-off-by: Sean Hefty [EMAIL PROTECTED]

I notice that the actual patches you committed don't have your sign-off in the git changelog. I assume this is a mistake so I'll add it back in...
Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland
> I notice that the actual patches you committed don't have your sign-off in the git changelog. I assume this is a mistake so I'll add it back in...

which means I can't just pull your branch. But that's OK, still doing git format-patch, edit patches, git am is pretty easy.
Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland
> The patches are in git.openfabrics.org/~shefty/rdma-dev.git, for-roland branch, which is based on 2.6.21-rc1.

One other request: please include a URL that I can just copy and paste, so I don't actually have to read and parse complete sentences. Something like:

the patches are in

    git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

- R.
Re: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git for-roland
Anyway, all 4 queued up in my for-2.6.21 branch
Re: [openib-general] [PATCH] libmthca: optimize calls to htonl with constant parameter
> GCC seems to be unable to propagate constants across calls to htonl. So it turns out to be worth the while to replace htonl with a hand-written macro in case of constant parameter.

I'm wondering why this helps you. On my system (which has Debian's old glibc 2.3.6, certainly nothing particularly fancy), I see in my <netinet/in.h>:

    /* Get machine dependent optimized versions of byte swapping functions.  */
    #include <bits/byteswap.h>

    #ifdef __OPTIMIZE__
    /* We can optimize calls to the conversion functions.  Either nothing has
       to be done or we are using directly the byte-swapping functions which
       often can be inlined.  */
    # if __BYTE_ORDER == __BIG_ENDIAN
    //...
    # else
    #  if __BYTE_ORDER == __LITTLE_ENDIAN
    #   define ntohl(x)     __bswap_32 (x)

and so on (and gcc defines __OPTIMIZE__ if you pass it any -O flag including -Os). And in <bits/byteswap.h> I have

    /* Swap bytes in 32 bit value.  */
    #define __bswap_constant_32(x) \
         ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >>  8) | \
          (((x) & 0x0000ff00) <<  8) | (((x) & 0x000000ff) << 24))

and variations of __bswap_32() that look roughly like

    #  define __bswap_32(x) \
         (__extension__ \
          ({ register unsigned int __v, __x = (x); \
             if (__builtin_constant_p (__x)) \
               __v = __bswap_constant_32 (__x); \
             else \

and so on.

(The point of all this being that for constants, htonl() should expand to roughly the same thing as your CONSTANT_HTONL() -- the only difference is that you don't have the & for the << 24 and >> 24 parts, which I guess just has the potential to bite us if someone did something like CONSTANT_HTONL(1L) on a 64-bit system).

As a quick test I compiled the code

    #include <netinet/in.h>

    enum { Y = 5 };

    uint32_t foo(uint32_t x)
    {
            return x | htonl(Y);
    }

with gcc -c -O and the disassembly of foo() looks like

    0000000000000000 <foo>:
       0:   89 f8                   mov    %edi,%eax
       2:   0d 00 00 00 05          or     $0x5000000,%eax
       7:   c3                      retq

and so everything works exactly the way we would want. (32-bit i386 also just does or with a constant too).

In fact for libmthca I just checked that the preprocessor output of places like the following (which your patch converts)

    ((wr->send_flags & IBV_SEND_SIGNALED) ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |

is

    ((wr->send_flags & IBV_SEND_SIGNALED) ? (__extension__ ({ register unsigned int __v, __x = (MTHCA_NEXT_CQ_UPDATE); if (__builtin_constant_p (__x)) __v = ((((__x) & 0xff000000) >> 24) | (((__x) & 0x00ff0000) >> 8) | (((__x) & 0x0000ff00) << 8) | (((__x) & 0x000000ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) : 0) |

And if I compare the generated assembly for libmthca with and without your patch (on both x86-64 and i386), I don't see any significant difference (the size is exactly the same, I just see things like the compiler using eax and edx in the opposite order and trivial things like that).

So what is different in your setup that causes this patch to make a difference for you?

(BTW, one thing I did notice while looking at the i386 assembly is that one micro-optimization that might make sense is to use something like __attribute__((regparm(3))) for internal function calls within libibverbs and libmthca on i386, since otherwise we waste instructions pushing stuff on the stack for no reason other than compliance with the crufty old i386 ABI. Something like a FASTCALL macro in infiniband/arch.h perhaps... if anyone really cares about 32-bit i386 performance any more)

- R.
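The comparison made in the message above can be tried with a small stand-alone file. The macros below are my own guess at what a constant-only byte swap looks like, not the CONSTANT_HTONL() from the patch under review (whose text is not shown here), and the byte-order test assumes a compiler that defines `__BYTE_ORDER__`, as GCC and Clang do. The masks are kept on every term so that a wide argument such as `1L` cannot leak bits above bit 31.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical constant-only byte swap with a mask on every term. */
#define SKETCH_BSWAP32(x) \
    ((uint32_t)((((x) & 0xff000000u) >> 24) | (((x) & 0x00ff0000u) >> 8) | \
                (((x) & 0x0000ff00u) << 8)  | (((x) & 0x000000ffu) << 24)))

/* Assumes GCC/Clang-style __BYTE_ORDER__; other compilers get the identity. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define SKETCH_CONSTANT_HTONL(x) SKETCH_BSWAP32(x)
#else
# define SKETCH_CONSTANT_HTONL(x) ((uint32_t)(x))
#endif

enum { Y = 5 };

/* Same shape as the foo() test in the message, using the macro instead. */
static uint32_t foo(uint32_t x)
{
    return x | SKETCH_CONSTANT_HTONL(Y);
}
```

Compiling this and the htonl() version with the same -O flag and diffing the disassembly is one way to repeat the check described above on another toolchain.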
Re: [openib-general] I created a git tree for the libibverbs man pages
> What is the max # of cards the OFED driver/library can support on a single node?

The lowest limit I know of is the # of device minors available for /dev/infiniband/uverbs files, which is 32. How many devices are you interested in supporting?

This limit could probably be increased without too much trouble, but I doubt any realistic system will run into it anyway.

- R.
Re: [openib-general] Fork issues with simple MPI program
> As replied before - if you want full fork support you need to change the application. Look at the verbs header for details.

Or you could try setting the IBV_FORK_SAFE environment variable before running your application. I guess for MPI jobs you need to make sure that environment variable is propagated to every process.
Re: [openib-general] Fork issues with simple MPI program
> If you can send me the details (since you implemented it) I will add it to the Wiki

An application that wants fork() to work with libibverbs should either call ibv_fork_init() before doing anything else with libibverbs, or else a user can set the IBV_FORK_SAFE or RDMAV_FORK_SAFE environment variable to get the same effect. There is some overhead to making fork() work so it is not enabled by default.

This is described in the ibv_fork_init manpage in the latest libibverbs git tree.

 - R.
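For the environment-variable route, a launcher or wrapper can set the variable itself before anything touches libibverbs. The following is only a sketch: the helper name is made up, and the value "1" is an assumption (the message only says the variable needs to be set).

```c
#include <stdlib.h>

/* Hypothetical helper: request fork safety through the environment.
 * Must run before the first libibverbs call in the process.
 * Returns 0 on success, -1 on failure, as setenv() does. */
static int ex_request_fork_safety(void)
{
	/* Assumption: the value does not matter, only that it is set. */
	return setenv("RDMAV_FORK_SAFE", "1", 1);
}
```

For MPI jobs the variable still has to reach every rank, so setting it in the launcher's environment is probably the more practical place.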
Re: [openib-general] Fork issues with simple MPI program
> Does this require 2.6.16 or better kernel support?

The kernel must support the MADV_DONTFORK flag to madvise(), not sure when exactly that was merged but 2.6.16 or so sounds right. ibv_fork_init() will return an error if the kernel support is missing and fork safety won't actually work. And if you use the environment variable a warning will be printed if ibv_fork_init() fails.

 - R.
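Whether a given kernel accepts MADV_DONTFORK can be probed directly from user space. This is a rough sketch and not what ibv_fork_init() itself does; the helper name is invented, and it assumes a Linux system whose headers expose mmap() with MAP_ANONYMOUS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if madvise(MADV_DONTFORK) succeeds on a
 * scratch page, 0 if the kernel (or the headers) do not know the flag,
 * and -1 if the probe itself could not be set up. */
static int ex_have_madv_dontfork(void)
{
#ifdef MADV_DONTFORK
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int ok;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	ok = madvise(p, (size_t)page, MADV_DONTFORK) == 0;
	munmap(p, (size_t)page);
	return ok;
#else
	return 0;	/* headers too old to define the flag */
#endif
}
```

In an application it is simpler to call ibv_fork_init() and check its return value; the probe is only useful for diagnosing a system.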
Re: [openib-general] I created a git tree for the libibverbs man pages
I merged all these manpages into my libibverbs tree and pushed the result out to kernel.org. Please send any future updates as diffs against the libibverbs tree.

Thanks,
  Roland
Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
Thanks, queued for 2.6.21.

With this patch I see small-packet latency down almost all the way back to what datagram mode gives -- on a pair of fast woodcrest systems I see latencies for netpipe tcp 1 byte messages like

    datagram      13.xx
    original CM   17.xx
    patched CM    14.xx

so there is still a measurable difference but it is much less now.

 - R.
Re: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible cleanups
You could just remove the code instead of #if 0...

Steve, can you decide what the right thing to do with these changes is and send me the result (or just tell me to apply Adrian's patch as-is)?

Thanks,
  Roland
Re: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c: make 2 functions static
Queued for my next merge, thanks.
Re: [openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path()
> In issue number 296 that I opened several months ago in the Bugzilla, I reported about two missing attributes: the first one is the static_rate, and the second one is the src_path_bits which is not being filled right.

The patch I posted fixes the static rate, right? You'll need to explain what you mean about src_path_bits, because at first glance the code looks OK to me.

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> We have a customer issue regarding IPv6oIB. The subnet supports only a limited number of MCGs. When multiple IPv6 addresses are assigned to one interface, each IPv6 address has its own solicited-node address (depending on its group ID), so a large subnet ends up with tons of MCGs. If the IPv6 solicited-node addresses exceed the number of MCGs in the subnet, IPv6 neighbour discovery breaks. This won't happen on Ethernet, since a send-only sender there doesn't need to join any MCG.
>
> I have done an initial patch to address the MCG overflow problem by redirecting the solicited-node address to the all-hosts address, so IPv6 neighbour discovery will work no matter how many IPv6 addresses are in the subnet. This patch is only triggered when IPv6 is enabled and the MCGs overflow, so there is almost no performance penalty.

I really don't like this approach, since it can break things in very subtle ways (eg suppose one node fails to join its solicited node group, but then a later node wants to talk to it and succeeds in joining the solicited node group as a send-only member -- since the first node is not a member then it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number of MCGs. Supporting O(# nodes) MCGs is really not a very onerous requirement on the SM.

 - R.
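For readers who want to see why every configured address can add a group: the IPv6 solicited-node multicast address is ff02::1:ffXX:XXXX, where the X digits are the low 24 bits of the unicast address. The sketch below only derives that address; the helper names are made up, and how IPoIB then maps the address onto an MGID is deliberately left out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build ff02::1:ffXX:XXXX from the low 24 bits of a unicast address. */
static void ex_solicited_node(const struct in6_addr *addr,
			      struct in6_addr *out)
{
	memset(out, 0, sizeof *out);
	out->s6_addr[0]  = 0xff;
	out->s6_addr[1]  = 0x02;
	out->s6_addr[11] = 0x01;
	out->s6_addr[12] = 0xff;
	memcpy(&out->s6_addr[13], &addr->s6_addr[13], 3);
}

/* Text-in, text-out wrapper; returns 0 on success, -1 on bad input. */
static int ex_solicited_node_str(const char *text, char *buf, socklen_t len)
{
	struct in6_addr addr, mc;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;
	ex_solicited_node(&addr, &mc);
	return inet_ntop(AF_INET6, &mc, buf, len) ? 0 : -1;
}
```

Two addresses share a solicited-node group only if their low 24 bits happen to match, so in practice the number of groups grows roughly with the number of addresses on the subnet.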
Re: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko fix
Thanks, queued for 2.6.21.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> Even if the SM supports 1000 MCGs, that's still not sufficient for a 250-node cluster where each node has 4 links for IPv6 without any scope/global IPv6 address configured (250*4 + a few default MCGs). There will be an MCG overflow problem anyway in IPv6oIB.

But what's the problem with supporting 1000 or even 10000 MCGs?

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> > I much prefer to fix the SM not to impose too-low limits on the number of MCGs. Supporting O(# nodes) MCGs is really not a very onerous requirement on the SM.
>
> Is this a MFT size issue or SM issue or both ?

Well as we discussed before, the size of the MFT is really independent of the # of MCGs supported. It's up to the SM how to allocate MLIDs, and as long as all the switches in the fabric support at least one MLID, then any number of MCGs can be managed by the SM. So I would say this is entirely an SM issue.

 - R.
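As a purely illustrative sketch of the "any number of MCGs on a limited set of MLIDs" idea: an SM could fold group indices onto however many MLIDs the fabric supports. The policy and the function name below are hypothetical and do not describe any real SM; the only fact relied on is that multicast LIDs run from 0xC000 to 0xFFFE.

```c
#include <stdint.h>

#define EX_MLID_BASE 0xC000u	/* first multicast LID */
#define EX_MLID_LAST 0xFFFEu	/* last multicast LID */

/* Hypothetical policy: fold an unbounded group index onto nr_mlids
 * multicast LIDs. Returns 0 when the fabric offers no MLID at all. */
static uint16_t ex_mlid_for_group(uint32_t group_index, uint32_t nr_mlids)
{
	const uint32_t max = EX_MLID_LAST - EX_MLID_BASE + 1;

	if (nr_mlids == 0)
		return 0;
	if (nr_mlids > max)
		nr_mlids = max;
	return (uint16_t)(EX_MLID_BASE + group_index % nr_mlids);
}
```

With a single MLID every group lands on 0xC000, which is the degenerate case mentioned above; the cost would be that members receive traffic for groups they did not join and have to filter it by MGID.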
Re: [openib-general] SA multicast patches
> The pkey is the default partition, full membership pkey. I believe all nodes will have either 0xffff or 0x7fff as their pkey. We could probably call ib_get_cached_pkey() instead and just use the first entry in the table.

Well the consumer has to know what P_Key to use since it must match the QP that will be used to send/receive. So I would suggest not trying to guess in the low-level multicast.c code, and rely on the consumer to set it properly.

> We don't want to set the privileged bit of the q_key, so that's wrong. Good catch.

OK, I'll replace the code with something like

    random32() & 0x7fffffff

One other question about the PS_IPOIB stuff:

> +static int cma_set_qkey(struct ib_device *device, u8 port_num,
> +			enum rdma_port_space ps,
> +			struct rdma_dev_addr *dev_addr, u32 *qkey)
> +{
> +	struct ib_sa_mcmember_rec rec;
> +	int ret = 0;
> +
> +	switch (ps) {
> +	case RDMA_PS_UDP:
> +		*qkey = RDMA_UDP_QKEY;
> +		break;
> +	case RDMA_PS_IPOIB:
> +		ib_addr_get_mgid(dev_addr, &rec.mgid);
> +		ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec);
> +		*qkey = be32_to_cpu(rec.qkey);
> +		break;

Does this work if userspace tries to join a new IPoIB MCG that the kernel driver hasn't joined yet? From reading the code it seems that ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the whole join request would fail. Am I reading this correctly? Is it supposed to work? I would think that it would be nice to be able to receive on IPoIB MCGs not also being received by the kernel.

 - R.
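The masking mentioned above is small enough to show in isolation. This is a sketch with invented helper names; it relies only on the fact that the high-order bit of a Q_Key marks it as privileged.

```c
#include <stdint.h>

#define EX_QKEY_PRIVILEGED_BIT 0x80000000u

/* Nonzero if the Q_Key has the privileged (high-order) bit set. */
static int ex_qkey_is_privileged(uint32_t qkey)
{
	return (qkey & EX_QKEY_PRIVILEGED_BIT) != 0;
}

/* Turn an arbitrary random value into an unprivileged Q_Key by
 * clearing the high-order bit and keeping the low 31 bits. */
static uint32_t ex_unprivileged_qkey(uint32_t random_value)
{
	return random_value & 0x7fffffffu;
}
```

Filling the field with raw random bytes sets the high bit about half the time, which is the problem being pointed out.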
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> I thought that mapping multiple MCGs to the same MLID requires that a set of the (group) parameters are the same. Is that the case for these IPv6 groups ? Is the only variable in those parameters the PKey ?

I don't see why any group parameters need to be the same -- I'm probably missing something, but which parameters in particular did you have in mind?

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> For a successful join, ND sends to the node directly; for a failed join, ND sends to the all-hosts address. So ND will work whether or not the join succeeds; that's what the patch does.

But what if the full-member join fails on node A for node A's solicited node group, but then node B succeeds in joining that group as a send-only member (perhaps because some other nodes have dropped off the fabric in the meantime). Then node B will send the ND message on a MCG that A is not a member of.

 - R.
Re: [openib-general] SA multicast patches
OK, another question about the multicast.c code:

> +static struct mcast_group *mcast_find(struct mcast_port *port,
> +				      union ib_gid *mgid)
> +{
> +	struct rb_node *node = port->table.rb_node;
> +	struct mcast_group *group;
> +	int ret;
> +
> +	while (node) {
> +		group = rb_entry(node, struct mcast_group, node);
> +		ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid);
> +		if (!ret)
> +			return group;
> +
> +		if (ret < 0)
> +			node = node->rb_left;
> +		else
> +			node = node->rb_right;
> +	}
> +	return NULL;
> +}
> +
> +static struct mcast_group *mcast_insert(struct mcast_port *port,
> +					struct mcast_group *group,
> +					int allow_duplicates)
> +{
> +	struct rb_node **link = &port->table.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct mcast_group *cur_group;
> +	int ret;
> +
> +	while (*link) {
> +		parent = *link;
> +		cur_group = rb_entry(parent, struct mcast_group, node);
> +
> +		ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw,
> +			     sizeof group->rec.mgid);
> +		if (ret < 0)
> +			link = &(*link)->rb_left;
> +		else if (ret > 0)
> +			link = &(*link)->rb_right;
> +		else if (allow_duplicates)
> +			link = &(*link)->rb_left;
> +		else
> +			return cur_group;
> +	}
> +	rb_link_node(&group->node, parent, link);
> +	rb_insert_color(&group->node, &port->table);
> +	return NULL;
> +}

How does it work to put duplicates into the RB tree? It seems especially strange that the lookup code does:

> +		if (ret < 0)
> +			node = node->rb_left;
> +		else
> +			node = node->rb_right;

so if ret == 0 (ie the two GIDs being tested are the same) then it continues to traverse to the right, while the insert code does:

> +		else if (allow_duplicates)
> +			link = &(*link)->rb_left;

which seems to put duplicates to the left always.

Also I'd be really worried that the rebalancing code freaks out when duplicate keys are inserted in the tree.

 - R.
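To experiment with the lookup/insert question outside the kernel, here is a stripped-down sketch using a plain unbalanced binary search tree. It is not the kernel rbtree and does no rebalancing; the `ex_` names and the node layout are invented for the illustration. The comparison order mirrors the quoted functions.

```c
#include <stddef.h>
#include <string.h>

#define EX_KEY_LEN 16

struct ex_node {
	unsigned char key[EX_KEY_LEN];
	int id;
	struct ex_node *left, *right;
};

/* Same comparison order as the quoted lookup: return on the first
 * node whose key compares equal, otherwise descend left or right. */
static struct ex_node *ex_find(struct ex_node *root, const unsigned char *key)
{
	while (root) {
		int ret = memcmp(key, root->key, EX_KEY_LEN);

		if (!ret)
			return root;
		root = ret < 0 ? root->left : root->right;
	}
	return NULL;
}

/* Same link walk as the quoted insert, minus rebalancing. Returns the
 * existing node when a duplicate is rejected, NULL when n was linked. */
static struct ex_node *ex_insert(struct ex_node **root, struct ex_node *n,
				 int allow_duplicates)
{
	struct ex_node **link = root;

	while (*link) {
		int ret = memcmp(n->key, (*link)->key, EX_KEY_LEN);

		if (ret < 0)
			link = &(*link)->left;
		else if (ret > 0)
			link = &(*link)->right;
		else if (allow_duplicates)
			link = &(*link)->left;
		else
			return *link;
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}

/* Number of nodes in the tree. */
static int ex_count(const struct ex_node *root)
{
	return root ? 1 + ex_count(root->left) + ex_count(root->right) : 0;
}
```

In this simplified model a lookup for a duplicated key stops at whichever equal node it meets first, so it always finds one of the duplicates but which one is arbitrary. Whether a rebalancing tree behaves acceptably with equal keys is a separate question that this sketch does not answer.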
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> For starters, I think that rate, MTU, and SL (and maybe PKey too) need to be the same. There may be others too if I stare at the spec for a while...

Can you expand on why? For example I definitely can send to the same MLID with different SLs. Of course MTU and rate need to match up but I don't see that as a real restriction -- the SM needs to allow for least-common-denominator values anyway, since the least-capable node on the fabric might join an existing group. I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of 1024 can't share the same MLID, as long as the underlying fabric is capable of supporting an MTU of 2048.

Actually, I wonder what the spec says about what switches should do if they're asked to forward packets with too-big MTUs? Maybe it all works out anyway.

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> > But what if the full-member join fails on node A for node A's solicited node group, but then node B succeeds in joining that group as a send-only member (perhaps because some other nodes have dropped off the fabric in the meantime). Then node B will send the ND message on a MCG that A is not a member of.
>
> Yes. B can send ND to A, and A responds without being a member so IPv6 ND works. Is there any security or other problems here?

Node A is not a member of the group B is sending on, so SM does not have to set up any routes for the messages to even reach node A. So it doesn't see the messages and doesn't respond to ND.

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> Sure but I think this complicates the SL2VL tables in the subnet to accommodate this. I think a similar thing is true for PKeys. So to me this is an SM complexity issue when mapping multiple MGRPs to same MLID.

I'm still confused. Aren't SL2VL and P_Key tables completely orthogonal from forwarding tables? Obviously there's no problem using multiple different SLs or P_Keys to reach the same endport using the same LID, so I don't understand why MLIDs would be different.

 - R.
Re: [openib-general] SA multicast patches
> All multicast groups need to be tracked, which is why even groups with MGID 0 are inserted into the tree.

OK...

> Immediately above this code, the group is returned if ret == 0.

Right, I missed that. But...

> Calling mcast_find() for MGID 0 isn't useful, so the code avoids doing this, but I think that it would work. The caller would just get an arbitrary group.

Now this is confusing -- you say the code avoids looking up MGID 0 in the rbtree. So why do you have to insert those groups in the tree and have the allow_duplicates flag etc? If you're never going to look up the group, I assume you have some other way of finding it and so you don't actually have to insert MGID 0 groups after all... right? Or is it that you want to be able to iterate through the whole rbtree and get the MGID 0 groups too?

 - R.
Re: [openib-general] IPv6oIB neighbour discover broken when MCGs overflow
> Two MCGs must be established before the IPoIB link comes up: one is the broadcast group for IPv4, the other is the all-hosts multicast group for IPv6. So node A is a member of the all-hosts group; the patch directs ND sends to all hosts, so node A responds.

I'm still confused. How do you interoperate with other RFC-compliant nodes (they might not have your patch or might not even be running Linux) that send ND messages to the solicited node group? If node A has your patch and doesn't try to join its own solicited node group, then another node that doesn't know to send ND messages to the all nodes group will not be able to find it.

 - R.
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
OK, I pulled this in to my for-2.6.21 branch and I will ask Linus to pull later today. Thanks.

 - R.
[openib-general] [PATCH] IB/core: Set static rate in ib_init_ah_from_path()
Guys, any reason not to merge this? It's step one of the cleanups from Jason's patch to make IPoIB work with global routes...

The static rate from the path record should be put into the address vector -- a long time ago the rate in the address attributes needed to be a relative rate, which required more munging, but now that the conversion from absolute to relative is done in the low-level driver, it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe [EMAIL PROTECTED]
Cc: Sean Hefty [EMAIL PROTECTED]
Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
 drivers/infiniband/core/sa_query.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index d7d4a53..68db633 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -471,6 +471,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->sl = rec->sl;
 	ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;
 	ah_attr->port_num = port_num;
+	ah_attr->static_rate = rec->rate;
 
 	if (rec->hop_limit > 1) {
 		ah_attr->ah_flags = IB_AH_GRH;
-- 
1.4.4.4
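Since src_path_bits came up elsewhere in this thread, here is what the context line in the patch computes, written as a user-space sketch. The helper name is made up, and ntohs() stands in for the kernel's be16_to_cpu(); the path record keeps the source LID in big-endian order and only the low 7 bits are kept.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Low 7 bits of the source LID, taken from its big-endian wire form.
 * User-space equivalent of: be16_to_cpu(rec->slid) & 0x7f */
static uint8_t ex_src_path_bits(uint16_t slid_be)
{
	return (uint8_t)(ntohs(slid_be) & 0x7f);
}
```

For example, a source LID of 0x1234 yields path bits 0x34.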
[openib-general] [GIT PULL] please pull infiniband.git
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This adds IB multicast tracking, to allow userspace to use multicast groups in a sane way, an ehca interrupt handling fixup, and a few other minor things. I don't think there is anything major left, so we should be good for 2.6.21-rc1 after this pull.

Dotan Barak (1):
      IB/mthca: Allow the QP state transition RESET->RESET

Hoang-Nam Nguyen (4):
      IB/ehca: Rework irq handler
      IB/ehca: Fix race condition/locking issues in scaling code
      IB/ehca: Allow en/disabling scaling code via module parameter
      IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN

Michael S. Tsirkin (1):
      IPoIB: CM error handling thinko fix

Roland Dreier (5):
      IB/mthca: Fix allocation of ICM chunks in coherent memory
      IPoIB: Only allow root to change between datagram and connected mode
      IB/core: Fix sparse warnings about shadowed declarations
      IB/ipath: Make ipath_map_sg() static
      IB/core: Set static rate in ib_init_ah_from_path()

Sean Hefty (2):
      IB/sa: Track multicast join/leave requests
      RDMA/cma: Add multicast communication support

Steve Wise (3):
      RDMA/iwcm: iw_cm_id destruction race fixes
      RDMA/cxgb3: Fail posts synchronously when in TERMINATE state
      RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver

 drivers/infiniband/core/Makefile               |    2 +-
 drivers/infiniband/core/cma.c                  |  359 +--
 drivers/infiniband/core/fmr_pool.c             |    4 +-
 drivers/infiniband/core/iwcm.c                 |   47 +-
 drivers/infiniband/core/multicast.c            |  837
 drivers/infiniband/core/sa.h                   |   66 ++
 drivers/infiniband/core/sa_query.c             |   30 +-
 drivers/infiniband/core/sysfs.c                |    2 -
 drivers/infiniband/core/ucma.c                 |  204 ++-
 drivers/infiniband/hw/cxgb3/cxio_dbg.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.h    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c             |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h             |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c         |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |    3 +-
 drivers/infiniband/hw/cxgb3/iwch_user.h        |    1 -
 drivers/infiniband/hw/ehca/Kconfig             |    8 -
 drivers/infiniband/hw/ehca/ehca_classes.h      |   19 +-
 drivers/infiniband/hw/ehca/ehca_eq.c           |    1 +
 drivers/infiniband/hw/ehca/ehca_hca.c          |    3 +
 drivers/infiniband/hw/ehca/ehca_irq.c          |  307 +
 drivers/infiniband/hw/ehca/ehca_irq.h          |    1 +
 drivers/infiniband/hw/ehca/ehca_main.c         |   32 +-
 drivers/infiniband/hw/ehca/ipz_pt_fn.h         |   11 +-
 drivers/infiniband/hw/ipath/ipath_dma.c        |    4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c    |    4 +-
 drivers/infiniband/hw/mthca/mthca_qp.c         |    5 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  195 ++
 include/rdma/ib_addr.h                         |    6 +
 include/rdma/ib_sa.h                           |  159 ++---
 include/rdma/rdma_cm.h                         |   21 +-
 include/rdma/rdma_cm_ib.h                      |    4 +-
 include/rdma/rdma_user_cm.h                    |   13 +-
 44 files changed, 1889 insertions(+), 478 deletions(-)
 create mode 100644 drivers/infiniband/core/multicast.c
 create mode 100644 drivers/infiniband/core/sa.h
Re: [openib-general] 32-bit build for ppc64 is required
> Usually this should work, but I don't rely on that since we also support s390/s390x (although not with Infiniband, but the OpenMPI alternative that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that pair is a bit of an odd mix and I don't have one setting here at my house where I work, so it's hard for me to confirm that just leaving things to happen by default works as anticipated. If they would ever make an s390 that uses less than a gigawatt of power and heats less than a large sized convention center, that could change... ;-)

http://www.conmicro.cx/hercules/
Re: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN
Thanks, queued 1, 2, 3 and 5 for 2.6.21.
Re: [openib-general] How heavy to resize a CQ ?
> In a dynamic process application, we don't know how many connections a process will make when we create the CQ, so we don't know the CQ size. What we do is increase the CQ size when a new connection is made, and decrease the CQ size when a connection is destroyed. My question is: is ibv_resize_cq() a lightweight function call? Do we have to drain the CQ before we resize the CQ?

I would say that resizing a CQ is not lightweight -- I've never benchmarked it but it's probably comparable to creating a CQ or something like that.

There is no requirement to drain the CQ or anything like that before resizing it -- you can resize it any time, even if it is currently getting completions or being polled.

 - R.
Re: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
Looking at this one more time, I think it actually may be buggy:

> @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
>  	spin_lock_init(&my_cq->spinlock);
>  	spin_lock_init(&my_cq->cb_lock);
>  	spin_lock_init(&my_cq->task_lock);
> +	init_completion(&my_cq->zero_callbacks);

So you initialize the zero_callbacks completion once, at ehca_create_cq(). But then

> @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
>  	spin_lock(&cq->task_lock);
>  	cq->nr_callbacks--;
> -	if (cq->nr_callbacks == 0) {
> +	is_complete = (cq->nr_callbacks == 0);
> +	if (is_complete) {
>  		list_del_init(cct->cq_list.next);
>  		cct->cq_jobs--;
>  	}
>  	spin_unlock(&cq->task_lock);
> +	if (is_complete) /* wake up waiting destroy_cq() */
> +		complete(&cq->zero_callbacks);
>  }

every time nr_callbacks drops to 0, you complete the zero_callbacks completion. So the first time a callback runs, you will complete zero_callbacks, which will let wait_for_completion() finish even if you later increment nr_callbacks again.

Also this

> -	while (my_cq->nr_callbacks) {
> +	if (my_cq->nr_callbacks) {
>  		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
> -		yield();
> +		wait_for_completion(&my_cq->zero_callbacks);
>  		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>  	}

looks rather unsafe -- I don't see any common locking protecting both this test of nr_callbacks and the setting of nr_callbacks in the ehca irq handling... so I don't see anything protecting you from seeing nr_callbacks==0 and not going into the if() (or while() -- the old code has the same problem I think) but then doing ++nr_callbacks somewhere else. In fact since you do the idr_remove() and hipz_h_destroy_cq() *after* you make sure no callbacks are running, this seems like it could happen easily.

So I'm holding off on applying this for now. Please think it over and either tell me the current patch is OK, or fix it up. There's not really too much urgency because a change like this is something I would be comfortable merging between 2.6.21-rc1 and -rc2.

 - R.
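The stale-completion concern can be walked through with a toy, single-threaded model. Everything here is hypothetical: the `ex_` types only mimic the counting behaviour of a kernel completion (complete() bumps a counter, a wait consumes one count), and there is no real blocking or locking.

```c
/* Toy stand-in for a kernel completion: just the 'done' counter. */
struct ex_completion {
	unsigned int done;
};

/* Toy stand-in for the CQ fields involved in the patch. */
struct ex_cq {
	unsigned int nr_callbacks;
	struct ex_completion zero_callbacks;
};

static void ex_complete(struct ex_completion *c)
{
	c->done++;
}

/* Non-blocking stand-in for wait_for_completion(): returns 1 (and
 * consumes one count) if a real wait would return immediately,
 * 0 if it would have to sleep. */
static int ex_try_wait(struct ex_completion *c)
{
	if (!c->done)
		return 0;
	c->done--;
	return 1;
}

static void ex_callback_start(struct ex_cq *cq)
{
	cq->nr_callbacks++;
}

/* Mirrors the patched run_comp_task(): complete on every drop to 0. */
static void ex_callback_end(struct ex_cq *cq)
{
	if (--cq->nr_callbacks == 0)
		ex_complete(&cq->zero_callbacks);
}

/* Returns 1 if the destroy path would be woken while a callback is
 * still outstanding. */
static int ex_stale_wakeup_demo(void)
{
	struct ex_cq cq = { 0, { 0 } };

	ex_callback_start(&cq);	/* nr_callbacks: 0 -> 1 */
	ex_callback_end(&cq);	/* 1 -> 0, completion is signalled */
	ex_callback_start(&cq);	/* 0 -> 1, a new callback is in flight */

	/* Destroy path: nr_callbacks != 0, so it waits. The earlier
	 * signal is still pending, so the wait returns at once. */
	return cq.nr_callbacks != 0 && ex_try_wait(&cq.zero_callbacks);
}
```

In this model the wait returns immediately even though a callback is still counted as running, which is the sequence described in the message above.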
Re: [openib-general] remap_page_range() in older kernels
> Do you remember any issues with using remap_page_range() in older kernels for mapping memory allocated in the kernel back to a user process?

No, I would have thought it should work just like remap_pfn_range() in later kernels.

 - R.
[openib-general] SA multicast patches
So I'm reading this over, and the following code looks kind of odd to me:

> +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
> +			   union ib_gid *mgid, struct ib_sa_mcmember_rec *rec)
...
> +	} else {
> +		memset(rec, 0, sizeof *rec);
> +		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
> +		rec->pkey = 0xffff;
> +		get_random_bytes(&rec->qkey, sizeof rec->qkey);
> +		rec->join_state = 1;
> +	}

Where is this particular hard-coded P_Key value coming from? And how about the Q_Key -- why is a random one being chosen? Does it matter that this is setting the privileged bit of the Q_Key at random?

The only place this code seems to be used is in cma_join_ib_multicast(), which overwrites all the values that get set here anyway. (Except it leaves the Q_Key if the portspace is not UDP??)

Would it be more sensible to leave the P_Key and Q_Key initialized to 0 here, and let the caller handle it? I don't see how the multicast tracking module can pick a sensible default here.

Also, should we check the return value of ib_get_cached_gid()?

 - R.
Re: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction race condition fixes.
thanks, applied
Re: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts synchronously when in TERMINATE state.
thanks, applied.
Re: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the iw_cxgb3 driver.
thanks, applied
Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> How do you mean, again? Does sg_set_buf set dma_length?

No, you're right, sorry.

 - R.
Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> > I don't see anything that ever bumps chunk->nsg if we're allocating a coherent region and we end up needing more than one allocation to do it.
>
> Yes but this is intentional.
>
> No, I think the code is fine and this patch will break things: chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg:

what about this code in mthca_memfree.h?

    static inline void mthca_icm_next(struct mthca_icm_iter *iter)
    {
    	if (++iter->page_idx >= iter->chunk->nsg) {

the call to pci_unmap_sg you're worried about is in mthca_free_icm_pages(), which can't be called for coherent memory anyway, so I don't see a problem with that.

So I think my patch is correct and needed.

 - R.
Re: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
Looks fine but this patch at least has serious whitespace damage... please resend a fixed version.

 - R.
[openib-general] [PATCH] IPoIB: Only allow root to change between datagram and connected mode
Change the permissions of the mode sysfs attribute to be S_IWUSR instead of S_IWUGO.

Signed-off-by: Roland Dreier [EMAIL PROTECTED]
---
FYI -- I'm planning to merge this for 2.6.21. It doesn't seem appropriate to allow ordinary users to mess with this sort of config.

 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 2d48387..8881a71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1138,7 +1138,7 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 	return -EINVAL;
 }
 
-static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
 
 int ipoib_cm_add_mode_attr(struct net_device *dev)
 {
-- 
1.4.4.4
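In octal terms the change is from 0666 to 0644. The sketch below spells out the kernel's combined permission macros for user space, where S_IRUGO and S_IWUGO are not provided by sys/stat.h; the `EX_` and `ex_` names are made up for the illustration.

```c
#include <sys/stat.h>

/* The kernel's combined permission macros, spelled out for user space. */
#define EX_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#define EX_S_IWUGO (S_IWUSR | S_IWGRP | S_IWOTH)

/* Attribute mode before the patch: readable and writable by everyone. */
static unsigned int ex_mode_before(void)
{
	return EX_S_IWUGO | EX_S_IRUGO;
}

/* Attribute mode after the patch: readable by everyone, writable only
 * by the owner (root for sysfs attributes). */
static unsigned int ex_mode_after(void)
{
	return S_IWUSR | EX_S_IRUGO;
}
```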
Re: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
I agree with Christoph -- the use of wait_for_completion() in a loop makes no sense. When you send a new copy of this patch without whitespace damage, please fix that up too...
Re: [openib-general] mvapich2 ofed 1.2 problem
> Does this stack indicate that libibverbs is accessing a 1.0 provider? cxgb3 shouldn't be 1.0 right?
>
> #1 0x2b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830) at src/compat-1_0.c:572
> #2 0x2b832cfef04e in rdma_cm_init_pd_cq () from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so

This means that the app (or maybe the RDMA CM library?) is linked against the 1.0 API -- which should work even with cxgb3 actually. But maybe mvapich is built against the 1.1 API and the RDMA CM is built against 1.0 or something?

 - R.
Re: [openib-general] mvapich2 ofed 1.2 problem
> How do I tell? Can I tell from the .so files?

ldd on the .so and the app would probably give you good info. I'm pretty sure that mpicc must be linking against a libibverbs 1.0 from somewhere.

 - R.
Re: [openib-general] mvapich2 ofed 1.2 problem
> When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is built, at least by looking at the .so file result:
>
> [EMAIL PROTECTED] ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> libibverbs.a
> libibverbs.so
> libibverbs.so.1
> libibverbs.so.1.0.0

The soname hasn't changed because the library is still compatible. But (I hope at least) OFED has libibverbs 1.1.
Re: [openib-general] [GIT PULL] please pull infiniband.git
> What about the patch that I sent on "Allow the following QP state transition: reset --> reset"?

OK, I'll merge that in the next patch. It's the kind of patch I'm not happy about merging, since it bloats the code to handle a corner case no one is likely to hit in practice, but it is technically correct so I guess we're forced to merge it.

 - R.
Re: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a separate cache line
Thanks, applied as 2 separate patches.
Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.
Looks mostly sane (assuming it works on 32-bit userspace on 64-bit kernel now), but:

> -        context = kmalloc(sizeof(*context), GFP_KERNEL);
> +        context = kzalloc(sizeof(*context), GFP_KERNEL);

Why do you need this? Is this an unrelated change?

 - R.
Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.
> Because the key generator u32 is in the context now, and the kzalloc() initializes it. I could have done:
>
>         context->key = 0;
>
> But km -> kz was less typing. ;-)

OK, got it. Anyway as I said, from a quick read the changes look sane, with the assumption that they work.
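The kmalloc()/kzalloc() point has a direct userspace analogue in malloc() versus calloc(). The sketch below is only that analogue, not the cxgb3 code: struct demo_context and both helper functions are hypothetical names, and the only thing it shows is that a zeroing allocator makes an explicit `key = 0` unnecessary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-context structure with a key counter. */
struct demo_context {
    uint32_t key;      /* next key to hand out; must start at 0 */
    int      other;    /* any other state */
};

/* Allocate a context with every field zeroed, as kzalloc() would. */
struct demo_context *demo_context_alloc(void)
{
    return calloc(1, sizeof(struct demo_context));
}

/* Hand out the current key and advance the generator. */
uint32_t demo_context_next_key(struct demo_context *ctx)
{
    assert(ctx != NULL);
    return ctx->key++;
}
```

The caller still has to check the allocation for NULL and free() the context when done.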
Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.
    Steve> I tested and it works. Do you want to pull this in before
    Steve> you push the driver upstream? Do I need to repost it?

I'll grab it and merge it in. I expect to ask Linus to pull later today.

 - R.
Re: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the physical address for mapping memory into userspace.
Actually, that patch doesn't apply because of the %llx warning fixes I pushed out. And git-apply also complains about trailing whitespace. Can you resend a version that applies to my for-2.6.21 branch?

Thanks
Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> +        sg_set_buf(mem, buf, PAGE_SIZE << order);
> +        BUG_ON(mem->offset);
> +        sg_dma_len(mem) = PAGE_SIZE << order;

What am I missing? Any reason to set sg_dma_len() again after sg_set_buf()?
Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
Queued for 2.6.21, although I think a further cleanup would be:

>          mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base,
>                                                           dev_lim->mpt_entry_sz,
>                                                           mdev->limits.num_mpts,
> -                                                         mdev->limits.reserved_mrws, 1);
> +                                                         mdev->limits.reserved_mrws,
> +                                                         1, 1);

instead of having use_lowmem and use_coherent be separate parameters, we should probably convert it to a type parameter, and have MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT. That would make these calls a lot easier to read and get correct.

 - R.
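As a rough illustration of the "type parameter" idea, here is a standalone sketch. None of it is mthca code: the enum and the helper names are hypothetical, and it assumes that a coherent table is always also a low-memory table, which matches the `1, 1` call quoted above but is otherwise an assumption.

```c
#include <assert.h>

/* Hypothetical table types, named after the suggestion in the message. */
enum icm_table_type {
    ICM_TABLE_HIGHMEM,   /* pages may come from high memory */
    ICM_TABLE_LOWMEM,    /* pages must be directly addressable */
    ICM_TABLE_COHERENT   /* DMA-coherent allocation */
};

/* Assumption for this sketch: coherent tables are also low-memory tables. */
int icm_table_use_lowmem(enum icm_table_type type)
{
    return type == ICM_TABLE_LOWMEM || type == ICM_TABLE_COHERENT;
}

int icm_table_use_coherent(enum icm_table_type type)
{
    return type == ICM_TABLE_COHERENT;
}

/* Map the two legacy flags onto a single type. */
enum icm_table_type icm_table_type_from_flags(int use_lowmem, int use_coherent)
{
    /* Coherent high-memory tables are not modeled in this sketch. */
    assert(!use_coherent || use_lowmem);
    if (use_coherent)
        return ICM_TABLE_COHERENT;
    return use_lowmem ? ICM_TABLE_LOWMEM : ICM_TABLE_HIGHMEM;
}
```

A call site would then pass one self-describing constant instead of a trailing `1, 1`, which is the readability gain the message is after.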
[openib-general] [GIT PULL] please pull infiniband.git
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well as IPoIB connected mode and various other smaller changes:

Ahmed S. Darwish (1):
      IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
      IB/ehca: Fix memleak on module unloading

David Howells (1):
      IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (6):
      IPoIB: Connected mode experimental support
      IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
      IB/mthca: Give reserved MTTs a separate cache line
      IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
      IB/mthca: Merge MR and FMR space on 64-bit systems
      IB/mthca: Always fill MTTs from CPU

Roland Dreier (1):
      IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
      RDMA/cma: Increment port number after close to avoid re-use
      IB: Remove redundant _wq from workqueue names

Steve Wise (1):
      RDMA/cxgb3: Add driver for Chelsio T3 RNIC

 drivers/infiniband/Kconfig                     |    1 +
 drivers/infiniband/Makefile                    |    1 +
 drivers/infiniband/core/addr.c                 |    2 +-
 drivers/infiniband/core/cma.c                  |   68 +-
 drivers/infiniband/core/device.c               |    3 +-
 drivers/infiniband/hw/cxgb3/Kconfig            |   27 +
 drivers/infiniband/hw/cxgb3/Makefile           |   12 +
 drivers/infiniband/hw/cxgb3/cxio_dbg.c         |  207 +++
 drivers/infiniband/hw/cxgb3/cxio_hal.c         | 1280 +++
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |  201 +++
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |  331
 drivers/infiniband/hw/cxgb3/cxio_resource.h    |   70 +
 drivers/infiniband/hw/cxgb3/cxio_wr.h          |  685
 drivers/infiniband/hw/cxgb3/iwch.c             |  189 +++
 drivers/infiniband/hw/cxgb3/iwch.h             |  177 ++
 drivers/infiniband/hw/cxgb3/iwch_cm.c          | 2081
 drivers/infiniband/hw/cxgb3/iwch_cm.h          |  223 +++
 drivers/infiniband/hw/cxgb3/iwch_cq.c          |  225 +++
 drivers/infiniband/hw/cxgb3/iwch_ev.c          |  231 +++
 drivers/infiniband/hw/cxgb3/iwch_mem.c         |  172 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.c    | 1203 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |  367 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c          | 1007
 drivers/infiniband/hw/cxgb3/iwch_user.h        |   67 +
 drivers/infiniband/hw/cxgb3/tcb.h              |  632 +++
 drivers/infiniband/hw/ehca/ehca_irq.c          |    2 +
 drivers/infiniband/hw/mthca/mthca_cmd.c        |    6 +-
 drivers/infiniband/hw/mthca/mthca_dev.h        |    2 +
 drivers/infiniband/hw/mthca/mthca_main.c       |   40 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c    |  127 ++-
 drivers/infiniband/hw/mthca/mthca_memfree.h    |    9 +-
 drivers/infiniband/hw/mthca/mthca_mr.c         |  110 ++-
 drivers/infiniband/hw/mthca/mthca_profile.c    |    2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c   |   14 +-
 drivers/infiniband/hw/mthca/mthca_provider.h   |    1 +
 drivers/infiniband/hw/mthca/mthca_qp.c         |    2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c        |    9 +-
 drivers/infiniband/ulp/ipoib/Kconfig           |   16 +-
 drivers/infiniband/ulp/ipoib/Makefile          |    1 +
 drivers/infiniband/ulp/ipoib/ipoib.h           |  215 +++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        | 1237 ++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |   29 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   63 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   40 +-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    2 +
 46 files changed, 11279 insertions(+), 114 deletions(-)
 create mode 100644 drivers/infiniband/hw/cxgb3/Kconfig
 create mode 100644 drivers/infiniband/hw/cxgb3/Makefile
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_dbg.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_wr.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cq.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_ev.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_mem.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.c
 create
Re: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
OK, I already merged this but now I'm thinking it's somewhat buggy:

> +                if (coherent)
> +                        ret = mthca_alloc_icm_coherent(&dev->pdev->dev,
> +                                                       &chunk->mem[chunk->npages],
> +                                                       cur_order, gfp_mask);
> +                else
> +                        ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages],
> +                                                    cur_order, gfp_mask);
>
> -                        if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
> +                if (!ret) {
> +                        ++chunk->npages;
> +
> +                        if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
>                                  chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,

I don't see anything that ever bumps chunk->nsg if we're allocating a coherent region and we end up needing more than one allocation to do it.

Maybe something like this on top of the patch?

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 0b9d053..48f7c65 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
                 if (!ret) {
                         ++chunk->npages;

-                        if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
+                        if (coherent)
+                                ++chunk->nsg;
+                        else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) {
                                 chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
                                                         chunk->npages,
                                                         PCI_DMA_BIDIRECTIONAL);
Re: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to mutex.
Thanks, applied along with the following warning cleanup for archs where u64 is unsigned long instead of unsigned long long:

diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
index dfaa704..5a7306f 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -62,7 +62,7 @@ void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
         data = (u64 *)m->buf;
         while (size > 0) {
-                PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
+                PDBG("TPT %08x: %016llx\n", m->addr, (unsigned long long) *data);
                 size -= 8;
                 data++;
                 m->addr += 8;
@@ -100,7 +100,7 @@ void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
         data = (u64 *)m->buf;
         while (size > 0) {
-                PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
+                PDBG("PBL %08x: %016llx\n", m->addr, (unsigned long long) *data);
                 size -= 8;
                 data++;
                 m->addr += 8;
@@ -116,7 +116,8 @@ void cxio_dump_wqe(union t3_wr *wqe)
         if (size == 0)
                 size = 8;
         while (size > 0) {
-                PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
+                PDBG("WQE %p: %016llx\n", data,
+                     (unsigned long long) be64_to_cpu(*data));
                 size--;
                 data++;
         }
@@ -128,7 +129,8 @@ void cxio_dump_wce(struct t3_cqe *wce)
         int size = sizeof(*wce);
         while (size > 0) {
-                PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
+                PDBG("WCE %p: %016llx\n", data,
+                     (unsigned long long) be64_to_cpu(*data));
                 size -= 8;
                 data++;
         }
@@ -159,7 +161,7 @@ void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
         data = (u64 *)m->buf;
         while (size > 0) {
-                PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
+                PDBG("RQT %08x: %016llx\n", m->addr, (unsigned long long) *data);
                 size -= 8;
                 data++;
                 m->addr += 8;
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 19553b3..0531b94 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -298,7 +298,7 @@ int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
         wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
                         (wq->qpid << rdev_p->qpshift);
         PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
-             wq->qpid, wq->doorbell, wq->udb);
+             wq->qpid, wq->doorbell, (unsigned long long) wq->udb);
         return 0;
 err4:
         kfree(wq->sq);
@@ -553,8 +553,8 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
         wqe->ctx1 = cpu_to_be64(ctx1);
         wqe->ctx0 = cpu_to_be64(ctx0);
         PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
-             (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
-             1 << T3_CTRL_QP_SIZE_LOG2);
+             (unsigned long long) rdev_p->ctrl_qp.dma_addr,
+             rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2);
         skb->priority = CPL_PRIORITY_CONTROL;
         return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index 3d7c96f..98b3bdb 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -87,7 +87,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
              "lo 0x%x cookie 0x%llx\n", __FUNCTION__,
              CQE_QPID(cqe), CQE_TYPE(cqe),
              CQE_OPCODE(cqe), CQE_STATUS(cqe), CQE_WRID_HI(cqe),
-             CQE_WRID_LOW(cqe), cookie);
+             CQE_WRID_LOW(cqe), (unsigned long long) cookie);
         if (CQE_TYPE(cqe) == 0) {
                 if (!CQE_STATUS(cqe))
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5909ec5..2b6cd53 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -163,7 +163,9 @@ int build_phys_page_list(struct ib_phys_buf *buffer_list,
                             ((u64) j << *shift));
         PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-             __FUNCTION__, *iova_start, mask, *shift, *total_size, *npages);
+             __FUNCTION__, (unsigned long long) *iova_start,
+             (unsigned long long) mask, *shift, (unsigned long long) *total_size,
+             *npages);
         return 0;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index d02cd72..549de0a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -213,7 +213,7 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
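The cleanup above follows one pattern throughout: cast the 64-bit value to unsigned long long so that it matches %llx regardless of how the 64-bit type is defined on a given architecture. A minimal userspace sketch of the same pattern, using uint64_t and snprintf() in place of u64 and PDBG(); format_u64_hex() is a made-up helper name for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as 16 hex digits.  The explicit cast makes the
 * argument type match %llx however uint64_t happens to be defined.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%016llx", (unsigned long long) value);
    return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

In userspace the PRIx64 macro from <inttypes.h> is an alternative to the cast; the cast is shown here because it is what the patch does.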
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
> IMO, probably worth it to init just this one field rather than use up initialized memory - and I think it's clearer.

What do you mean by using up initialized memory? kzalloc() just does a memset(0), and it's not like there's a limit on the number of times we're allowed to call memset().

 - R.
Re: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core subdirectory.
Thanks, applied this and the previous patch, and pushed out my for-2.6.21 branch. I also rebased so the cxgb3 net driver builds now.
Re: [openib-general] integer overflow
>         while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
>
> seems to rely on integer overflow which seems to be undefined behaviour.

tx_tail and tx_head are unsigned, and overflow is defined for unsigned integers.

 - R.
Re: [openib-general] integer overflow
> Yes but we cast them to signed int here - no?

That's true, I guess it is technically undefined. But time_after() is relying on the same thing working, so I would say we don't care.

 - R.
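For anyone following the thread, here is one way to write a wraparound-tolerant comparison of two free-running counters without any signed arithmetic. This is only a sketch of the general idea, not the IPoIB code or the time_after() macro; counter_before() and counter_outstanding() are hypothetical names, and it assumes the two counters are never more than 2^31 - 1 apart.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns nonzero if counter a is "before" counter b, treating both as
 * free-running 32-bit counters that may wrap.  The subtraction is done in
 * unsigned arithmetic, where wraparound is well defined.
 */
int counter_before(uint32_t a, uint32_t b)
{
    uint32_t diff = (uint32_t)(a - b);

    /* A distance above 2^31 - 1 corresponds to a negative signed difference. */
    return diff > (UINT32_MAX / 2);
}

/* Entries outstanding between head and tail; tail must not be ahead of head. */
uint32_t counter_outstanding(uint32_t head, uint32_t tail)
{
    assert(!counter_before(head, tail));
    return (uint32_t)(head - tail);
}
```

The difference from the quoted loop condition is only where the conversion happens: here the subtraction wraps in unsigned arithmetic and no signed overflow can occur.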
Re: [openib-general] more comments on cxgb3
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index db2b0a8..98568ee 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct
>          struct iwch_dev *rhp = to_iwch_dev(context->device);
>          struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
>          PDBG("%s context %p\n", __FUNCTION__, context);
> +        free_mmaps(ucontext);
>          cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
>          kfree(ucontext);
>          return 0;
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
> index 1ede8a7..c8c07ee 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
> @@ -199,6 +199,21 @@ struct iwch_mm_entry {
>          unsigned len;
>  };
>
> +static inline void free_mmaps(struct iwch_ucontext *ucontext)
> +{
> +        struct list_head *pos, *nxt;
> +        struct iwch_mm_entry *mm;
> +
> +        spin_lock(&ucontext->mmap_lock);
> +        list_for_each_safe(pos, nxt, &ucontext->mmaps) {
> +                mm = list_entry(pos, struct iwch_mm_entry, entry);
> +                list_del(&mm->entry);
> +                kfree(mm);
> +        }
> +        spin_unlock(&ucontext->mmap_lock);
> +        return;
> +}

Since you only have one caller, I would suggest just open-coding the deletion at the call-site (since that function is really too big to inline if it ever grows another caller). And I don't think you need the locking either, since there better be no one else looking at the context structure while you're in the process of freeing it. Something like:

        struct iwch_dev *rhp = to_iwch_dev(context->device);
        struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
        struct iwch_mm_entry *mm, *tmp;

        PDBG("%s context %p\n", __FUNCTION__, context);
        list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry)
                kfree(mm);
        cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
        kfree(ucontext);
        return 0;

 - R.
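The reason a "_safe" iterator is needed when freeing entries inside the loop can be shown with a plain userspace list. The sketch below is not the cxgb3 code and does not use the kernel list API: struct mm_entry and both functions are hypothetical, and a hand-rolled singly linked list stands in for struct list_head.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked entry, standing in for struct iwch_mm_entry. */
struct mm_entry {
    unsigned key;
    struct mm_entry *next;
};

/* Push a new entry on the front of the list; 0 on success, -1 on failure. */
int mm_list_push(struct mm_entry **head, unsigned key)
{
    struct mm_entry *mm;

    assert(head != NULL);
    mm = malloc(sizeof(*mm));
    if (!mm)
        return -1;
    mm->key = key;
    mm->next = *head;
    *head = mm;
    return 0;
}

/*
 * Free every entry.  The successor is saved before the current entry is
 * freed, which is the same idea the "_safe" list iterators rely on.
 * Returns the number of entries freed.
 */
size_t mm_list_free_all(struct mm_entry **head)
{
    struct mm_entry *mm, *tmp;
    size_t freed = 0;

    assert(head != NULL);
    for (mm = *head; mm != NULL; mm = tmp) {
        tmp = mm->next;   /* read the link before mm is freed */
        free(mm);
        freed++;
    }
    *head = NULL;
    return freed;
}
```

As in the suggestion above, no unlinking is done per entry, because the whole list is being torn down and nothing else is expected to look at it.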
Re: [openib-general] please pull for 2.6.21: fix + add IB multicast support
I merged the increment port number and remove redundant '_wq' patches from

    git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland

I plan to review the multicast stuff next week and I hope to merge it for 2.6.21.

Or, have you or anyone else at Voltaire read over the code in addition to using it? Do you see anything that should be cleaned up?

 - R.
Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21 branch. I took the liberty of cleaning up some sparse warnings, etc. There's still a few other obvious things to fix up:

    drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?

Rather than putting an extern in iwch.c, please put a proper definition in an appropriate header file included from iwch.c.

Also I agree with MST, I would like to see the core/ subdirectory die completely.

 - R.
Re: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
Oh yeah -- Steve, please keep sending cleanup patches based on my tree now. I'm planning on asking Linus to merge what's in for-2.6.21 in the next couple of days, but there's still more than a week before the merge window closes, and even after the merge window closes I'll still accept fixes/cleanups for stuff already upstream.

And here's what I have pending in for-2.6.21 so far:

Ahmed S. Darwish (1):
      IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
      IB/ehca: Fix memleak on module unloading

David Howells (1):
      IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (1):
      IPoIB: Connected mode experimental support

Roland Dreier (1):
      IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
      RDMA/cma: Increment port number after close to avoid re-use
      IB: Remove redundant _wq from workqueue names

Steve Wise (1):
      RDMA/cxgb3: Add driver for Chelsio T3 Rnic
Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
BTW, while looking at iwcm.c, I noticed the following highly dubious code for the first time:

        static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
        {
                int ret = 0;

                BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
                if (atomic_dec_and_test(&cm_id_priv->refcount)) {
                        BUG_ON(!list_empty(&cm_id_priv->work_list));
                        if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
                                BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
                                BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
                                                &cm_id_priv->flags));
                                ret = 1;
                        }
                        complete(&cm_id_priv->destroy_comp);
                }

                return ret;
        }

The test of waitqueue_active on destroy_comp.wait looks really bad for two reasons: first, it is relying on an internal implementation detail of struct completion that really shouldn't be used by generic code. And second, it seems to me that this doesn't even work right, since there is a race something like the following:

        iw_destroy_cm_id():
          destroy_cm_id(cm_id);
          // still 1 ref left
                                        cm_work_handler():
                                          if (iwcm_deref_id()) // drop last ref
                                                  return;
                                          // no one waiting yet, doesn't
                                          // return, but destroy_comp is
                                          // signaled
          wait_for_completion(&cm_id_priv->destroy_comp);
          // destroy_comp is signaled, proceed
          kfree(cm_id_priv);
                                          // continue using cm_id_priv
                                          // OOPS

I don't understand this code well enough for the fix to be obvious.

 - R.
Re: [openib-general] Problem with SRP with 512 byte sector size with 2 TB LUNs
> Is it possible to add LUNs with 2 TB and 512 byte sectors ? Why does the READ CAPACITY(16) command fail ?

It seems that the DDN target is not reporting good information -- I don't see anything obviously wrong in what the kernel is doing (now that SRP sends a READ CAPACITY command). Do you know if the same type of config works over fibre channel?

 - R.
Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
> Well, randomness is a resource after all, and since we don't have the additional security provided by PSNs in IPoIB UD, it seemed we do not need it for IPoIB CM either. So maybe the right thing is just to remove the FIXME comment.

random32() doesn't use up any entropy. Random PSNs help avoid problems with stale connections, so I think we should do it.

I noticed some funny code in ipoib_cm_skb_reap():

        __be32 mtu = cpu_to_be32(priv->mcast_mtu);

        // htonl(__be32)??
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

        // no htonl() here -- is this correct?
        icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);

what is the right thing?

 - R.
Re: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
>> I noticed some funny code in ipoib_cm_skb_reap():
>>
>>         __be32 mtu = cpu_to_be32(priv->mcast_mtu);
>>
>>         // htonl(__be32)??
>>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
>>
>>         // no htonl() here -- is this correct?
>>         icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
>>
>> what is the right thing?

> Both are right I think.

You're right -- the mistake is making mtu __be32 and preswapping it. I'll fix it up in my tree.

> These two functions seem to accept parameters in different format:
>
> include/net/icmp.h:extern void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
> include/linux/icmpv6.h:extern void icmpv6_send(struct sk_buff *skb,
> include/linux/icmpv6.h-                        int type, int code,
> include/linux/icmpv6.h-                        __u32 info,
> include/linux/icmpv6.h-                        struct net_device *dev);
>
> BTW, I just looked at ip_gre.c and it has the same code.

no, it leaves mtu as an int rather than swapping it.

 - R.
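The conclusion of this exchange is that the MTU should be kept in host order and converted exactly once, only for the interface that wants network order. A small userspace sketch of why a pre-swapped value is a problem follows; it uses htonl() from <arpa/inet.h>, and the function names are made up for this illustration (nothing here is IPoIB code).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a 32-bit value's in-memory bytes out so they can be inspected. */
void value_bytes(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &v, 4);
}

/* Convert a host-order MTU to network order exactly once. */
uint32_t mtu_to_wire(uint32_t host_mtu)
{
    return htonl(host_mtu);
}

/* The suspicious pattern: a value that was already swapped is swapped again. */
uint32_t mtu_swapped_twice(uint32_t host_mtu)
{
    return htonl(htonl(host_mtu));
}
```

For an MTU of 2044 (0x7FC), mtu_to_wire() yields the bytes 00 00 07 FC in memory on any host, while mtu_swapped_twice() simply returns the host-order value again, so on a little-endian machine the bytes that would go on the wire are in the wrong order.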
Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets
> I was going to resend it after Roland's earlier patch to clean up the ib_init_ah_from_path was accepted..

Sorry, I started having second thoughts about the part about changing it to return void (it seems more sensible to check it the other places it's called). But I'll look at that again soon.

 - R.
Re: [openib-general] Immediate data question
    Changqing> Does this pending SEND_WITH_IMM message affect the
    Changqing> performance of the receiver process ? Is this message
    Changqing> buffered in the receiver's HCA, or the sender retry and
    Changqing> get RNR ack until receiver posts a receive ?

If no receive is pending, then the responder sends an RNR NAK and the sender will wait for the RNR timeout and retry, etc.

 - R.
Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets
> I've started thinking about what it would take to get the rdma cm to work across a router. I think the rdma cm may need to treat IPv6 addresses as a GID for this to work across subnets, versus trying to map an ipoib IP address to a GID based on ARP.

Hmm, why is that? Shouldn't IPoIB work through a router, and correctly get the GID of the final destination via ARP just fine? If the RDMA CM treats IPv6 addresses as GIDs, then this breaks things on a normal subnet with IPoIB interfaces configured with IPv6 addresses.

 - R.
Re: [openib-general] Immediate data question
    Changqing> What I mean is that, is there any performance penalty
    Changqing> for receiver's overall performance if RNR happens
    Changqing> continuously on one of the QP ?

Not for the receiver, but the sender will be severely slowed down by having to wait for the RNR timeouts.
Re: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for unicast packets
    Jason> Basically, if IB routers are used, and the IPoIB feature of
    Jason> *not* spanning a subnet is used (for scalability?) then
    Jason> you need an alternate way to specify addresses to rdma cm.

You mean if the IB router is also an IP router for IPoIB? Then I think there are some serious semantic problems to solve for the RDMA CM -- because you are using an IP address to define a destination, but since that address is on the other side of an IP router, there's no way to know it even belongs to an IB port.

 - R.
Re: [openib-general] sharing qp between user and kernel
    Pete> Before I dig into this anymore, do you expect this to work?
    Pete> Are there fundamental problems with QP sharing between user
    Pete> and kernel? It would sure be nice not to have to stick the
    Pete> connection management aspects into the kernel.

No, I wouldn't expect this to work. At first glance at least, yes, there are fundamental problems.

Sharing a QP between user and kernelspace, where userspace is doing full kernel bypass (as eg mthca does -- there are NO system calls when doing post work request, poll CQ and request CQ notification operations), seems like a huge problem. I don't see any way that the kernel can keep a consistent view of the QP state unless userspace has to call into the kernel for every operation, which would kill performance.

 - R.