Re: [PATCH] librdmacm: Do not modify qp_init_attr in rdma_get_request

2010-10-19 Thread Jonathan Rosser
Hefty, Sean sean.he...@... writes:

 I added a while(1) loop to rdma_server to allow clients to connect
 repeatedly, and this worked for me.  Jonathan, can you see if this
 works for your testing as well?  If so, I'll commit.

Yesterday I tried setting attr->send/recv_cq = NULL in rdma_get_request(), which 
fixes the bug in a somewhat ugly manner. Passing a copy of the attributes is a 
much tidier solution, and your patch works for me.
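
The difference between the two fixes can be illustrated with a small stand-alone sketch. It assumes the bug is that QP creation writes CQ pointers back into the attribute structure it is handed, so the listener's attributes are polluted after the first connection request. All demo_* names are hypothetical and are not the librdmacm API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the QP init attributes (not struct ibv_qp_init_attr). */
struct demo_qp_init_attr {
    void *send_cq;
    void *recv_cq;
    int   max_send_wr;
};

static int demo_cq; /* dummy object; its address plays the role of a CQ */

/* Assumed behaviour: QP creation fills in any CQ the caller left NULL. */
void demo_create_qp(struct demo_qp_init_attr *attr)
{
    if (!attr->send_cq)
        attr->send_cq = &demo_cq;
    if (!attr->recv_cq)
        attr->recv_cq = &demo_cq;
}

/* Handle one connection request using a private copy of the listener's attributes. */
void demo_get_request(const struct demo_qp_init_attr *listen_attr)
{
    struct demo_qp_init_attr attr = *listen_attr; /* plain struct copy */

    demo_create_qp(&attr);
    assert(attr.send_cq && attr.recv_cq); /* only the copy was completed */
}
```

Because only the copy is modified, the listener's attributes look the same for every later request, with no need to reset individual fields by hand.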

Many Thanks,
Jonathan.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ib mad definitions

2010-10-19 Thread Mike Heinz
Works for me.

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Hefty, Sean
Sent: Monday, October 18, 2010 6:25 PM
To: linux-rdma@vger.kernel.org; Sasha Khapyorsky
Subject: ib mad definitions

This has probably been discussed before, but is there a strong reason why 
ib_types.h can't be moved from opensm/include/iba to 
libibumad/include/infiniband?

This appears to be the only place where IB MAD definitions are available for 
user space applications, and having them available at the libibumad level makes 
sense to me.

(I'm trying to port madeye to user space as a diag, and want all IB MAD 
definitions.)

- Sean 


Re: [PATCH 0/2] svcrdma: NFSRDMA Server fixes for 2.6.37

2010-10-19 Thread J. Bruce Fields
On Tue, Oct 12, 2010 at 03:33:46PM -0500, Tom Tucker wrote:
 Hi Bruce,
 
 These fixes are ready for 2.6.37. They fix two bugs in the server-side
 NFSRDMA transport.

Both applied and pushed out, thanks.

--b.

 
 Thanks,
 Tom
 ---
 
 Tom Tucker (2):
   svcrdma: Cleanup DMA unmapping in error paths.
   svcrdma: Change DMA mapping logic to avoid the page_address kernel API
 
 
  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   19 ---
  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |   82 ++
  net/sunrpc/xprtrdma/svc_rdma_transport.c |   41 +++
  3 files changed, 92 insertions(+), 50 deletions(-)
 
 -- 
 Signed-off-by: Tom Tucker t...@ogc.us


Re: ib mad definitions

2010-10-19 Thread Hal Rosenstock
On Mon, Oct 18, 2010 at 6:24 PM, Hefty, Sean sean.he...@intel.com wrote:
 This has probably been discussed before,

Yes, several times AFAIR.

 but is there a strong reason why ib_types.h can't be moved from 
 opensm/include/iba to libibumad/include/infiniband?

Why does this need to be moved ?

 This appears to be the only place where IB MAD definitions are available for 
 user space applications, and having them available at the libibumad level 
 makes sense to me.

 (I'm trying to port madeye to user space as a diag, and want all IB MAD 
 definitions.)

There already are diags including ib_types.h (saquery for one).

-- Hal

 - Sean


[PATCH v2.6.36-rc7] infiniband: update workqueue usage

2010-10-19 Thread Tejun Heo
* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo t...@kernel.org
---
Hello,

I think this patch is safe but don't have any experience with or
access to infiniband stuff and it's only compile tested.  Also, while
looking through the code, I got curious about several things.

* Can any of the works in infiniband be used during memory reclaim?

* qib_cq_wq is a separate singlethread workqueue.  Does the queue
  require strict single thread execution ordering?  IOW, does each
  work have to be executed in the exact queued order and no two works
  should execute in parallel?  Or was the singlethreadedness chosen
  just to reduce the number of workers?

* The same question for ipoib_workqueue.

Thank you.

 drivers/infiniband/core/cache.c                |    4 +--
 drivers/infiniband/core/device.c               |   11 --
 drivers/infiniband/core/sa_query.c             |    4 +--
 drivers/infiniband/core/umem.c                 |    2 -
 drivers/infiniband/hw/ipath/ipath_driver.c     |    2 -
 drivers/infiniband/hw/ipath/ipath_user_pages.c |    2 -
 drivers/infiniband/hw/qib/qib_iba7220.c        |    7 ++
 drivers/infiniband/hw/qib/qib_iba7322.c        |   14 ++---
 drivers/infiniband/hw/qib/qib_init.c           |   26 +++--
 drivers/infiniband/hw/qib/qib_qsfp.c           |    9 +++-
 drivers/infiniband/hw/qib/qib_verbs.h          |    5 +---
 drivers/infiniband/ulp/srp/ib_srp.c            |    4 +--
 include/rdma/ib_verbs.h                        |    3 ++
 13 files changed, 41 insertions(+), 52 deletions(-)

Index: work/drivers/infiniband/core/cache.c
===
--- work.orig/drivers/infiniband/core/cache.c
+++ work/drivers/infiniband/core/cache.c
@@ -308,7 +308,7 @@ static void ib_cache_event(struct ib_eve
 			INIT_WORK(&work->work, ib_cache_task);
 			work->device   = event->device;
 			work->port_num = event->element.port_num;
-			schedule_work(&work->work);
+			queue_work(ib_wq, &work->work);
 		}
 	}
 }
@@ -368,7 +368,7 @@ static void ib_cache_cleanup_one(struct
 	int p;

 	ib_unregister_event_handler(&device->cache.event_handler);
-	flush_scheduled_work();
+	flush_workqueue(ib_wq);

 	for (p = 0; p <= end_port(device) - start_port(device); ++p) {
 		kfree(device->cache.pkey_cache[p]);
Index: work/drivers/infiniband/core/device.c
===
--- work.orig/drivers/infiniband/core/device.c
+++ work/drivers/infiniband/core/device.c
@@ -38,7 +38,6 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>

 #include "core_priv.h"

@@ -52,6 +51,9 @@ struct ib_client_data {
 	void *data;
 };

+struct workqueue_struct *ib_wq;
+EXPORT_SYMBOL_GPL(ib_wq);
+
 static LIST_HEAD(device_list);
 static LIST_HEAD(client_list);

@@ -718,6 +720,10 @@ static int __init ib_core_init(void)
 {
 	int ret;

+	ib_wq = alloc_workqueue("infiniband", 0, 0);
+	if (!ib_wq)
+		return -ENOMEM;
+
 	ret = ib_sysfs_setup();
 	if (ret)
 		printk(KERN_WARNING "Couldn't create InfiniBand device class\n");
@@ -726,6 +732,7 @@ static int __init ib_core_init(void)
 	if (ret) {
 		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
 		ib_sysfs_cleanup();
+		destroy_workqueue(ib_wq);
 	}

 	return ret;
@@ -736,7 +743,7 @@ static void __exit ib_core_cleanup(void)
 	ib_cache_cleanup();
 	ib_sysfs_cleanup();
 	/* Make sure that any pending umem accounting work is done. */
-	flush_scheduled_work();
+	destroy_workqueue(ib_wq);
 }

 module_init(ib_core_init);
Index: work/drivers/infiniband/core/sa_query.c
===
--- work.orig/drivers/infiniband/core/sa_query.c
+++ work/drivers/infiniband/core/sa_query.c
@@ -422,7 +422,7 @@ static void ib_sa_event(struct ib_event_
 		port->sm_ah = NULL;
 		spin_unlock_irqrestore(&port->ah_lock, flags);

-		schedule_work(&sa_dev->port[event->element.port_num -
+		queue_work(ib_wq, &sa_dev->port[event->element.port_num -
 					    sa_dev->start_port].update_task);
 	}
 }
@@ -1068,7 +1068,7 @@ static void ib_sa_remove_one(struct ib_d


RE: ib mad definitions

2010-10-19 Thread Hefty, Sean
  but is there a strong reason why ib_types.h can't be moved from
 opensm/include/iba to libibumad/include/infiniband?
 
 Why does this need to be moved ?

The dependency should be on libibumad, not opensm.  libibumad is pretty much 
useless without these definitions.  Why wouldn't you move them?

 There already are diags including ib_types.h (saquery for one).

Yes, but we're either stuck with everything that needs ib_types.h to be part of 
the management.git tree, or the app needs to depend on opensm.  Currently, 
ibacm duplicates definitions because they aren't available anywhere else.

Re: [PATCH] SIW: Documentation (initial)

2010-10-19 Thread Bernard Metzler
Randy,

...back from vacation.
Many thanks! I'll take it all over.


Bernard.

Randy Dunlap randy.dun...@oracle.com wrote on 10/15/2010 12:57:03 AM:

<snip>

  +
  +User Interface
  +--
  +All fast path operations such as posting of work requests and
  +reaping of work completions currently involve a system call into
  +the siw module. Kernel/user-mapped send and receive as well as

 I didn't find the system call(s).  Are they new syscalls or just
 (socket) reads/writes?  (I was probably looking for new syscalls.)


I will have to clarify. Currently all operations are using the
infiniband/core infrastructure (e.g. via uverbs write file
operation). There is no private interface between libsiw and
siw kernel module in place.


<snip>



Re: ib mad definitions

2010-10-19 Thread Hal Rosenstock
On Tue, Oct 19, 2010 at 11:28 AM, Hefty, Sean sean.he...@intel.com wrote:
  but is there a strong reason why ib_types.h can't be moved from
 opensm/include/iba to libibumad/include/infiniband?

 Why does this need to be moved ?

 The dependency should be on libibumad, not opensm.  libibumad is pretty much 
 useless without these definitions.  Why wouldn't you move them?

Off the top of my head, OpenSM is layered on top of libibumad but
doesn't need/use libibmad. I think that was the main reason although
that could be changed if ib_types.h were to be moved. I'm not sure
what other reasons came up in the previous discussions.


 There already are diags including ib_types.h (saquery for one).

 Yes, but we're either stuck with everything that needs ib_types.h to be part 
 of the management.git tree, or the app needs to depend on opensm.  Currently, 
 ibacm duplicates definitions because they aren't available anywhere else.

I agree ib_types.h is more generic than opensm. Moving to libibmad and
making opensm depend on this is probably better than all the
duplication. There have been viewpoints that libibumad and libibmad
shouldn't be separate (as they are small) but they were never combined
into a single library.

-- Hal


RE: ib mad definitions

2010-10-19 Thread Hefty, Sean
 I agree ib_types.h is more generic than opensm. Moving to libibmad and
 making opensm depend on this is probably better than all the
 duplication. There have been viewpoints that libibumad and libibmad
 shouldn't be separate (as they are small) but they were never combined
 into a single library.

My motivation with these changes is for ibacm to receive and use notification 
of CM timeouts to update its path record cache.  ibacm already defines the 
basic mad structure, multicast record, and path record.  It would also need the 
CM mad format.  I'd happily remove these definitions if they were already 
available.

Porting madeye to user space is a side benefit to the proposed kernel changes.

ibacm only depends on libibumad.  The madeye port also only depends on 
libibumad.  Honestly, I find the libibmad APIs confusing.  I'd much rather 
libibumad provide mad definitions.

Sasha/Ira, do either of you have opinions on this?


Re: ib mad definitions

2010-10-19 Thread Hal Rosenstock
On Tue, Oct 19, 2010 at 12:48 PM, Hefty, Sean sean.he...@intel.com wrote:
 I agree ib_types.h is more generic than opensm. Moving to libibmad and
 making opensm depend on this is probably better than all the
 duplication. There have been viewpoints that libibumad and libibmad
 shouldn't be separate (as they are small) but they were never combined
 into a single library.

The other thing I just recalled was the OpenSM portability issue.
ib_types.h is needed here and libibmad/libibumad is not in all those
environments. As you're all too well aware, this was even the case in
Windows until very recently. There may still be others we care about
where moving ib_types.h might be problematic.

-- Hal

 My motivation with these changes is for ibacm to receive and use notification 
 of CM timeouts to update its path record cache.  ibacm already defines the 
 basic mad structure, multicast record, and path record.  It would also need 
 the CM mad format.  I'd happily remove these definitions if they were already 
 available.

 Porting madeye to user space is a side benefit to the proposed kernel changes.

 ibacm only depends on libibumad.  The madeye port also only depends on 
 libibumad.  Honestly, I find the libibmad APIs confusing.  I'd much rather 
 libibumad provide mad definitions.

 Sasha/Ira, do either of you have opinions on this?



Re: [PATCH v2.6.36-rc7] infiniband: update workqueue usage

2010-10-19 Thread Ralph Campbell
On Tue, 2010-10-19 at 08:24 -0700, Tejun Heo wrote:

 * qib_cq_wq is a separate singlethread workqueue.  Does the queue
   require strict single thread execution ordering?  IOW, does each
   work have to be executed in the exact queued order and no two works
   should execute in parallel?  Or was the singlethreadedness chosen
   just to reduce the number of workers?

The work functions need to be called in-order and single threaded
or memory will be freed multiple times and other bad things.



Re: ib mad definitions

2010-10-19 Thread Ira Weiny
On Tue, 19 Oct 2010 08:43:22 -0700
Hal Rosenstock hal.rosenst...@gmail.com wrote:

 On Tue, Oct 19, 2010 at 11:28 AM, Hefty, Sean sean.he...@intel.com wrote:
   but is there a strong reason why ib_types.h can't be moved from
  opensm/include/iba to libibumad/include/infiniband?
 
  Why does this need to be moved ?
 
  The dependency should be on libibumad, not opensm.  libibumad is pretty 
  much useless without these definitions.  Why wouldn't you move them?
 
 Off the top of my head, OpenSM is layered on top of libibumad but
 doesn't need/use libibmad. I think that was the main reason although
 that could be changed if ib_types.h were to be moved. I'm not sure
 what other reasons came up in the previous discussions.

I think ib_types.h should be part of ibumad.  Everything depends on libibumad
at some point.[*]  Therefore common mad definitions should be in ib_types.h and
packaged with libibumad.

[*] ok OpenSM does not strictly, see below.

 
 
  There already are diags including ib_types.h (saquery for one).
 
  Yes, but we're either stuck with everything that needs ib_types.h to be 
  part of the management.git tree, or the app needs to depend on opensm.  
  Currently, ibacm duplicates definitions because they aren't available 
  anywhere else.
 
 I agree ib_types.h is more generic than opensm. Moving to libibmad and
 making opensm depend on this is probably better than all the
 duplication. There have been viewpoints that libibumad and libibmad
 shouldn't be separate (as they are small) but they were never combined
 into a single library.

The opposing view is that libibumad is only an interface to the kernel umad
module, where libibmad is more abstract.


As far as moving ib_types, I suggested this a while back.
http://www.mail-archive.com/gene...@lists.openfabrics.org/msg27439.html

Let's see if I can summarize the thread.

- Sean was working on libibacm and redefined ib_types.h definitions.
- I suggested moving ib_types.h to umad so he would not have a dependency on
  OpenSM.
- Sean brought up that ib_types.h is large and probably should be split
- I agreed, and asked Sasha if such a patch would be acceptable, or create a
  new library to deal with the inline functions in ib_types.h
- Hal said that ibutils requires ib_types.h but does not want a dependency on
  libibumad...
- I suggested a separate library to solve this problem.
- Hal corrected himself saying that ibutils requires osm_vendor_ibumad.
  However, OpenSM does not always use libibumad (depending on the underlying
  stack) so it would need to get ib_types somewhere else.  Hal was also
  concerned about a library with little more than a header file in it.
- Jason chimed in with "Please no more libraries..."  :-)  (and digressed with
  Sean into PR queries, MPI, and other useful, but unrelated, stuff)
- Sean says libibumad is pretty useless without some network structure
  definitions.
- I state that it looks like ibutils' dependency is on the static functions in
  ib_types.h only.
- Hal says yes ibutils depends on OpenSM for the vendor layer and that
  Mellanox is better able to answer questions regarding ibutils support.
- Hal says he thinks ib_types is more akin to what is in libibmad rather than
  libibumad
- Sean finds that ib_types.h includes complib headers.
- I submit a rough hack to remove complib headers.
- Jason, Sean, and myself discuss ugly byteswapping functions.
- Sasha agrees that he is not sure that umad is the right place for ib_types
- Sean says we should split the file up and at least some of the definitions
  should be in umad...


We all get busy...


I think we need to move ib_types (mad definitions) to umad.

Basic MAD definitions should be provided at the lowest possible level so all
software can use them.

The issues (solutions) are:

ib_types depends on complib at the moment (fixable)
ibutils depends on OpenSM (it will anyway -- non-issue)
some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
OpenSM may _not_ include umad and therefore miss defines. (fixable?)

As for this last item, would it be a big deal to require umad for the header
only?  Does umad not compile somewhere that other vendor layers are used?  I
think it is much better for OpenSM to require umad than for other MAD
processing software to require OpenSM.  Also, would splitting ib_types help
this at all?


Ira

 
 -- Hal
 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: [PATCH v2.6.36-rc7] infiniband: update workqueue usage

2010-10-19 Thread Bart Van Assche
On Tue, Oct 19, 2010 at 5:24 PM, Tejun Heo t...@kernel.org wrote:
 [ ... ]
 This is to prepare for deprecation of flush_scheduled_work().
 [ ... ]
 Index: work/include/rdma/ib_verbs.h
 [ ... ]
 +extern struct workqueue_struct *ib_wq;
 [ ... ]

This patch adds a declaration of a global variable to a public header
file. That might be unavoidable, but it doesn't make me happy.

Bart.


RE: ib mad definitions

2010-10-19 Thread Hefty, Sean
 ib_types depends on complib at the moment (fixable)
 ibutils depends on OpenSM (it will anyway -- non-issue)
 some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
 OpenSM may _not_ include umad and therefore miss defines. (fixable?)
 
 As for this last item, would it be a big deal to require umad for the header
 only?  Does umad not compile somewhere that other vendor layers are used?  I
 think it is much better for OpenSM to require umad than for other MAD
 processing software to require OpenSM.  Also, would splitting ib_types help
 this at all?

I'll propose the following:

1. Add to libibumad/include/infiniband:

   umad_types.h - basic mad, rmpp headers
   umad_sa.h    - SA attributes
   umad_cm.h    - CM messages

2. Include umad_types.h and umad_sa.h from ib_types.h
3. Include umad_cm.h from ib_cm_types.h

We start with a minimal set of definitions to umad and add/move other 
definitions later as needed, creating new header files where appropriate 
(umad_smi.h, umad_pm.h, etc.)

If we can get some basic agreement on this, I'll start on the patches 
immediately.  In an ideal world, the new header files would work on any 
platform.
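
As a rough illustration of what a minimal umad_types.h might carry, the sketch below lays out the 24-byte common MAD header with all multi-byte fields held in network byte order. The demo_ names and the init helper are assumptions made for illustration; nothing here is taken from an existing libibumad header, and the byte-order convention is only one of several possible choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical common MAD header; multi-byte fields are in network byte order. */
struct demo_umad_hdr {
    uint8_t  base_version;
    uint8_t  mgmt_class;
    uint8_t  class_version;
    uint8_t  method;
    uint16_t status;          /* big endian */
    uint16_t class_specific;  /* big endian */
    uint64_t tid;             /* big endian */
    uint16_t attr_id;         /* big endian */
    uint16_t resv;
    uint32_t attr_mod;        /* big endian */
};

/* The natural alignment of these fields gives the 24-byte wire size without packing. */
_Static_assert(sizeof(struct demo_umad_hdr) == 24, "unexpected MAD header size");

/* Zero the header and fill in the fields a sender typically sets (host-order inputs). */
void demo_umad_hdr_init(struct demo_umad_hdr *hdr, uint8_t mgmt_class,
                        uint8_t class_version, uint8_t method, uint16_t attr_id)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->base_version  = 1;
    hdr->mgmt_class    = mgmt_class;
    hdr->class_version = class_version;
    hdr->method        = method;
    hdr->attr_id       = htons(attr_id);
}
```

Keeping the structure free of complib types and inline helpers is what would let a header like this stand alone on any platform.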

- Sean


Re: ib mad definitions

2010-10-19 Thread Ira Weiny
On Tue, 19 Oct 2010 11:50:46 -0700
Hefty, Sean sean.he...@intel.com wrote:

  ib_types depends on complib at the moment (fixable)
  ibutils depends on OpenSM (it will anyway -- non-issue)
  some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
  OpenSM may _not_ include umad and therefore miss defines. (fixable?)
  
  As for this last item, would it be a big deal to require umad for the header
  only?  Does umad not compile somewhere that other vendor layers are used?  I
  think it is much better for OpenSM to require umad than for other MAD
  processing software to require OpenSM.  Also, would splitting ib_types help
  this at all?
 
 I'll propose the following:
 
 1. Add to libibumad/include/infiniband:
 
    umad_types.h - basic mad, rmpp headers
    umad_sa.h    - SA attributes
    umad_cm.h    - CM messages
 
 2. Include umad_types.h and umad_sa.h from ib_types.h
 3. Include umad_cm.h from ib_cm_types.h
 
 We start with a minimal set of definitions to umad and add/move other 
 definitions later as needed, creating new header files where appropriate 
 (umad_smi.h, umad_pm.h, etc.)
 
 If we can get some basic agreement on this, I'll start on the patches 
 immediately.  In an ideal world, the new header files would work on any 
 platform.

I agree,
Ira

 
 - Sean
 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


RE: [PATCH] Hang in dat_ia_open()

2010-10-19 Thread Davis, Arlin R
Thanks! Applied

-----Original Message-----
From: Pradeep Satyanarayana [mailto:prade...@linux.vnet.ibm.com]
Sent: Monday, October 18, 2010 1:23 PM
To: Davis, Arlin R
Cc: linux-rdma
Subject: [PATCH] Hang in dat_ia_open()

Hi Arlin,

During some error case testing we discovered a hang in dat_ia_open(). A colleague
wrote a test program that duplicates the issue.

Here is the trace of the hang:

# ./testUdaplDyn
coralxib40:6122:  open_hca: rdma_bind ERR Cannot assign requested address. Is
ib1 configured?

   Executable hangs here:


Stack:

(gdb) where
#0  0x2b5906a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x2b58e3ba in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#2  0x2b7bd82d in rdma_destroy_id () from /usr/lib64/librdmacm.so.1
#3  0x2b6b0144 in ?? () from /usr/lib64/libdaplofa.so.2
#4  0x2b6a7a03 in ?? () from /usr/lib64/libdaplofa.so.2
#5  0x2b3703fb in dat_ia_openv () from /usr/lib64/libdat2.so
#6  0x004009c6 in isDatDeviceValidDyn(char*) ()
#7  0x00400b87 in main ()
(gdb)


I checked the code in several versions of dapl-2.0 and this problem exists
in all of them, including dapl-2.0.30. In this case I happened to use dapl-2.0.27.
The hang is caused by rdma_destroy_id() being erroneously invoked twice in a row.
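
The pattern behind this kind of bug can be sketched in isolation. The example below assumes that the second destroy happens on a shared cleanup path after the error branch, which the patch context does not show, so the branch itself must not destroy the id. All demo_* names are hypothetical stand-ins and not the librdmacm or dapl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CM id; counts how often it was destroyed. */
struct demo_id {
    int destroy_calls;
};

void demo_destroy_id(struct demo_id *id)
{
    id->destroy_calls++;
}

/* Simulated bind that always fails, standing in for the bind error case. */
int demo_bind(struct demo_id *id)
{
    (void)id;
    return -1;
}

/* Open path with a single owner for cleanup: only the err label destroys the id. */
int demo_open(struct demo_id *id)
{
    int ret = demo_bind(id);

    if (ret) {
        /* No destroy here; the shared cleanup below is responsible for it. */
        goto err;
    }
    return 0;

err:
    demo_destroy_id(id);
    return ret;
}
```

With cleanup owned by exactly one place, a failing bind releases the id once, which is the behaviour the one-line removal in the patch aims for.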


---
Signed-off-by: Pradeep Satyanarayana prade...@linux.vnet.ibm.com

$ diff -Nup dapl-2.0.27/dapl/openib_cma/device.c.orig dapl-2.0.27/dapl/openib_cma/device.c
--- dapl-2.0.27/dapl/openib_cma/device.c.orig   2010-10-15 17:19:06.572503024 -0400
+++ dapl-2.0.27/dapl/openib_cma/device.c        2010-10-15 17:19:16.013082441 -0400
@@ -358,7 +358,6 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N
 	}
 	ret = rdma_bind_addr(cm_id, (struct sockaddr *)&hca_ptr->hca_address);
 	if ((ret) || (cm_id->verbs == NULL)) {
-		rdma_destroy_id(cm_id);
 		dapl_log(DAPL_DBG_TYPE_ERR,
 			 " open_hca: rdma_bind ERR %s."
 			 " Is %s configured?\n", strerror(errno), hca_name);
$



Re: ib mad definitions

2010-10-19 Thread Jason Gunthorpe
On Tue, Oct 19, 2010 at 11:50:46AM -0700, Hefty, Sean wrote:

 We start with a minimal set of definitions to umad and add/move
 other definitions later as needed, creating new header files where
 appropriate (umad_smi.h, umad_pm.h, etc.)
 
 If we can get some basic agreement on this, I'll start on the
 patches immediately.  In an ideal world, the new header files would
 work on any platform.

Can we at least agree on the usage of these structures first? Are the
constants going to be in host or network byte order?

Are you going to make something like the kernel where there is a
native structure and pack/unpack function set?

Something macro-based like foo = GET_MEMBER(*pr,preference)

Network byte order casting structures?

Host byte order casting structures? (my favorite)

bitfields?

For years now I've had a set of data files that describe all the IB
structures bitfield layouts. I think I can contribute the data files
but not the generator script.

Since they all have various merits, maybe the smartest thing is to just
codegen all of the above permutations from a single data source?

ie
// network endian bitfield casting structure
struct MADHeader_NE x = {};
x.status = htons(1);

// host endian bitfield casting structure
struct MADHeader_HE x = {};
x.status = 1;
to_network(x,sizeof(x)); // x[i] = htonl(x[i]) for i in len/4

/* Non-bitfield macro access structure
   (using the 1 byte = 1 bit helper structure technique) */
struct MADHeader_M x = {};
SET_MEMBER(x,status,1);  

// Pack/unpack function structure
struct MADHeader_UP x = {};
x.status = htons(1);
pack_MADHeader(x,mad_buf,sizeof(mad_buf));

I'd like to think we don't need the last one, but people seem to like
that scheme ..
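
The host-endian variant above hinges on the bulk conversion step. A minimal sketch of such a to_network() helper follows, assuming the structure is a whole number of 32-bit words. The demo_ name is hypothetical, and the bitfield layout question, which depends on compiler and endianness, is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Convert a buffer of host-order 32-bit words to network order in place. */
void demo_to_network(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t i;

    assert(len % 4 == 0); /* the scheme only works on whole 32-bit words */
    for (i = 0; i < len; i += 4) {
        uint32_t word;

        memcpy(&word, p + i, sizeof(word)); /* avoids alignment and aliasing issues */
        word = htonl(word);
        memcpy(p + i, &word, sizeof(word));
    }
}
```

On a big-endian host htonl() is a no-op, so the same call is safe everywhere; the cost is that every field has to be laid out relative to 32-bit word boundaries.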

I also like to codegen structure printing functions, that is
surprisingly useful - and implements a good chunk of madeye.
 
What do you think?

I've also very recently been thinking that I'd like python bindings
for MADs for some projects. I was planning on building it out with the
code gen scheme.

Ira, I think the cleanest answer is that OSM keeps its type file, and
umad gets a new one that is cleaner, more capable and probably
incompatible. I'd hate to see us stick to the OSM scheme for umad just
for code compatibility.

Jason


RE: Opensm crash with OFED 1.5

2010-10-19 Thread Suresh Shelvapille

Just want to let you all know that OpenSM seems to work fine with CentOS 5.5 on
the same HW.

Thanks,
Suri

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Suresh
 Shelvapille
 Sent: Wednesday, October 13, 2010 3:07 PM
 To: 'Linux RDMA list'; 'Tziporet Koren'
 Subject: RE: Opensm crash with OFED 1.5
 
 
 I tried 1.5.2 and that did not help, same kernel oops.
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org 
  [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of
 Suresh
  Shelvapille
  Sent: Tuesday, October 12, 2010 7:22 PM
  To: 'Linux RDMA list'
  Subject: Opensm crash with OFED 1.5
 
 
  Folks:
 
  I have a multi-processor machine, running FedoraCore 12. I have installed
  OFED 1.5. Everything seems to come up ok; I can look at ibstat and it shows
  the Mellanox card stats etc...
 
  As soon as I start opensm, I get the following kernel oops and the machine
  locks up.
 
  Any ideas?
 
  Thanks,
  Suri
 
  --
 
  Oct 12 17:19:38 localhost OpenSM[2617]: OpenSM 3.3.5#012
 
  Oct 12 17:19:38 localhost OpenSM[2617]: Entering DISCOVERING state#012
 
  Oct 12 17:20:20 localhost kernel: ib0: ib_query_gid() failed
 
  Oct 12 17:20:30 localhost kernel: ib0: ib_query_port failed
 
  Oct 12 17:20:52 localhost kernel: BUG: soft lockup - CPU#15 stuck for 61s! 
  [opensm:2637]
 
  Oct 12 17:20:52 localhost kernel: Modules linked in: fuse sunrpc 
  ip6t_REJECT nf_conntrack_ipv6
  ip6table_filter
  ip6_tables cpufreq_ondemand acpi_cpufreq freq_table rdma_ucm ib_sdp rdma_cm 
  iw_cm ib_addr ib_ipoib
  ib_cm ib_sa ipv6
  ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mlx4_en mlx4_ib ib_mthca 
  ib_mad ib_core
  dm_multipath uinput mlx4_core
  igb i2c_i801 joydev dca i2c_core iTCO_wdt iTCO_vendor_support mpt2sas 
  scsi_transport_sas [last
  unloaded: microcode]
 
  Oct 12 17:20:52 localhost kernel: CPU 15:
 
  Oct 12 17:20:52 localhost kernel: Modules linked in: fuse sunrpc 
  ip6t_REJECT nf_conntrack_ipv6
  ip6table_filter
  ip6_tables cpufreq_ondemand acpi_cpufreq freq_table rdma_ucm ib_sdp rdma_cm 
  iw_cm ib_addr ib_ipoib
  ib_cm ib_sa ipv6
  ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mlx4_en mlx4_ib ib_mthca 
  ib_mad ib_core
  dm_multipath uinput mlx4_core
  igb i2c_i801 joydev dca i2c_core iTCO_wdt iTCO_vendor_support mpt2sas 
  scsi_transport_sas [last
  unloaded: microcode]
 
  Oct 12 17:20:52 localhost kernel: Pid: 2637, comm: opensm Not tainted 
  2.6.31.5-127.fc12.x86_64 #1
  X8DTH-i/6/iF/6F
 
  Oct 12 17:20:52 localhost kernel: RIP: 0010:[81203558]  
  [81203558]
  __bitmap_empty+0x0/0x64
 
  Oct 12 17:20:52 localhost kernel: RSP: 0018:880c174bbd90  EFLAGS: 
  0246
 
  Oct 12 17:20:52 localhost kernel: RAX:  RBX: 
  880c174bbdd8 RCX: 0001
 
  Oct 12 17:20:52 localhost kernel: RDX: 818ba920 RSI: 
  0100 RDI: 818ba918
 
  Oct 12 17:20:52 localhost kernel: RBP: 8101286e R08: 
   R09: 0004
 
  Oct 12 17:20:52 localhost kernel: R10: 0004 R11: 
  0206 R12: 880c174bbdd8
 
  Oct 12 17:20:52 localhost kernel: R13: 8101286e R14: 
  810dc920 R15: 880c174bbcf8
 
  Oct 12 17:20:52 localhost kernel: FS:  7ff2d02e7710() 
  GS:c90001e0()
  knlGS:
 
  Oct 12 17:20:52 localhost kernel: CS:  0010 DS:  ES:  CR0: 
  80050033
 
  Oct 12 17:20:52 localhost kernel: CR2: 0041f0c0 CR3: 
  000c19074000 CR4: 06e0
 
  Oct 12 17:20:52 localhost kernel: DR0:  DR1: 
   DR2: 
 
  Oct 12 17:20:52 localhost kernel: DR3:  DR6: 
  0ff0 DR7: 0400
 
  Oct 12 17:20:52 localhost kernel: Call Trace:
 
  Oct 12 17:20:52 localhost kernel: [810383f2] ? 
  native_flush_tlb_others+0xc3/0xf2
 
  Oct 12 17:20:52 localhost kernel: [8103859d] ? 
  flush_tlb_mm+0x6f/0x76
 
  Oct 12 17:20:52 localhost kernel: [810debbc] ? 
  mprotect_fixup+0x480/0x611
 
  Oct 12 17:20:52 localhost kernel: [810da81d] ? 
  free_pgtables+0xa9/0xcc
 
  Oct 12 17:20:52 localhost kernel: [810f185d] ? 
  virt_to_head_page+0xe/0x2f
 
  Oct 12 17:20:52 localhost kernel: [810deee9] ? 
  sys_mprotect+0x19c/0x227
 
  Oct 12 17:20:52 localhost kernel: [81011cf2] ? 
  system_call_fastpath+0x16/0x1b
 

RE: ib mad definitions

2010-10-19 Thread Smith, Stan
Ira Weiny wrote:
 On Tue, 19 Oct 2010 11:50:46 -0700
 Hefty, Sean sean.he...@intel.com wrote:

 ib_types depends on complib at the moment (fixable)
 ibutils depends on OpenSM (it will anyway -- non-issue)
 some things in ib_types are ugly, byteswapping (non-issue; deal with
 it later)
 OpenSM may _not_ include umad and therefore miss defines. (fixable?)

 As for this last item, would it be a big deal to require umad for
 the header only?  Does umad not compile somewhere that other vendor
 layers are used? I think it is much better for OpenSM to require
 umad than for other MAD processing software to require OpenSM.
 Also, would splitting ib_types help this at all?

 I'll propose the following:

 1. Add to libibumad/include/infiniband:

    umad_types.h - basic mad, rmpp headers
    umad_sa.h    - SA attributes
    umad_cm.h    - CM messages

 2. Include umad_types.h and umad_sa.h from ib_types.h
 3. Include umad_cm.h from ib_cm_types.h

 We start with a minimal set of definitions to umad and add/move
 other definitions later as needed, creating new header files where
 appropriate (umad_smi.h, umad_pm.h, etc.)

 If we can get some basic agreement on this, I'll start on the
 patches immediately.  In an ideal world, the new header files would
 work on any platform.

 I agree,
 Ira

Just to be painfully clear ...
A user-mode application would then only need to include ib_types.h + CM flavor 
of choice .h files ?



 - Sean



 --
 Ira Weiny
 Math Programmer/Computer Scientist
 Lawrence Livermore National Lab
 925-423-8008
 wei...@llnl.gov



RE: ib mad definitions

2010-10-19 Thread Hefty, Sean
 Just to be painfully clear ...
 A user-mode application would then only need to include ib_types.h + CM
 flavor of choice .h files ?

For compatibility, ib_types.h would include whatever files any definitions were 
moved to.  An application that includes ib_types.h today wouldn't need 
additional includes.
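
A minimal sketch of that compatibility arrangement, collapsed into one file so it compiles on its own. The umad-side name UMAD_CLASS_SUBN_ADM and the *_SKETCH_H guards are hypothetical placeholders, not taken from any posted patch. In a real tree the first part would presumably live in libibumad/include/infiniband/umad_types.h, and ib_types.h would pull it in with an #include.

```c
#include <assert.h>
#include <stdint.h>

/* ---- would live in umad_types.h (hypothetical contents) ---- */
#ifndef UMAD_TYPES_SKETCH_H
#define UMAD_TYPES_SKETCH_H

/* Management class value for subnet administration. */
#define UMAD_CLASS_SUBN_ADM 0x03

#endif /* UMAD_TYPES_SKETCH_H */

/* ---- would remain in ib_types.h ---- */
#ifndef IB_TYPES_SKETCH_H
#define IB_TYPES_SKETCH_H

/* In a real tree: #include <infiniband/umad_types.h> */

/* Old name kept as an alias so existing applications still build. */
#define IB_MCLASS_SUBN_ADM UMAD_CLASS_SUBN_ADM

static_assert(IB_MCLASS_SUBN_ADM == UMAD_CLASS_SUBN_ADM,
	      "old and new names must agree");

#endif /* IB_TYPES_SKETCH_H */

/* Application code written against the old header is unchanged. */
static int is_sa_class(uint8_t mgmt_class)
{
	return mgmt_class == IB_MCLASS_SUBN_ADM;
}
```

The point of the alias is that the definition has a single home while both spellings keep working.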


RE: ib mad definitions

2010-10-19 Thread Hefty, Sean
 Can we at least agree on the usage of these structures first? Are the
 constants going to be in host or network byte order?

I was simply suggesting to 'move' some of the existing structures and defines.

 Are you going to make something like the kernel where there is a
 native structure and pack/unpack function set?

This would not be my preference.

 Something macro-based like foo = GET_MEMBER(*pr,preference)
 
 Network byte order casting structures?
 
 Host byte order casting structures? (my favorite)
 
 bitfields?

again - not my preference

 Ira, I think the cleanest answer is that OSM keeps its type file, and
 umad gets a new one that is cleaner, more capable and probably
 incompatible. I'd hate to see us stick to the OSM scheme for umad just
 for code compatibility.

Whatever is done must fit within the windows development framework that we use.
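
For readers comparing the options in the quoted list, here is a rough sketch of two of them: a network-byte-order casting structure and a GET_MEMBER-style macro accessor. The record layout (struct wire_rec) and every name below are hypothetical and do not come from ib_types.h or libibumad. The byte swap is written with plain byte operations rather than a platform header, on the assumption that this keeps it buildable on both Linux and Windows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte record as it appears on the wire (big-endian). */
struct wire_rec {
	uint16_t lid;        /* stored in network byte order */
	uint8_t  sl;
	uint8_t  preference;
};
static_assert(sizeof(struct wire_rec) == 4, "unexpected padding");

/* Convert a 16-bit big-endian value to host order on any host. */
static uint16_t be16_to_host(uint16_t v)
{
	const uint8_t *b = (const uint8_t *)&v;
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Option: casting structure in network order; the caller-facing
 * helper copies the bytes into the struct and swaps on access. */
static uint16_t rec_lid_cast(const void *buf)
{
	struct wire_rec r;
	memcpy(&r, buf, sizeof(r));  /* sidesteps alignment problems */
	return be16_to_host(r.lid);
}

/* Option: macro-based accessor that hides the swap behind the
 * field name, in the style of foo = GET_MEMBER(*pr, preference). */
#define REC_GET_lid(p)         be16_to_host((p)->lid)
#define REC_GET_sl(p)          ((p)->sl)
#define REC_GET_preference(p)  ((p)->preference)
#define GET_MEMBER(rec, field) REC_GET_##field(&(rec))
```

With the macro form the caller never sees which fields need swapping; with the casting form every access site has to remember it.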


Re: ib mad definitions

2010-10-19 Thread Jason Gunthorpe
On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote:
  Can we at least agree on the usage of these structures first? Are the
  constants going to be in host or network byte order?
 
 I was simply suggesting to 'move' some of the existing structures and defines.

But they are horrible and little used outside opensm right now, you
really want to commit to that forever?

Jason


Re: ib mad definitions

2010-10-19 Thread Ira Weiny
On Tue, 19 Oct 2010 18:00:51 -0700
Hefty, Sean sean.he...@intel.com wrote:

  Can we at least agree on the usage of these structures first? Are the
  constants going to be in host or network byte order?
 
 I was simply suggesting to 'move' some of the existing structures and defines.
 
  Are you going to make something like the kernel where there is a
  native structure and pack/unpack function set?
 
 This would not be my preference.
 
  Something macro-based like foo = GET_MEMBER(*pr,preference)
  
  Network byte order casting structures?
  
  Host byte order casting structures? (my favorite)
  
  bitfields?
 
 again - not my preference
 
  Ira, I think the cleanest answer is that OSM keeps its type file, and
  umad gets a new one that is cleaner, more capable and probably
  incompatible. I'd hate to see us stick to the OSM scheme for umad just
  for code compatibility.
 
 Whatever is done must fit within the windows development framework that we 
 use.

I am all for cleaner, more capable... but why incompatible?  If we want to
start fresh and then convert OpenSM later, fine.  But _don't_ forget to go
back and convert OpenSM, because if you leave ib_types.h out there someone is
going to use it and we are back to where we started...  :-(  Same for ibmad,
when these definitions become available in umad, mad can be simplified.

What I would like right now is to get the definitions in 1 place!

Right now there are 3 headers I find path record in.

libibverbs: sa.h
libibmad: mad.h
opensm: ib_types.h


Node type is defined in:

libibverbs: verbs.h
opensm: ib_types.h
libibmad: mad.h

I could go on.

What Sean is offering to do is move ib_types to umad.  From there I can use
those definitions in mad (thus removing them from mad and consolidating at
least 2 of the 3 above).  Perhaps use them in ibverbs as well?  As a first
step I think we should take Sean up on his offer to start cleaning things up.
But we have to remove stuff as we go or we will just be defining yet another
place to look for these.  After this we can look at making things cleaner
(perhaps even combining mad and umad, and including some of the ideas you have
above).  As Sean said in another email, after this change, including
ib_types.h will be the same for anyone using it.  The exception is that we
have simplified the code.  I think this is a win-win with minimal work.

Ira

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: ib mad definitions

2010-10-19 Thread Ira Weiny
On Tue, 19 Oct 2010 18:09:58 -0700
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote:
   Can we at least agree on the usage of these structures first? Are the
   constants going to be in host or network byte order?
  
  I was simply suggesting to 'move' some of the existing structures and 
  defines.
 
 But they are horrible and little used outside opensm right now, you
 really want to commit to that forever?

Not everything is horrible.  And if it is we can fix it.  But I think
defining yet another header with the same functionality is worse.  Like it or
not ib_types is there.  If you don't remove/fix it, someone will find it and use
it.  How does that make things cleaner just because there is something clean
somewhere else?  Someone will find ib_types and use it.  I still feel this is the
best first step at getting rid of ib_types.h (at least as it currently stands).

Ira

 
 Jason


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: ib mad definitions

2010-10-19 Thread Jason Gunthorpe
On Tue, Oct 19, 2010 at 06:32:57PM -0700, Ira Weiny wrote:
 On Tue, 19 Oct 2010 18:09:58 -0700
 Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
 
  On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote:
    Can we at least agree on the usage of these structures first? Are the
    constants going to be in host or network byte order?
   
   I was simply suggesting to 'move' some of the existing structures and 
   defines.
  
  But they are horrible and little used outside opensm right now, you
  really want to commit to that forever?
 
 Not everything is horrible.  And if it is we can fix it.  But I think
 defining yet another header with the same functionality is worse.
 Like it or

libibumad is a system library. It needs to have a stable ABI, low
churn and ideally be 'complete'.

My database of IB structs has 117 structures, all with wacky alignment
and all manner of strangeness. IMHO, it is infeasible to keep with the
ad hoc approach in ib_types.h and generate a complete header set
without a lot of churn. This is why it is horrible.

There are things worse than 'yet another' header - for instance a
system library being churned again and again for cleanups. Figure out
what you want, do it once, do it right, be done.

If we could all agree what these structs should look like I can
provide my database and someone can write the codegen AND WE CAN BE
DONE FOREVER. How is this not much better??
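
As a rough illustration of what generated output from such a database could look like, here is a table-driven sketch: one descriptor row per field, plus a single generic getter. The descriptor format, the three-field demo record, and all names are assumptions made for this example only. They are not the actual database, nor the libibmad field tables.

```c
#include <assert.h>
#include <stdint.h>

/* One row per field, as a code generator might emit it. */
struct field_desc {
	const char *name;
	unsigned bit_offset;  /* from start of buffer, MSB first */
	unsigned bit_width;   /* 1..32 */
};

/* Hypothetical record: 16-bit LID, 4-bit SL, 4 reserved bits,
 * 8-bit preference. */
enum { F_LID, F_SL, F_PREFERENCE, F_COUNT };

static const struct field_desc demo_fields[F_COUNT] = {
	[F_LID]        = { "lid",         0, 16 },
	[F_SL]         = { "sl",         16,  4 },
	[F_PREFERENCE] = { "preference", 24,  8 },
};

/* Extract one field from a big-endian wire buffer, bit by bit.
 * Slow but independent of alignment and host byte order. */
static uint32_t get_field(const uint8_t *buf, const struct field_desc *f)
{
	uint32_t value = 0;
	unsigned i;

	assert(f->bit_width >= 1 && f->bit_width <= 32);
	for (i = 0; i < f->bit_width; i++) {
		unsigned bit = f->bit_offset + i;
		unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
		value = (value << 1) | b;
	}
	return value;
}
```

A generator could emit either this kind of table or fixed structs from the same source data, so the choice of access style would not have to be made by hand for each of the structures.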

Don't treat the API of a system library as some casual thing. :(

Jason


Re: ib mad definitions

2010-10-19 Thread Jason Gunthorpe
On Tue, Oct 19, 2010 at 06:12:56PM -0700, Ira Weiny wrote:

 I am all for cleaner, more capable... but why incompatible?  If we want to
 start fresh and then convert OpenSM later, fine.  But _don't_ forget to go
 back and convert OpenSM, because if you leave ib_types.h out there someone is
 going to use it and we are back to where we started...  :-(  Same for ibmad,
 when these definitions become available in umad, mad can be simplified.

ib_types.h should not be installed in /usr/include, stop doing that
and that risk goes away.

ibmad can't really be changed, it is a system library with a defined
API. Maybe ibmad.2 or something, I don't know. I tried to use some of
the PR APIs in it, and I've found them not useful :(

For instance we can't just abandon the mad_get_fields approach because
we have real, usable field access in umad, it has to stay.

 Right now there are 3 headers I find path record in.

 libibverbs: sa.h

This isn't a MAD path record, this is the kernel version, which is
unpacked. What we really need is MAD-to-kernel and vice versa
conversion in a library. I already have code that does this in
several places :(
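
The shape of such a conversion pair might look like the sketch below. It uses a deliberately tiny, hypothetical record of three 16-bit fields. The real PathRecord has many more fields and a different layout, and none of these names exist in libibverbs, libibmad or libibumad.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PR_WIRE_LEN 6

/* Unpacked, host-byte-order form. */
struct demo_path_host {
	uint16_t dlid;
	uint16_t slid;
	uint16_t pkey;
};
static_assert(DEMO_PR_WIRE_LEN == 3 * sizeof(uint16_t),
	      "wire length must cover all three fields");

/* Read a big-endian 16-bit value from two bytes. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Write a 16-bit value as two big-endian bytes. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Wire (MAD) format to unpacked host struct. */
static void demo_pr_unpack(const uint8_t *wire, struct demo_path_host *h)
{
	h->dlid = get_be16(wire + 0);
	h->slid = get_be16(wire + 2);
	h->pkey = get_be16(wire + 4);
}

/* Unpacked host struct back to wire (MAD) format. */
static void demo_pr_pack(const struct demo_path_host *h, uint8_t *wire)
{
	put_be16(wire + 0, h->dlid);
	put_be16(wire + 2, h->slid);
	put_be16(wire + 4, h->pkey);
}
```

Keeping both directions next to each other in one library makes it easy to check that a pack followed by an unpack round-trips.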

 libibmad: mad.h

You mean mad_get_fields IB_SA_PR_DGID_F, etc? It doesn't even have all
the fields :(

 opensm: ib_types.h

Yep.
 
 Node type is defined in:
 
 libibverbs: verbs.h
 opensm: ib_types.h
 libibmad: mad.h
 
 I could go on.

Keep in mind that for the most part libibmad is someone's attempt to
make a set of accessors and structures for mads. It is incomplete. It
is largely unusable. I certainly haven't been able to use its PR
structure parsing functions for any real app. Was it just pulled out
of opensm? I don't know, I'd just as soon see that part of it be
discarded, and a complete set of structures added to umad.

opensm has unique problems because they want to remain independent of the
OFA stack, I don't think they have a choice but to duplicate.

Jason