[openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Or Gerlitz
Hi,

While working on getting the Linux bonding driver to run on top of IPoIB,
I have run into LOC_QP_OP_ERR with vendor (Mellanox PCI-X HCA) error 62.

ib0: failed send event (status=2, wrid=52 vend_err 62)

What does this vendor error mean? It is the same system on which I saw the QP
modify error.

There are some more problematic prints here, and I would be happy
to get some idea of their meaning...

 ib1: dev_queue_xmit failed to requeue packet
 ib1: dev_queue_xmit failed to requeue packet

 ???

 ib1: timing out; will leak address handles
 ib1: ib_dealloc_pd failed

(the PD dealloc failure is a result of the AH leak) but what is the cause of the leak???

Below is a more detailed snapshot of the time the problems occurred. I was
playing with this HCA's 2 IB links, taking one of them down for about 45 seconds
(by some instrumentation of the SM) and then the other, etc.

The ipoib code is unchanged (other than adding the "ipoib_set_mcast_list
called" print).

The bonding code was changed not to set the slave MAC address but rather to use
the MAC address of the active slave, and also to override the ether_setup()
settings with those of the active slave.

One thing I think I see is that IPoIB attempts to join the IPv4 broadcast group
even when the port's IB link is down. Am I correct? If yes, would it be easy to
fix this?

Or.

 1  ib0: leaving MGID ff12:401b::::::
 2  ib0: deleting multicast group ff12:401b::::::
 3  ib0: starting multicast thread
 4  ib1: stopping multicast thread
 5  ib1: waiting for MGID ff12:401b::::::
 6  ib1: flushing multicast list
 7  ib1: leaving MGID ff12:401b::::::
 8  ib1: deleting multicast group ff12:401b::::::
 9  ib1: starting multicast thread
10  ib0: joining MGID ff12:401b::::::
11  ib1: joining MGID ff12:401b::::::
12  ib1: join completion for ff12:401b:::::: (status 0)
13  ib1: MGID ff12:401b:::::: AV 810033c103c0, LID 0xc000, SL 0
14  ib1: successfully joined all multicast groups
15  bonding: bond0: link status definitely down for interface ib0, disabling it
16  bonding: bond0: making interface ib1 the new active one.
17  ib0: ipoib_set_mcast_list called
18  ib1: ipoib_set_mcast_list called
19  ib0: restarting multicast task
20  ib0: stopping multicast thread
21  ib0: waiting for MGID ff12:401b::::::
22  ib0: join completion for ff12:401b:::::: (status -4)
23  ib0: starting multicast thread
24  ib1: restarting multicast task
25  ib1: stopping multicast thread
26  ib1: waiting for MGID ff12:401b::::::
27  ib1: adding multicast entry for mgid ff12:401b::::::0001
28  ib1: starting multicast thread
29  ib0: joining MGID ff12:401b::::::
30  ib1: joining MGID ff12:401b::::::0001
31  ib1: join completion for ff12:401b::::::0001 (status 0)
32  ib1: MGID ff12:401b::::::0001 AV 810037f91d00, LID 0xc001, SL 0
33  ib1: successfully joined all multicast groups
34  ib0: join completion for ff12:401b:::::: (status -110)
35  ib0: multicast join failed for ff12:401b::::::, status -110
36  ib0: joining MGID ff12:401b::::::
37  ib0: join completion for ff12:401b:::::: (status -110)
38  ib0: multicast join failed for ff12:401b::::::, status -110
39  ib0: joining MGID ff12:401b::::::
40  ib0: join completion for ff12:401b:::::: (status -110)
41  ib0: multicast join failed for ff12:401b::::::, status -110
42  ib0: joining MGID ff12:401b::::::
43  ib0: stopping multicast thread
44  ib0: waiting for MGID ff12:401b::::::
45  ib0: join completion for ff12:401b:::::: (status -4)
46  ib0: flushing multicast list
47  ib0: deleting multicast group ff12:401b::::::
48  ib0: starting multicast thread
49  ib1: stopping multicast thread
50  ib1: waiting for MGID ff12:401b::::::0001
51  ib1: flushing multicast list
52  ib1: leaving MGID ff12:401b::::::0001
53  ib1: deleting multicast group ff12:401b::::::0001
54  ib1: leaving MGID ff12:401b::::::
55  ib1: deleting multicast group ff12:401b::::::
56  ib1: starting multicast thread
57  ib0: stopping multicast thread
58  
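
The negative status values in the join completions above look like negated Linux errno codes. A minimal Python sketch for decoding them, assuming the generic Linux numbering (4 = EINTR, 110 = ETIMEDOUT); the table and the name decode_status are illustrative and not part of ipoib:

```python
# Hypothetical helper: map the negative "status" values printed by ipoib
# to errno names. The numbers are the generic Linux errno values and are
# hard-coded so the result does not depend on the host platform.
LINUX_ERRNO = {
    4: "EINTR",        # interrupted (plausibly a cancelled join request)
    110: "ETIMEDOUT",  # the request timed out
}


def decode_status(status: int) -> str:
    """Return a readable name for a kernel-style status code."""
    if status == 0:
        return "OK"
    return LINUX_ERRNO.get(-status, f"unknown errno {-status}")


if __name__ == "__main__":
    for s in (0, -4, -110):
        print(s, decode_status(s))
```

Read this way, the repeated -110 completions on ib0 would be join requests timing out while its link is down, and the -4 completions would coincide with the multicast thread being stopped.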

Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz [EMAIL PROTECTED]:
 Subject: getting LOC_QP_OP_ERR with IPoIB
 
 Hi,
 
 While doing some work to have linux bonding driver be able to work on top
 of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 
 62.
 
   ib0: failed send event (status=2, wrid=52 vend_err 62)
 
 What does this vendor error means? its the same system over which i saw the 
 qp modify error.

vend_err 0x62 is a WQE fetch failure, caused by a nonexistent WQE region or a PD mismatch.
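
For reading such lines in bulk, a small Python sketch that parses the "failed send event" print may help. It assumes that vend_err is printed in hex without a prefix (as the 0x62 reading above suggests) and that status follows the ordering of the kernel's enum ib_wc_status; the tables and the name decode_send_error are illustrative only:

```python
import re

# First few values of the kernel's enum ib_wc_status (assumed ordering;
# only the entries needed here are listed).
WC_STATUS = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
}

# Vendor syndrome text as given in this thread; the table is illustrative.
VENDOR_ERR = {
    0x62: "WQE fetch failure (WQE region does not exist or PD mismatch)",
}

LINE_RE = re.compile(
    r"failed send event \(status=(\d+), wrid=(\d+) vend_err ([0-9a-fA-F]+)\)"
)


def decode_send_error(line: str):
    """Parse an ipoib 'failed send event' line into readable fields."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    status = int(m.group(1))
    wrid = int(m.group(2))
    vend = int(m.group(3), 16)  # assumption: hex without "0x"
    return {
        "status": WC_STATUS.get(status, f"status {status}"),
        "wrid": wrid,
        "vend_err": VENDOR_ERR.get(vend, f"vendor error {vend:#x}"),
    }


if __name__ == "__main__":
    print(decode_send_error(
        "ib0: failed send event (status=2, wrid=52 vend_err 62)"))
```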

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Hi list,

I have a question regarding the guid2lid cache file.

OpenSM reads the file at startup and may reassign LIDs according to the LIDs saved in it.
This isn't always acceptable.

Is this the right policy? Am I missing anything here?
Is there a way to disable reading the file at startup?

Regards,
   Leonid




Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Or Gerlitz
Michael S. Tsirkin wrote:
 Quoting r. Or Gerlitz [EMAIL PROTECTED]:

 While doing some work to have linux bonding driver be able to work on top
 of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 
 62.
  ib0: failed send event (status=2, wrid=52 vend_err 62)
 What does this vendor error means? its the same system over which i saw the 
 qp modify error.


 vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD 
 mismatched

Thanks.

So what's your thinking? Am I running into some bogus IPoIB scenario?

Or.





Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz [EMAIL PROTECTED]:
 Subject: Re: getting LOC_QP_OP_ERR with IPoIB
 
 Michael S. Tsirkin wrote:
  Quoting r. Or Gerlitz [EMAIL PROTECTED]:
 
  While doing some work to have linux bonding driver be able to work on top
  of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) 
  error 62.
 ib0: failed send event (status=2, wrid=52 vend_err 62)
  What does this vendor error means? its the same system over which i saw 
  the qp modify error.
 
 
  vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD 
  mismatched
 
 Thanks.
 
 So what's your thinking, am i running into some ipoib bogus scenario?
 
 Or.

Donnu, it looks really weird. Could you try firmware 3.5.0 please?

-- 
MST




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Hi Leonid,

On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
 Hi list,
 
  I have a question regarding the guid2lid cache file.
 
   The file is read by OpenSM on the start up.
   OpenSM may reassign LIDs according to the LIDs saved in this file.
  It isn't always acceptable.
 
  Is it a right policy? Am I missing anything here?
  Is there a way to disable the file reading on start up?

There is the -r (--reassign_lids) option for this but it is not the
default behavior of OpenSM.

-- Hal

 
 Regards,
Leonid
 



Re: [openib-general] MPI Brodcast doubt

2006-09-05 Thread Hal Rosenstock
John,

On Mon, 2006-09-04 at 08:56, john t wrote:
 Hi,
  
 I have 3 nodes connected via IB as shown below:
  
 node1 --- switch1 --- node2
 |-- node3
  
 If node1 sends a brodcast message to node2 and node3, I want to know
 if the message is delivered to the switch twice (first time for node2
 and second time for node3) or just once (where switch will know by
 looking at some headers or so that its a brodcast message and will
 send it on all the outgoing ports) ?

Assuming nodes 1, 2, and 3 are part of the same multicast group, the
multicast send is sent once from node 1. When received at the switch, it
is replicated to all ports which have members in the same group (in this
case, nodes 2 and 3). The switch knows by the header (specifically the
LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
to determine on which ports to forward it. However, IB multicast is
unreliable so to create reliable multicast, it is sometimes emulated
in that the sender tracks the group members and may use serial unicast
sends or augment a multicast send with unicast sends to the receivers
and track their acknowledgements of receipt.
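
The forwarding decision described above can be sketched as a toy model in Python. The dictionaries standing in for the MulticastForwardingTable and the linear forwarding table, the port numbers, and the function names are all assumptions made for illustration; only the multicast LID range (0xC000 to 0xFFFE) comes from the IB architecture:

```python
# Multicast LIDs occupy the range 0xC000-0xFFFE in InfiniBand.
MCAST_LID_MIN, MCAST_LID_MAX = 0xC000, 0xFFFE


def is_multicast_lid(dlid: int) -> bool:
    """True if the destination LID falls in the multicast range."""
    return MCAST_LID_MIN <= dlid <= MCAST_LID_MAX


def forward(dlid, in_port, mft, lft):
    """Return the list of output ports for a packet arriving on in_port.

    mft: dict mapping multicast LID -> set of member ports (toy MFT)
    lft: dict mapping unicast LID -> single output port (toy LFT)
    """
    if is_multicast_lid(dlid):
        # One incoming packet is replicated to every member port
        # except the one it arrived on.
        return sorted(mft.get(dlid, set()) - {in_port})
    port = lft.get(dlid)
    return [] if port is None else [port]


if __name__ == "__main__":
    # Hypothetical wiring: node1..node3 sit on switch ports 1..3.
    mft = {0xC000: {1, 2, 3}}
    print(forward(0xC000, 1, mft, {}))
```

With node1 on port 1 sending to the group, the switch receives a single packet and emits copies on ports 2 and 3 only.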

-- Hal

 Regards,
 John T.
 



Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Hi Hal,

  Thank you for your reply.

Probably I wasn't clear.

I have a problem when OpenSM, on startup, reads an out-of-date guid2lid file.
OpenSM changes the LIDs in this case.
I don't want the LIDs to be changed.
As I understand it, the '-r' option, on the contrary, causes the SM to
reassign all the LIDs.

I could just remove the file to handle the problem.
I'd like to know if there is a way to do it without touching the file.

Thanks,
Leonid

On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
 Hi Leonid,

 On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
  Hi list,
 
   I have a question regarding the guid2lid cache file.
 
The file is read by OpenSM on the start up.
OpenSM may reassign LIDs according to the LIDs saved in this file.
   It isn't always acceptable.
 
   Is it a right policy? Am I missing anything here?
   Is there a way to disable the file reading on start up?

 There is the -r (--reassign_lids) option for this but it is not the
 default behavior of OpenSM.

 -- Hal

 
  Regards,
 Leonid
 



Re: [openib-general] MPI Brodcast doubt

2006-09-05 Thread Dotan Barak
Hal Rosenstock wrote:
 John,

 On Mon, 2006-09-04 at 08:56, john t wrote:
   
 Hi,
  
 I have 3 nodes connected via IB as shown below:
  
 node1 --- switch1 --- node2
 |-- node3
  
 If node1 sends a brodcast message to node2 and node3, I want to know
 if the message is delivered to the switch twice (first time for node2
 and second time for node3) or just once (where switch will know by
 looking at some headers or so that its a brodcast message and will
 send it on all the outgoing ports) ?
 

 Assuming nodes 1, 2, and 3 are part of the same multicast group, the
 multicast send is sent once from node 1. When received at the switch, it
 is replicated to all ports which have members in the same group (in this
 case, nodes 2 and 3). The switch knows by the header (specifically the
 LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
 to determine on which ports to forward it. However, IB multicast is
 unreliable so to create reliable multicast, it is sometimes emulated
 in that the sender tracks the group members and may use serial unicast
 sends or augment a multicast send with unicast sends to the receivers
 and track their acknowledgements of receipt.

 -- Hal
   
All of the above is true for IB multicast (there isn't any broadcast in IB).

If the question was what happens when one sends a message using an MPI
broadcast (MPI_Bcast), then the answer is: it depends on the MPI implementation.
I know that in MVAPICH, by default, the MPI library handles the duplication itself
(so the switch gets two messages, not one).
That MPI has an option to use IB multicast, but it is disabled by default.
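
The difference in traffic reaching the switch can be put as a small Python sketch. It assumes a flat point-to-point scheme (one unicast send per receiver) when multicast is off; real MPI libraries may use tree algorithms instead, and the function name is made up for illustration:

```python
def messages_into_switch(receivers: int, use_multicast: bool) -> int:
    """Number of packets the sender injects for one broadcast payload.

    Assumes one unicast send per receiver when multicast is off, and a
    single multicast send (replicated by the switch) when it is on.
    """
    if receivers < 0:
        raise ValueError("receivers must be non-negative")
    if receivers == 0:
        return 0
    return 1 if use_multicast else receivers


if __name__ == "__main__":
    # node1 broadcasting to node2 and node3
    print(messages_into_switch(2, use_multicast=False))
    print(messages_into_switch(2, use_multicast=True))
```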

Dotan




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Hi Leonid,

On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote:
 Hi Hal,
 
   Thank you for your reply.
 
 Probably I wasn't clear.
 
 I have a problem when OpenSM, being started, reads an out-if-date guid2lid 
 file.
 OpenSM changes LIDs in this case.

How do you know the file is out of date ?

 I don't want  the LIDs to be changed.

Oh, it's the other way you were asking about.

 As I understand it, the '-r' option, on the contrary, causes the SM to
 reassign all the LIDs.
 
 I could just remove the file to handle the problem.

or move it aside.

 I'd like to know if there is a way to do it without touching the file.

Not currently. There is the -x (--honor_guid2lid) which will do this
(ignore the guid2lid file) when OpenSM is coming out of STANDBY though.

-- Hal

 Thanks,
 Leonid
 
 On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
  Hi Leonid,
 
  On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
   Hi list,
  
I have a question regarding the guid2lid cache file.
  
 The file is read by OpenSM on the start up.
 OpenSM may reassign LIDs according to the LIDs saved in this file.
It isn't always acceptable.
  
Is it a right policy? Am I missing anything here?
Is there a way to disable the file reading on start up?
 
  There is the -r (--reassign_lids) option for this but it is not the
  default behavior of OpenSM.
 
  -- Hal
 
  
   Regards,
  Leonid
  



Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question

2006-09-05 Thread Or Gerlitz
Michael S. Tsirkin wrote:
 Donnu, it looks really weird. Could you try firmware 3.5.0 please?

I just noticed that mstflint cannot be used if the mthca driver is
not loaded. I think this was not the case with the gen1 tools, am I correct?

Is this connected to this print

ACPI: PCI interrupt for device :02:00.0 disabled

which I see once the mthca driver is unloaded?

Or.

 dill:/tmp # modprobe -r ib_mthca

 dill:/tmp # ./mstflint -d 00:02:00.0 q
 *** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access 
 problem
 *** ERROR *** Device type 65535 not supported.
 *** ERROR *** Can not get flash type using device 00:02:00.0

 dill:/tmp # modprobe ib_mthca

 dill:/tmp # ./mstflint -d 00:02:00.0 q
 Image type:      Failsafe
 I.S. Version:    1
 Chip Revision:   A1
 GUID Des:        Node             Port1            Port2            Sys image
 GUIDs:           0008f104039651dc 0008f104039651dd 0008f104039651de 0008f104039651df
 Board ID:         (VLT0010010001)
 VSD:
 PSID:            VLT0010010001

 dill:/tmp # dmesg

 ACPI: PCI interrupt for device :02:00.0 disabled

 ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
 ib_mthca: Initializing :02:00.0
 PCI: Enabling device :02:00.0 (0110 - 0112)
 ACPI: PCI Interrupt :02:00.0[A] - GSI 29 (level, low) - IRQ 193





Re: [openib-general] problems to regiser memory as a reglar

2006-09-05 Thread Tziporet Koren
Dhabaleswar Panda wrote:
 Christian - Thanks for sending instructions for running mvapich2-0.9.5
 to Tziporet.

 Tziporet - Thanks for looking into this problem on SLES9 environment.

 Please note that a detailed user guide for running and tuning MVAPICH2
 0.9.5 is available from the following URL:

 http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

 DK
   
Thanks to all,
We found the bug, which was in the memory registration flow on SLES9 only.
A fix will be available in OFED 1.1 RC4.

Tziporet




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Thanks,

On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
  I have a problem when OpenSM, being started, reads an out-if-date guid2lid 
  file.
  OpenSM changes LIDs in this case.

 How do you know the file is out of date ?

Actually, the LIDs were assigned by another SM.
When I start my new OpenSM, the old SM is already dead.
Before starting the new OpenSM, the ibnetdiscover utility shows LIDs different
from the ones in the file.
When I start OpenSM, the LIDs are reassigned on the fabric.




[openib-general] [Bug 131] working with huge pages may crash the kernel on Suse10

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=131


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #1 from [EMAIL PROTECTED]  2006-09-05 06:16 ---
was fixed in 1.1-rc3




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.



[openib-general] [Bug 145] IB Core unable to communicate IPoIB on Fedora Core 4

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=145


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX




--- Comment #2 from [EMAIL PROTECTED]  2006-09-05 06:18 ---
this is not a bug in OFED




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.



Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Leonid,

On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
 Thanks,
 
 On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
   I have a problem when OpenSM, being started, reads an out-if-date 
   guid2lid file.
   OpenSM changes LIDs in this case.
 
  How do you know the file is out of date ?
 
 Actually, the LIDs were assigned by another SM.

Different (vendor) SMs have different LID assignment and pathing
(routing) policies. It is inadvisable to failover across vendor SMs for
this and other reasons.

-- Hal

 When I start my new OpenSM, the old SM is already dead.
 Before starting the new OpenSM, the  ibnetdiscover utility shows LIDs 
 different
 from ones in the file.
 When I start OpenSM, the LIDs are reassigned on the fabric.





Re: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log initializer

2006-09-05 Thread Hal Rosenstock
On Mon, 2006-09-04 at 13:20, Sasha Khapyorsky wrote:
 There is new osm_log initializer osm_log_init_v2(), this is wrapped
 by osm_log_init() in order to preserve existing API.
 
 Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

Thanks. Applied (to trunk and 1.1).

-- Hal





Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Eitan Zahavi
Hi Leonid,

The best approach when switching from another vendor SM to 
OpenSM is to delete the /var/cache/osm/guid2lid file.
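
For anyone who would rather keep the old contents than delete them, here is a minimal Python sketch that moves the cache file aside before OpenSM is started. The default path is the one named above; the function name and the timestamped backup naming are assumptions for illustration, not OpenSM behaviour:

```python
import time
from pathlib import Path


def move_cache_aside(cache_file="/var/cache/osm/guid2lid"):
    """Rename the guid2lid cache so OpenSM does not find it at startup.

    Returns the new path, or None if there was no file to move.
    """
    src = Path(cache_file)
    if not src.exists():
        return None
    # Keep the old contents around under a timestamped name.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    src.rename(dst)
    return dst


if __name__ == "__main__":
    moved = move_cache_aside()
    print("moved to", moved if moved else "(nothing to move)")
```

This would have to run (with sufficient permissions) while OpenSM is stopped, since the file is read at startup.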

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:openib-general-
 [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
 Sent: Tuesday, September 05, 2006 4:18 PM
 To: Leonid Arsh
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] OpenSM - guid2lid cache file questions
 
 Leonid,
 
 On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
  Thanks,
 
  On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock [EMAIL PROTECTED] wrote:
I have a problem when OpenSM, being started, reads an out-if-date
 guid2lid file.
OpenSM changes LIDs in this case.
  
   How do you know the file is out of date ?
  
  Actually, the LIDs were assigned by another SM.
 
 Different (vendor) SMs have different LID assignment and pathing
 (routing) policies. It is inadvisable to failover across vendor SMs for
this and
 other reasons.
 
 -- Hal
 
  When I start my new OpenSM, the old SM is already dead.
  Before starting the new OpenSM, the  ibnetdiscover utility shows LIDs
  different from ones in the file.
  When I start OpenSM, the LIDs are reassigned on the fabric.
 
 



Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz [EMAIL PROTECTED]:
 Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question
 
 Michael S. Tsirkin wrote:
  Donnu, it looks really weird. Could you try firmware 3.5.0 please?
 
 I just noted that you can not work with mstflint if the mthca driver is 
 not loaded, i think it was not the case in the gen1 tools, am i correct.

Yes, recent kernels disable device access once the driver is unloaded:

mstflint -d 08:00.0 q
*** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access
problem
*** ERROR *** Device type 65535 not supported.
*** ERROR *** Can not get flash type using device 08:00.0

mstflint should work without driver using /proc:
mstflint -d /proc/bus/pci/08/00.0 q
Image type:  Failsafe
I.S. Version:1
Chip Revision:   A0
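
The mapping from the short PCI address to the /proc form used in the command above can be sketched in Python. The function name is made up, and the sketch assumes PCI domain 0, where the layout is /proc/bus/pci/<bus>/<dev>.<fn>:

```python
import re

# Accepts "bus:dev.fn" or "domain:bus:dev.fn" (all fields in hex).
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def proc_pci_path(bdf: str) -> str:
    """Turn a PCI address such as '08:00.0' into its /proc/bus/pci path.

    Only the domain-0 layout is handled in this sketch.
    """
    m = _BDF_RE.match(bdf)
    if m is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    domain, bus, dev, fn = m.groups()
    if domain is not None and int(domain, 16) != 0:
        raise ValueError("non-zero PCI domains are not handled here")
    return f"/proc/bus/pci/{bus.lower()}/{dev.lower()}.{fn}"


if __name__ == "__main__":
    print(proc_pci_path("08:00.0"))
```

The returned string is what would be passed to mstflint with -d in place of the bare address.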


In gen1, flint had a separate driver which you had to load.
I am not sure whether this would work on 2.6.18.

 Is this connected to this print
 
   ACPI: PCI interrupt for device :02:00.0 disabled
 
 i see once the mthca driver is unloaded?
 
 Or.

Probably not.

-- 
MST




[openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Bub Thomas
I'm still in the process of migrating my gen1 application to gen2.

Actually, I CAN connect a gen2 application to a gen2 listener application on the same machine, but NOT to a gen2 listener on another machine.

Any hints on where to look?

Is there anything in the architecture that might prevent a libibcm connection to another machine?

I'm using an old Voltaire switch to connect the machines. Can this be the reason?

The switch didn't cause problems with gen1 clients.

Thanks

Thomas Bub



Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Dotan Barak
Hi bub.


Bub Thomas wrote:

 I’m still in the process of migrating my gen1 application to gen2.

 Actually I CAN connect a gen2 application to a gen2 listener 
 application on the same machine but NOT to a gen 2 listener on another 
 machine.

 Any hints where to look at?

 Is there anything in the architecture that might prevent a libibcm 
 connection to another machine?

 I’m using an old Voltaire switch to connect the machines. Can this be 
 the reason?

 The switch didn’t cause problems using gen1 clients.

What is the problem that you see?
There are some examples that come with libibcm that show
how to use the library.

There can be several reasons for your problem:
1) side A sends a REQ when side B is not ready, and there is a timeout failure
2) the ib_ucm kernel module is loaded on side A only
3) the SM is not working (well)
4) host A cannot reach host B over IB
5) endianness issues?
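
Item 2 in the list above is easy to check on both hosts. A minimal Python sketch, assuming a Linux host where /proc/modules lists one loaded module per line with the module name as the first field (the function name is illustrative):

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` appears as a module in /proc/modules content.

    Each line of /proc/modules starts with the module name.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc not mounted
    print("ib_ucm loaded:", module_loaded("ib_ucm", text))
```

Running it on side A and side B shows whether ib_ucm is present on both.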

I tried using libibcm and didn't have any problems (but I don't
have a Voltaire switch, so I can't check your scenario).

Dotan




Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Hal Rosenstock
Hi Bub,

On Tue, 2006-09-05 at 10:22, Bub Thomas wrote:
 I’m still in the process of migrating my gen1 application to gen2.
 
 Actually I CAN connect a gen2 application to a gen2 listener
 application on the same machine but NOT to a gen 2 listener on another
 machine.
 
 Any hints where to look at?

What are you using for SM ? OpenSM or vendor SM ?

 Is there anything in the architecture that might prevent a libibcm
 connection to another machine?

I don't think this is an architectural issue.

-- Hal

 I’m using an old Voltaire switch to connect the machines. Can this be
 the reason?
 
 The switch didn’t cause problems using gen1 clients.
 
 Thanks
 
 Thomas Bub
 
 
 

Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Bub Thomas
Dotan,
the ibv_rc_pingpong example works for me, so I can rule out the
architecture.
I never got the libibcm example to compile.
Which example did you use, and which architecture (x86 or x86_64) did you
compile it for?
Can you share your libibcm example code? (if it is not the standard one,
which I can't get to compile)
Thomas

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dotan Barak
Sent: Tuesday, September 05, 2006 5:12 PM
To: Bub Thomas
Cc: openib-general@openib.org
Subject: Re: [openib-general] libibcm can't connect/talk to libicm on
other machine.

Hi bub.


Bub Thomas wrote:

 I'm still in the process of migrating my gen1 application to gen2.

 Actually I CAN connect a gen2 application to a gen2 listener 
 application on the same machine but NOT to a gen 2 listener on another

 machine.

 Any hints where to look at?

 Is there anything in the architecture that might prevent a libibcm 
 connection to another machine?

 I'm using an old Voltaire switch to connect the machines. Can this be 
 the reason?

 The switch didn't cause problems using gen1 clients.

What is the problem that you see?
there are some examples that comes with the libibcm that can show you 
how to use the library.

there can be several reasons for your problem:
1) side A send a req when side B is not ready and there is a timeout
failure
2) only in side A the ib_ucm kernel module enabled
3) SM is not working (well)
4) host A cannot be reached to host B using IB
5) endianess issues?

i tried to use the libibcm and i don't have any problem (but i don't 
have any Voltaire switch, so i can't check your scenario).

Dotan




[openib-general] New development tool for boot-time drivers (FCode, IEE-1275, IBM/Sun)

2006-09-05 Thread David L Paktor

If anyone is interested in developing boot-time device drivers for plug-in
devices, conformant to the IEEE-1275 (Open Firmware) specification, using
FCode (tokenized Forth source), which is compatible with both IBM and Sun
platforms (and is platform-independent, so that a driver written once is
compatible with all Open Firmware platforms ... but you already know all
this if you're using Open Firmware), then you will need a Tokenizer to
translate from your Forth source to FCode tokens, which are the medium
of exchange between the device and the platform.

I am writing to announce that a new FCode Tokenizer, capable of running
on IBM equipment (and that can be compiled on any other host that supports
the GnuCC compiler, and others as well) is freely available at the web-site
of the OpenBIOS project, www.openbios.org  (and just follow the links
about the New FCODE suite)

If you have any questions, please direct them to the OpenBIOS Mailing List.

Thank you.

-

David L. Paktor  System Firmware Developer
System and Technology Group  Global Firmware Division
[EMAIL PROTECTED]  David L Paktor/Almaden/[EMAIL PROTECTED]

18880 Homestead Rd.  Building 9945
Cupertino CA 95014   Room 1026
408-342-6110 T/L 560-6110

The Bug Stops Here

Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread Sean Hefty
Bub Thomas wrote:
 Dotan,
 the ibv_rc_pingpong example works for me so I can exclude the
 architecture.
 I never got the libibcm example compiled.
 Which is your example and which architecture x86 vs. x86_64 did you
 compile it for?
 Can you share your libibcm example code? (if it is not the standard
 one that I can't get compiled)
 Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with the new 
libibsa, but I am waiting for the review of the kernel sa_query changes before 
committing.

- Sean




Re: [openib-general] libibcm can't connect/talk to libicm on other machine.

2006-09-05 Thread JWM
I know this sounds simple, but have you checked the routing tables?

JW

  - Original Message -
  From: Bub Thomas
  To: openib-general@openib.org
  Sent: Tuesday, September 05, 2006 9:22 AM
  Subject: [openib-general] libibcm can't connect/talk to libicm on other machine.

  I’m still in the process of migrating my gen1 application to gen2.
  Actually I CAN connect a gen2 application to a gen2 listener application
  on the same machine but NOT to a gen2 listener on another machine.
  Any hints where to look at?
  Is there anything in the architecture that might prevent a libibcm
  connection to another machine?
  I’m using an old Voltaire switch to connect the machines. Can this be
  the reason?
  The switch didn’t cause problems using gen1 clients.
  Thanks
  Thomas Bub


Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a 
try?

Signed-off by: Arlin Davis [EMAIL PROTECTED]

Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c  (revision 9106)
+++ dapl/openib/dapl_ib_util.c  (working copy)
@@ -446,6 +446,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -470,7 +471,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen = dev_attr.max_cqe;
ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c  (working copy)
@@ -424,6 +424,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -446,6 +447,8 @@
ia_attr->hardware_version_major = dev_attr.hw_ver;
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
@@ -481,6 +484,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===
--- dapl/openib_scm/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c  (working copy)
@@ -373,6 +373,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
(DAT_IA_ADDRESS_PTR)hca_ptr->hca_address;
@@ -390,7 +391,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
Arlin Davis wrote:
 Robert,
 
 Here is a slightly modified patch for your attributes issue. Can you give it 
 a try?
 

I'll give it a spin this afternoon: it looks quite a bit more
comprehensive than the small patch I did.

Regards,
 Robert.



[openib-general] Question about interrupt generation

2006-09-05 Thread harish
Hi All,

I tried the following simple experiment and am not able to understand the
results:

I calculated the number of interrupts generated by the InfiniBand adapter
[with little or no traffic to the NIC] over a period of 10 seconds and saw
around 10-20 interrupts/sec. Then I ran a netperf test and saw around
100+ K interrupts/sec. This screwed up my host machine. To reduce the
impact of the interrupts, I added a callback that is scheduled to run
periodically, every few microseconds; it masks the irq line used by the NIC
and a little later unmasks it. With the callback in place and no traffic,
I see anywhere between 30-50K interrupts/sec. With the netperf traffic,
I see around 120K+ interrupts/sec.

I am a newbie to InfiniBand technology and so do not understand why so many
interrupts are generated when my callback is called periodically. Could it
be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs?
Or does InfiniBand generate interrupts based on types of events, and what I
am doing by masking/unmasking the interrupt line is one such event?

Any information/suggestions would be useful.

Thanks in advance,
harish

Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Roland Dreier
Thanks, I've rolled this up in the amso1100 patch I have queued up.

  - #if 0 the following unused global function:
   - c2_mq.c: c2_mq_count()

Tom/Steve, any reason to keep c2_mq_count() at all?

 - R.




Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-05 Thread Roland Dreier
Thanks, queued for 2.6.19.




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert Walsh wrote:

 Arlin Davis wrote:
  Robert,

  Here is a slightly modified patch for your attributes issue. Can you give it
  a try?

 I'll give it a spin this afternoon: it looks quite a bit more
 comprehensive than the small patch I did.

 Regards,
  Robert.

Just added all appropriate RDMA in/out fields and some code to zero out 
the structure to avoid uninitialized data fields.
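
A minimal, self-contained sketch of the zero-then-fill pattern described
above. The structure and function names (ia_attr_sketch, query_sketch) are
hypothetical stand-ins rather than the DAT/uDAPL types, and plain memset()
stands in for dapl_os_memzero(); the point is only that fields the query
code never assigns end up as zero instead of stack or heap garbage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a provider attribute structure. */
struct ia_attr_sketch {
    int max_rdma_read_in;
    int max_rdma_read_out;
    int max_rdma_read_per_ep_in;
    int max_rdma_read_per_ep_out;
    int not_set_by_provider;   /* a field the query code never assigns */
};

/* Zero the whole structure first, then fill in what the device reports. */
void query_sketch(struct ia_attr_sketch *attr, int max_qp_rd_atom)
{
    if (attr == NULL)
        return;

    memset(attr, 0, sizeof(*attr));   /* plays the role of dapl_os_memzero() */

    attr->max_rdma_read_in         = max_qp_rd_atom;
    attr->max_rdma_read_out        = max_qp_rd_atom;
    attr->max_rdma_read_per_ep_in  = max_qp_rd_atom;
    attr->max_rdma_read_per_ep_out = max_qp_rd_atom;
    /* not_set_by_provider is deliberately left alone: it stays 0. */
}
```

A caller that passes in a structure full of stale data still reads 0 from
not_set_by_provider after the call, which is the behaviour the memzero lines
in the patch are after.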

-arlin




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
 Just added all appropriate RDMA in/out fields and some code to zero out
 the structure to avoid uninitialized data fields.

Yup.  By comprehensive, I meant better :-)



Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Steve Wise

It's old debug code that isn't used anywhere.  It would be nice to keep
it around, but if you really don't want it, nuke it...




On Tue, 2006-09-05 at 14:57 -0700, Roland Dreier wrote:
 Thanks, I've rolled this up in the amso1100 patch I have queued up.
 
   - #if 0 the following unused global function:
- c2_mq.c: c2_mq_count()
 
 Tom/Steve, any reason to keep c2_mq_count() at all?
 
  - R.





Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
Arlin Davis wrote:
 Robert,
 
 Here is a slightly modified patch for your attributes issue. Can you give it 
 a try?

Oddly enough, I'm back to the same problem with your new patch as I saw
with the unpatched version:

  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9

Still tracking this one down.  I noticed in the patch you removed a
couple of lines, too:

  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

Regards,
 Robert.



Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Roland Dreier
    Steve> It's old debug code that isn't used anywhere.  It would be
    Steve> nice to keep it around, but if you really don't want it,
    Steve> nuke it...

No, that's fine, I'll leave it inside the #if 0.

 - R.




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis


Oddly enough, I'm back to the same problem with your new patch as I saw
with the unpatched version:
 
Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your 
adapter and it worked.

Did you ever pick up the Intel MPI 3.0 beta?


  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9

Still tracking this one down.  I noticed in the patch you removed a
couple of lines, too:

  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP  DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN

/* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */
#define max_mtu_size max_message_size
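
To make the aliasing concrete, here is a small self-contained sketch. The
structure alias_attr_sketch and the function set_via_old_name are
hypothetical and only mimic the dat.h excerpt; the sketch shows that, with
such a #define in effect, an assignment through the old field name is
rewritten by the preprocessor into an assignment to the _in field, so a
separate line for the old name would just write the same member twice.

```c
/* Hypothetical, trimmed-down stand-in for the DAT IA attribute structure. */
struct alias_attr_sketch {
    int max_rdma_read_per_ep_in;
    int max_rdma_read_per_ep_out;
};

/* Same aliasing idea as the dat.h excerpt: the old name maps to the new field. */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in

/* Writes through the old name, reads back through the new one. */
int set_via_old_name(int value)
{
    struct alias_attr_sketch attr = { 0, 0 };

    attr.max_rdma_read_per_ep = value;   /* expands to attr.max_rdma_read_per_ep_in */
    return attr.max_rdma_read_per_ep_in;
}
```

Calling set_via_old_name(4) returns 4, since both names refer to the same
member once the macro has been expanded.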


-arlin




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
 Oddly enough, I'm back to the same problem with your new patch as I saw
 with the unpatched version:
  
 Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your 
 adapter and it worked.

Weird - it's not working for me at all.  Maybe I'm messing up somewhere.
 I've got a meeting for the next hour or so - I'll check again when I
get back.

 Did you ever pick up the Intel MPI 3.0 beta?

Yup.

 max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Ah - fair enough.



[openib-general] [Bug 218] New: Call usage verifier is detecting reinitialization of spinlocks already in use

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=218

   Summary: Call usage verifier is detecting reinitialization of
spinlocks already in use
   Product: OpenFabrics Windows
   Version: unspecified
  Platform: X86
OS/Version: Other
Status: NEW
  Severity: major
  Priority: P2
 Component: mthca driver
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


I built a debug version of revision 467 and turned on call usage verifier (CUV)
for the mthca driver. It's detecting many cases of spinlocks being initialized
after they have already been used. This is usually bad. To build with CUV all
you have to do is add the following line to the sources file.

VERIFIER_DDK_EXTENSIONS=1

My experience is CUV tends to detect a different set of bugs from driver
verifier, and it might be useful to turn on CUV for all the drivers and see
what's reported.

CUV Driver Error: Calling KeInitializeSpinLock(...) at File
k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h, Line 57
  The Spin lock specified as parameter 1 [0x87840EDC]
  has been previously initialized and used as
  a In-Stack Queued Spin lock by this driver.
Break, Ignore, Zap, Remove, Disable all, H for help (bizrdh)? b
b
Breaking in... (press g<enter> to return to assert menu)
Break instruction exception - code 8003 (first chance)
nt!DbgBreakPoint:
8075cc00 cc   int 3
0: kd> k 50
ChildEBP RetAddr  
f7926438 baeab189 nt!DbgBreakPoint
f7926450 baeaa814 mthca!DDKExtPrompt+0x10a
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\messages.cpp @ 709]
f7926468 baea990e mthca!DDKExtVInitializeItem+0x98
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\validate.cpp @ 195]
f7926490 bae81635 mthca!DDK_KeInitializeSpinLock+0x35
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\locks.cpp @ 298]
f79264a4 baea42ee mthca!spin_lock_init+0x15
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h @ 58]
f79264b0 baea4057 mthca!mthca_wq_init+0xe
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 383]
f792653c bae7eaac mthca!mthca_modify_qp+0xe97
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 853]
f7926550 bae76eaa mthca!ibv_modify_qp+0x1c
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_verbs.c @ 467]
f7926628 ba99e0f3 mthca!mlnx_modify_qp+0x11a
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\hca_verbs.c @ 955]
f792673c ba99df12 ibbus!al_modify_qp+0x113
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1346]
f7926760 ba99d7b8 ibbus!modify_qp+0x502
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1313]
f7926778 ba99eef5 ibbus!ib_modify_qp+0x18
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1288]
f7926848 ba99ec9e ibbus!init_dgrm_svc+0x175
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1453]
f7926870 ba96d005 ibbus!ib_init_dgrm_svc+0x73e
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1395]
f7926c4c ba969fd8 ibbus!create_spl_qp_svc+0x18a5
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 718]
f7926c78 ba969a45 ibbus!spl_qp_agent_pnp+0x128
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 476]
f7926c8c ba98f071 ibbus!spl_qp0_agent_pnp_cb+0x95
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 429]
f7926cf4 ba98f2e8 ibbus!__pnp_notify_user+0x561
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 523]
f7926d38 ba990e7c ibbus!__pnp_port_notify+0x118
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 612]
f7926d70 ba94d8a4 ibbus!__pnp_process_add_ca+0x2dc
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 943]
f7926d8c ba953b94 ibbus!__cl_async_proc_worker+0x94
[k:\windows-openib\src\winib-467b\core\complib\cl_async_proc.c @ 153]
f7926da0 ba955c4c ibbus!__cl_thread_pool_routine+0x54
[k:\windows-openib\src\winib-467b\core\complib\cl_threadpool.c @ 67]
f7926dac 80a07678 ibbus!__thread_callback+0x2c
[k:\windows-openib\src\winib-467b\core\complib\kernel\cl_thread.c @ 49]
f7926ddc 80781346 nt!PspSystemThreadStartup+0x2e
  nt!KiThreadStartup+0x16
0: kd> g
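
The trace shows the lock being initialized from a path (mthca_wq_init called
from mthca_modify_qp) that can run after the lock has already been used. One
common way to avoid that pattern is to split "create" from "reset", so the
lock is initialized exactly once. The sketch below is only an illustration
under that assumption, not the mthca code or its eventual fix: wq_sketch and
both functions are hypothetical, and a C11 atomic_flag stands in for the
kernel spinlock.

```c
#include <stdatomic.h>

/* Hypothetical work-queue stand-in; atomic_flag plays the role of the spinlock. */
struct wq_sketch {
    atomic_flag lock;
    unsigned next_ind;
    unsigned last_comp;
    unsigned max;
};

/* Called exactly once, at creation: the only place the lock is initialized. */
void wq_create_sketch(struct wq_sketch *wq, unsigned max)
{
    atomic_flag_clear(&wq->lock);
    wq->max = max;
    wq->next_ind = 0;
    wq->last_comp = max - 1;
}

/* Called on every state transition: resets the indices under the lock,
 * but never re-initializes the lock itself. */
void wq_reset_sketch(struct wq_sketch *wq)
{
    while (atomic_flag_test_and_set(&wq->lock))
        ;   /* spin until the lock is free */
    wq->next_ind = 0;
    wq->last_comp = wq->max - 1;
    atomic_flag_clear(&wq->lock);
}
```

With this split, a modify-style path would call only wq_reset_sketch(), so a
verifier that tracks lock initialization sees a single init per queue.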







Re: [openib-general] Question about interrupt generation

2006-09-05 Thread harish
Hi,

One more question. What kind of event mask helps mask the interrupts?

thanks
harish

On 9/5/06, harish [EMAIL PROTECTED] wrote:
 Hi All,

 I tried the following simple experiment and am not able to understand the
 results:

 I calculated the number of interrupts generated by the InfiniBand adapter
 [with little or no traffic to the NIC] over a period of 10 seconds and saw
 around 10-20 interrupts/sec. Then I ran a netperf test and saw around
 100+ K interrupts/sec. This screwed up my host machine. To reduce the
 impact of the interrupts, I added a callback that is scheduled to run
 periodically, every few microseconds; it masks the irq line used by the NIC
 and a little later unmasks it. With the callback in place and no traffic,
 I see anywhere between 30-50K interrupts/sec. With the netperf traffic,
 I see around 120K+ interrupts/sec.

 I am a newbie to InfiniBand technology and so do not understand why so many
 interrupts are generated when my callback is called periodically. Could it
 be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs?
 Or does InfiniBand generate interrupts based on types of events, and what I
 am doing by masking/unmasking the interrupt line is one such event?

 Any information/suggestions would be useful.

 Thanks in advance,
 harish



Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
Woodruff, Robert J wrote:
 Robert Walsh wrote,
 I'll give it a spin this afternoon: it looks quite a bit more
 comprehensive than the small patch I did.
 
 I also just tried running the ib_rdma_bw test and it seems to
 be flaky if you stress it. If you just run the defaults, it seems to
 work, but if you crank up the iterations and the message size,
 it sometimes fails with.
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  
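
For readers decoding the "Failed status 9" line above: the number is a work
completion status, and in libibverbs these are the values of enum
ibv_wc_status. The sketch below is a self-contained lookup table written from
the enum's declaration order (an assumption worth checking against the
verbs.h on your system); wc_status_name_sketch is a hypothetical helper, and
current libibverbs versions offer ibv_wc_status_str() for the same purpose.

```c
#include <stddef.h>

/* First entries of enum ibv_wc_status, in declaration order (value == index).
 * Written out by hand here so the sketch has no libibverbs dependency. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",          /* 0 */
    "IBV_WC_LOC_LEN_ERR",      /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",    /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",   /* 3 */
    "IBV_WC_LOC_PROT_ERR",     /* 4 */
    "IBV_WC_WR_FLUSH_ERR",     /* 5 */
    "IBV_WC_MW_BIND_ERR",      /* 6 */
    "IBV_WC_BAD_RESP_ERR",     /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",   /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",  /* 9 */
};

/* Map a numeric completion status to its symbolic name, if it is in the table. */
const char *wc_status_name_sketch(unsigned status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    return status < n ? wc_status_names[status] : "unknown (not in this table)";
}
```

Under that mapping, status 9 is IBV_WC_REM_INV_REQ_ERR (the remote side
rejected the request as invalid).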

This looks like a known bug, the fix to which didn't make it into OFED
1.1-RC3.  Hopefully we can still get this into 1.1-RC4.

Regards,
 Robert.



Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
 Here is a slightly modified patch for your attributes issue. Can you give it 
 a try?

I rebuilt OFED from scratch with the patch, and ran successfully on
Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
If you have a later beta version you could send me, that would be great,
too.

Regards,
 Robert.



Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how do I enable madeye)?

2006-09-05 Thread Scott Weitzenkamp (sweitzen)
 5. Added Madeye utility

How do I build madeye?  I don't see any reference to it in install.sh.
Is there any documentation for madeye?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 
