On Wed, 2004-11-10 at 00:55, Roland Dreier wrote:
It seems that MAD handling is still not quite right. It seems in my
setup that IPoIB is not seeing the response to its MCMember
set... (it does look like the query is reaching the SM)
This is a separate issue from the ports not becoming
Hal == Hal Rosenstock [EMAIL PROTECTED] writes:
Hal I can see now that this is wrong and have a fix for what
Hal stops IPoIB from working. The problem was that the response
Hal was received by the MAD layer but not dispatched due to the
Hal change(s) noted above.
Hal So I am
On Wed, 2004-11-10 at 10:36, Roland Dreier wrote:
Yes, IPoIB works for me again.
Thanks for validating.
Hal Also, it seems to me that no response needs to be handed to
Hal process_mad. Does this optimization make sense?
I'm not sure I understand the question. process_mad
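Hal's question is whether response MADs can skip process_mad entirely. Below is a minimal sketch of that check. It assumes a response is recognized by the response bit (0x80) of the MAD method byte, as with GetResp (0x81). The struct and helper names are hypothetical. Roland's reply is cut off, so this does not show what was actually adopted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Response bit of the MAD method byte; defined locally for this sketch. */
#define SKETCH_METHOD_RESP 0x80

/* Hypothetical stand-in for the common MAD header: only the field used here. */
struct sketch_mad_hdr {
	uint8_t method;
};

/* Returns true if the MAD should still be offered to process_mad.
 * Assumption: responses are only of interest to the original requester. */
static bool should_call_process_mad(const struct sketch_mad_hdr *hdr)
{
	assert(hdr);
	return (hdr->method & SKETCH_METHOD_RESP) == 0;
}
```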
Hal Rosenstock wrote:
On Wed, 2004-11-10 at 00:55, Roland Dreier wrote:
It seems that MAD handling is still not quite right. It seems in my
setup that IPoIB is not seeing the response to its MCMember
set... (it does look like the query is reaching the SM)
This is a separate issue from
root wrote:
In following code :
if (smi_check_local_dr_smp(smp, mad_agent->device,
mad_agent->port_num)) { ...
ret = mad_agent->device->process_mad(
mad_agent-device,
0,
Hal Rosenstock wrote:
This is a separate issue from the ports not becoming active (DR handling
issue). I broke this part yesterday (not a good day at all :-( at either
r1184 and/or r1181 when I added what I thought was correct based on
Sean's emails (not dispatching additional error cases in
Sean What exactly does it mean then when process_mad returns
Sean success? Do any of the return bits from process_mad
Sean indicate that the MAD was for the HCA driver?
SUCCESS means that process_mad didn't encounter any errors. If REPLY
or CONSUMED is set then process_mad actually
By the way, if I am reading the code correctly, it looks like the MAD
layer only checks for IB_MAD_RESULT_REPLY and not
IB_MAD_RESULT_CONSUMED. If IB_MAD_RESULT_CONSUMED is set then the
packet is something like a trap repress handled by the SMA or a
locally generated trap that the driver
On Wed, 2004-11-10 at 11:59, Roland Dreier wrote:
Sean What exactly does it mean then when process_mad returns
Sean success? Do any of the return bits from process_mad
Sean indicate that the MAD was for the HCA driver?
SUCCESS means that process_mad didn't encounter any errors.
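The result bits discussed above can be summarized as a small decision function. This is only a sketch. The enum mirrors IB_MAD_RESULT_SUCCESS, IB_MAD_RESULT_REPLY and IB_MAD_RESULT_CONSUMED under local names, and its bit values are assumed. Checking CONSUMED before REPLY is this sketch's own choice, so that a consumed MAD (for example a trap repress handled by the SMA) is never answered or dispatched further.

```c
#include <assert.h>

/* Local mirror of the process_mad result bits; values assumed. */
enum sketch_mad_result {
	SKETCH_MAD_RESULT_FAILURE  = 0,
	SKETCH_MAD_RESULT_SUCCESS  = 1 << 0,
	SKETCH_MAD_RESULT_REPLY    = 1 << 1,
	SKETCH_MAD_RESULT_CONSUMED = 1 << 2,
};

/* What the MAD layer should do next with the received MAD. */
enum sketch_action {
	SKETCH_ERROR,      /* process_mad reported an error */
	SKETCH_DROP,       /* consumed by the driver: stop processing */
	SKETCH_SEND_REPLY, /* driver filled in a reply that must be sent */
	SKETCH_DISPATCH,   /* not handled by the driver: pass to MAD agents */
};

static enum sketch_action classify_result(int ret)
{
	/* Only the three bits above are expected. */
	assert((ret & ~(SKETCH_MAD_RESULT_SUCCESS | SKETCH_MAD_RESULT_REPLY |
			SKETCH_MAD_RESULT_CONSUMED)) == 0);

	if (!(ret & SKETCH_MAD_RESULT_SUCCESS))
		return SKETCH_ERROR;
	if (ret & SKETCH_MAD_RESULT_CONSUMED)
		return SKETCH_DROP;
	if (ret & SKETCH_MAD_RESULT_REPLY)
		return SKETCH_SEND_REPLY;
	return SKETCH_DISPATCH;
}
```

A MAD layer that tests only for REPLY would fall through to the dispatch case for a consumed MAD, which is the gap pointed out above.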
mad: In handle_outgoing_smp, validate process_mad routine exists prior
to calling it (issue pointed out by KK)
Index: mad.c
===
--- mad.c (revision 1189)
+++ mad.c (working copy)
@@ -405,30 +405,32 @@
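The change described above guards the call in handle_outgoing_smp against a device that provides no process_mad routine. Below is a minimal user-space sketch of that guard. It assumes a simplified hook signature and a hypothetical SKETCH_NO_PROCESS_MAD return code. The real ib_device hook takes more arguments.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_device;

/* Simplified hook type; the real process_mad takes more arguments. */
typedef int (*sketch_process_mad_fn)(struct sketch_device *dev, int port_num,
				     const void *in_mad, void *out_mad);

struct sketch_device {
	sketch_process_mad_fn process_mad; /* optional: may be NULL */
};

/* Hypothetical code meaning "no process_mad routine to call". */
#define SKETCH_NO_PROCESS_MAD (-1)

/* Example hook for a device that reports success (1, assumed) for every MAD. */
static int sketch_stub_process_mad(struct sketch_device *dev, int port_num,
				   const void *in_mad, void *out_mad)
{
	(void)dev; (void)port_num; (void)in_mad; (void)out_mad;
	return 1;
}

static int sketch_call_process_mad(struct sketch_device *dev, int port_num,
				   const void *in_mad, void *out_mad)
{
	assert(dev);
	if (!dev->process_mad)
		return SKETCH_NO_PROCESS_MAD; /* caller treats the MAD as not handled */
	return dev->process_mad(dev, port_num, in_mad, out_mad);
}
```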
On Tue, 2004-11-09 at 20:12, Sean Hefty wrote:
The following patch adds support for handling QP0/1 send queue overrun,
along with a couple of related fixes:
* The patch includes that provided by Roland in order to configure the
fabric.
* The code no longer modifies the user's send_wr
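The two points above can be illustrated with a toy model. The first is queueing QP0/1 sends when the send queue is full, and the second is leaving the caller's send_wr untouched. This sketch assumes a fixed queue depth and a bounded FIFO overflow list, and all names are hypothetical. It counts posted work requests instead of talking to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SQ_DEPTH    2 /* tiny send queue so overflow is easy to hit */
#define SKETCH_MAX_PENDING 8 /* bound on the software overflow list */

/* Hypothetical, trimmed-down work request. */
struct sketch_send_wr {
	uint64_t wr_id;
	int num_sge;
};

struct sketch_send_queue {
	size_t n_posted;                                    /* WRs on the QP */
	struct sketch_send_wr overflow[SKETCH_MAX_PENDING]; /* copies awaiting space */
	size_t n_overflow;
};

/* Post a send; the caller's work request is only read, never modified.
 * Returns 0 on success, -1 if both the QP and the overflow list are full. */
static int sketch_post_send(struct sketch_send_queue *q,
			    const struct sketch_send_wr *user_wr)
{
	if (q->n_posted < SKETCH_SQ_DEPTH) {
		q->n_posted++; /* a real implementation would post to hardware here */
		return 0;
	}
	if (q->n_overflow == SKETCH_MAX_PENDING)
		return -1;
	q->overflow[q->n_overflow++] = *user_wr; /* struct assignment copies it */
	return 0;
}

/* A send completed: free its slot and move the oldest queued WR onto the QP. */
static void sketch_send_done(struct sketch_send_queue *q)
{
	size_t i;

	assert(q->n_posted > 0);
	q->n_posted--;
	if (q->n_overflow == 0)
		return;
	/* the WR in overflow[0] would be posted to hardware here */
	for (i = 1; i < q->n_overflow; i++)
		q->overflow[i - 1] = q->overflow[i];
	q->n_overflow--;
	q->n_posted++;
}
```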
On Wed, 2004-11-10 at 12:36, Hal Rosenstock wrote:
I will break this up into two chunks:
1. the minor agent change
Thanks. Applied.
-- Hal
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To
Roland I think keeping the MAD code simpler is probably best right now.
Hal Hope that is for technical reasons and not for the recent missteps.
Yes, it's just that the MAD code is quite complicated already with
multiple tests for DR SMPs etc; mad.c alone is over 2000 lines now. I
don't
agent: Handle out of order send completions
(Issue pointed out by Sean)
Index: agent_priv.h
===
--- agent_priv.h (revision 1183)
+++ agent_priv.h (working copy)
@@ -46,7 +46,6 @@
struct ib_mad_agent
Hal Rosenstock wrote:
- send_wr.wr_id = ++port_priv->wr_id;
+ send_wr.wr_id = (unsigned long)&agent_send_wr->send_list;
{snip}
+ send_wr = (struct list_head *)(unsigned long)mad_send_wc->wr_id;
+ agent_send_wr = container_of(send_wr, struct ib_agent_send_wr,
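The patch replaces the counter-based wr_id with a pointer to the request's list node. Each completion then identifies its own request, even when completions arrive out of order. Below is a user-space sketch of that technique. It uses minimal local versions of list_head and container_of, and a hypothetical struct stands in for struct ib_agent_send_wr. uintptr_t is used here in place of the patch's unsigned long casts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal user-space stand-ins for the kernel's list_head and container_of. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical analogue of struct ib_agent_send_wr. */
struct sketch_agent_send_wr {
	struct list_head send_list;
	int id; /* illustration only */
};

/* Encode the entry's list node as the 64-bit work request ID. */
static uint64_t encode_wr_id(struct sketch_agent_send_wr *w)
{
	return (uint64_t)(uintptr_t)&w->send_list;
}

/* On completion, recover the entry directly from wr_id and unlink it,
 * regardless of the order in which completions arrive. */
static struct sketch_agent_send_wr *complete_wr_id(uint64_t wr_id)
{
	struct list_head *node = (struct list_head *)(uintptr_t)wr_id;
	struct sketch_agent_send_wr *w =
		container_of(node, struct sketch_agent_send_wr, send_list);

	list_del(&w->send_list);
	return w;
}
```

The entry is recovered from the completion itself, so no search of the pending list is needed. Completing the second request first leaves the first one correctly linked.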
On Wed, 2004-11-10 at 13:43, Sean Hefty wrote:
Hal Rosenstock wrote:
Currently if no matching send request is found, the received MAD is
freed (around line 1035 of the current mad.c).
In this case (timeout too short, etc.), is this the correct behavior?
Or should the receive packet
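Below is a rough sketch of the current behavior Hal describes, where a received response with no matching outstanding send is dropped. It assumes responses are matched to requests by transaction ID. All names are hypothetical. The sketch implements only the existing drop-on-no-match policy, not the alternative being asked about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_recv {
	uint64_t tid; /* transaction ID copied from the MAD header */
};

/* Hypothetical record of a request still waiting for its response. */
struct sketch_send {
	uint64_t tid;
	struct sketch_recv *response; /* set once a matching response arrives */
	struct sketch_send *next;
};

static struct sketch_send *find_send(struct sketch_send *head, uint64_t tid)
{
	for (; head; head = head->next)
		if (head->tid == tid)
			return head;
	return NULL;
}

/* Deliver a received response to its request. If no request matches
 * (for example because it already timed out), the MAD is freed, which
 * is the behavior described above. Returns true if delivered. */
static bool deliver_response(struct sketch_send *outstanding,
			     struct sketch_recv *recv)
{
	struct sketch_send *send;

	assert(recv);
	send = find_send(outstanding, recv->tid);
	if (!send) {
		free(recv);
		return false;
	}
	send->response = recv; /* ownership moves to the matched request */
	return true;
}
```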
I haven't cleared the other issues before getting back to this but
wanted to respond to some of the points below:
On Tue, 2004-11-09 at 23:55, Roland Dreier wrote:
Roland OK, I think I understand the problem, but I'm not sure
Roland what the correct solution is. When a DR SMP arrives
Roland I guess the problem with calling smi_handle_dr_smp_recv()
Roland twice on the same packet is that the function may alter
Roland the packet.
Hal No, the second call to smi_handle_dr_smp_recv() was on the
Hal outgoing response and not the incoming request. The thought
Hal Rosenstock wrote:
1. Why was BUG_ON removed from dequeue_mad?
That can be put back. I removed queue_mad, and was going to remove
dequeue_mad, but decided to leave it.
2. A couple of questions related to send_wr->num_sge checking.
a. Should this be pushed down to mthca and detected there
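Whichever layer ends up owning it, the num_sge check itself is a simple bound test. This sketch assumes a hypothetical per-QP max_send_sge capability. A driver such as mthca would know the real limit, which is one argument for pushing the check down.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-QP capability, e.g. as reported by the driver. */
struct sketch_qp_caps {
	int max_send_sge;
};

/* Hypothetical, trimmed-down work request. */
struct sketch_wr {
	int num_sge;
};

/* Returns 0 if num_sge is within the QP's limit, -EINVAL otherwise. */
static int check_num_sge(const struct sketch_qp_caps *caps,
			 const struct sketch_wr *wr)
{
	assert(caps && wr);
	if (wr->num_sge < 0 || wr->num_sge > caps->max_send_sge)
		return -EINVAL;
	return 0;
}
```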
Glad to see http://news.zdnet.com/2100-9593_22-5446887.html
One snippet from the article '..the grant will fund 8-10 full-time
programmers.'
Does this equate to Sean, Roland, Hal working 80-hour weeks with some
support from others merely working 40-hour weeks :)
Just wanted to say well done
On Wed, 2004-11-10 at 16:30, Sean Hefty wrote:
Hal Rosenstock wrote:
1. Why was BUG_ON removed from dequeue_mad?
That can be put back. I removed queue_mad, and was going to remove
dequeue_mad, but decided to leave it.
I added this back in.
2. A couple of questions related to
On Wed, 2004-11-10 at 16:29, Roland Dreier wrote:
Roland I guess the problem with calling smi_handle_dr_smp_recv()
Roland twice on the same packet is that the function may alter
Roland the packet.
Hal No, the second call to smi_handle_dr_smp_recv() was on the
Hal
Removes unneeded check and relocates other to while loop.
- Sean
Index: core/mad.c
===
--- core/mad.c (revision 1197)
+++ core/mad.c (working copy)
@@ -518,14 +518,10 @@
if (!bad_send_wr)
goto error1;
-
On Wed, 2004-11-10 at 17:22, Sean Hefty wrote:
Removed locking, since this is in cleanup code.
Thanks. Applied.
-- Hal
On Wed, 2004-11-10 at 17:33, Sean Hefty wrote:
Removes unneeded check and relocates other to while loop.
Thanks. Applied.
-- Hal
On Wed, 2004-11-10 at 12:02, Roland Dreier wrote:
By the way, if I am reading the code correctly, it looks like the MAD
layer only checks for IB_MAD_RESULT_REPLY and not
IB_MAD_RESULT_CONSUMED.
You are reading the code correctly.
If IB_MAD_RESULT_CONSUMED is set then the
packet is
mad: After calling process_mad, handle MAD being consumed
Index: mad.c
===
--- mad.c (revision 1199)
+++ mad.c (working copy)
@@ -400,16 +400,22 @@
smp->dr_slid, /* ? */