On 12/14/2011 10:18 PM, Ira Weiny wrote:
> On Wed, 14 Dec 2011 18:21:28 -0800
> Hal Rosenstock <[email protected]> wrote:
>
>> On 12/14/2011 12:09 AM, Ira Weiny wrote:
>>>
>>> In addition print transaction ID of all DR PATH dumps to make sure we know
>>> which MAD's they refer to.
>>>
>>> Signed-off-by: Ira Weiny <[email protected]>
>>> ---
>>> opensm/osm_helper.c | 5 +++--
>>> opensm/osm_sm_mad_ctrl.c | 7 +++++++
>>> 2 files changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/opensm/osm_helper.c b/opensm/osm_helper.c
>>> index f9f3d9d..b6591c4 100644
>>> --- a/opensm/osm_helper.c
>>> +++ b/opensm/osm_helper.c
>>> @@ -2059,8 +2059,9 @@ void osm_dump_smp_dr_path(IN osm_log_t * p_log, IN
>>> const ib_smp_t * p_smp,
>>> char buf[BUF_SIZE];
>>> unsigned n;
>>>
>>> - n = sprintf(buf, "Received SMP on a %u hop path: "
>>> - "Initial path = ", p_smp->hop_count);
>>> + n = sprintf(buf, "Received SMP (TID 0x%" PRIx64 ") on a %u hop
>>> path: "
>>> + "Initial path = ",
>>> + cl_ntoh64(p_smp->trans_id), p_smp->hop_count);
>>> n += sprint_uint8_arr(buf + n, sizeof(buf) - n,
>>> p_smp->initial_path,
>>> p_smp->hop_count + 1);
>>> diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
>>> index ee92c66..6abf8b8 100644
>>> --- a/opensm/osm_sm_mad_ctrl.c
>>> +++ b/opensm/osm_sm_mad_ctrl.c
>>> @@ -721,6 +721,13 @@ static void sm_mad_ctrl_send_err_cb(IN void *context,
>>> IN osm_madw_t * p_madw)
>>> ib_get_sm_attr_str(p_smp->attr_id), cl_ntoh32(p_smp->attr_mod),
>>> cl_ntoh64(p_smp->trans_id));
>>>
>>> + if (p_smp->mgmt_class == IB_MCLASS_SUBN_DIR) {
>>> + osm_dump_smp_dr_path(p_ctrl->p_log, p_smp, OSM_LOG_ERROR);
>>
>> Rather than here, should this be in osm_vendor_ibumad.c ? There's
>> already one similar log there but looks like evicted entry logging was
>> not done. If not, then do any logs there need to be removed as redundant ?
>
> Yes looking a bit closer I see that is redundant with the current umad_status
> implementation.
Well, not quite. That logs on any send error and osm_dump_smp_dr_path is
only currently called for NodeInfo. That's one reason why it's down at
that layer right now. I can see your v2 patch addresses this.
> IE the message you get is:
>
> Dec 14 18:31:54 137584 [AEB0C700] 0x01 -> Received SMP on a 4 hop path:
> Initial path = 0,0,0,0,0, Return path = 0,0,0,0,0
>
> That is useless. I can alter the patch to remove that as well.
Another alternative is that if it's a bug, why not just fix that ?
>>
>> Also, does this log every timeout (at error level) ? If so, that might
>> not be a good thing in all subnets as timeouts are common.
>
> Why would you say that? I think it is very valid to know what nodes are
> timeing out. When would you not want to know the destination of what is
> timing out?
Yes, but I'm concerned about spamming the log. Timeouts are "normal" in
many subnets; maybe not yours.
I was just saying level might be reduced but I can see 5411 is error
level too.
>>
>>> + } else {
>>> + OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "LID %u\n",
>>> + cl_ntoh16(p_madw->mad_addr.dest_lid));
>>
>> Log the TID here too ?
>
> Actually I think moving that into the error print above is better.
Sure; that's another way of accomplishing the same thing.
-- Hal
> Sending V2 now,
> Ira
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html