Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2016-01-05 Thread ira.weiny
> > > > > 
> > > > > If Doug accepts the library changes, let me know that public git 
> > > > > commit
> > > > > and I can pull it into the staging-next branch and you can continue to
> > > > > send me staging patches that way.
> > > > 
> > > > Won't this cause a conflict during the merge window?
> > > 
> > > No, git is good :)
> > > 
> > > > How do we handle changes which affect both qib and hfi1?
> > > 
> > > I don't know, now this gets messy...
> > > 
> > 
> > Agreed and this is what we are worried about.
> > 
> > Can we do what Dan and Doug have proposed in the past and have Doug take 
> > over
> > the staging/rdma sub-tree?
> > 
> > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-November/081922.html
> > 
> > I think the upcoming merge window is a reasonable time for him to do that.
> 
> Ok, but keeping on top of all of the generic staging patches that come
> in is a tough thing to do, that's up to Doug, if he is ready for it...
> 

Greg,

Forgive me for not knowing how multiple maintainers deal with hand offs like
this.  I'm hoping you, and/or Doug, can answer some questions for me.

Am I correct in assuming the merge window will be open on Monday?  If so, when
will Linus pull the staging tree?  Then at what point will Doug get the hfi1
changes which have already been accepted?

Are you going to be able to review the outstanding patches for
staging/rdma/hfi1 before the merge window?  Or should we consider them dropped
and resubmit to Doug to apply after he has merged the latest hfi1 code from
Linus?


Doug,

At what point should we start submitting additional patches to you which we
have queued up but are not yet submitted to Greg?

So far we have been cross posting to linux-rdma for feedback and I see that the
patches have been dropped from patchworks.  I assume you dropped them from
patchworks because you knew that Greg was handling them?

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4 TRIVIAL] IB/core: Documentation fix to the snoop handler in the MAD header file

2016-01-05 Thread ira.weiny
On Tue, Jan 05, 2016 at 01:52:55PM -0500, Hal Rosenstock wrote:
> In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr
> 
> Signed-off-by: Hal Rosenstock 

First off I have to say; this comment is wrong and should be fixed.

Reviewed-by: Ira Weiny 

That said.

I agree with Sean that the snoop interface should be removed.

I don't know the specific reason madeye was not accepted upstream back in the
day.  However, I _think_ it was because it implements a non-standard tracing
mechanism.

For this reason I have explored the use of the standard tracing infrastructure
within the mad stack.  The series I sent for general comment [1] implemented
tracing at a number of levels (umad and mad) as well as tracing of registered
agents.  This IMO give more insight into what is going on within the MAD stack
than the madeye module.

I'll see what I can do to update the tracing code.  In the mean time if others
want to look at the tracing code I have so far I pushed a branch to my github.

https://github.com/weiny2/linux-kernel  doug-fn-mad-trace

Ira

[1] https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg28188.html

> ---
> Change since v3:
> Fixed title to not include function name
> 
> Change since v2:
> Changed title to use "higher" language
> 
> Change since v1:
> Fixed typo in patch description
> 
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index ec9b44d..2b3573d 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
> *mad_agent,
>  /**
>   * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
>   * @mad_agent: MAD agent that snooped the MAD.
> - * @send_wr: Work request information on the sent MAD.
> + * @send_buf: send MAD data buffer.
>   * @mad_send_wc: Work completion information on the sent MAD.  Valid
>   *   only for snooping that occurs on a send completion.
>   *
> - * Clients snooping MADs should not modify data referenced by the @send_wr
> + * Clients snooping MADs should not modify data referenced by the @send_buf
>   * or @mad_send_wc.
>   */
>  typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

2016-01-04 Thread ira.weiny
On Mon, Jan 04, 2016 at 02:15:58PM +0100, Christoph Hellwig wrote:
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Ira Weiny 

> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ---
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..cbe232a 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  
>   atomic_inc(_snoop_priv->refcount);
>   spin_unlock_irqrestore(_info->snoop_lock, flags);
> - mad_snoop_priv->agent.recv_handler(_snoop_priv->agent,
> + mad_snoop_priv->agent.recv_handler(_snoop_priv->agent, NULL,
>  mad_recv_wc);
>   deref_snoop_agent(mad_snoop_priv);
>   spin_lock_irqsave(_info->snoop_lock, flags);
> @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   /* user rmpp is in effect
>* and this is an active RMPP MAD
>*/
> - mad_recv_wc->wc->wr_id = 0;
> - 
> mad_agent_priv->agent.recv_handler(_agent_priv->agent,
> -mad_recv_wc);
> + mad_agent_priv->agent.recv_handler(
> + _agent_priv->agent, NULL,
> + mad_recv_wc);
>   atomic_dec(_agent_priv->refcount);
>   } else {
>   /* not user rmpp, revert to normal behavior and
> @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   spin_unlock_irqrestore(_agent_priv->lock, flags);
>  
>   /* Defined behavior is to complete response before 
> request */
> - mad_recv_wc->wc->wr_id = (unsigned long) 
> _send_wr->send_buf;
> - 
> mad_agent_priv->agent.recv_handler(_agent_priv->agent,
> -mad_recv_wc);
> + mad_agent_priv->agent.recv_handler(
> + _agent_priv->agent,
> + _send_wr->send_buf,
> + mad_recv_wc);
>   atomic_dec(_agent_priv->refcount);
>  
>   mad_send_wc.status = IB_WC_SUCCESS;
> @@ -2021,7 +2022,7 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   ib_mad_complete_send_wr(mad_send_wr, _send_wc);
>   }
>   } else {
> - mad_agent_priv->agent.recv_handler(_agent_priv->agent,
> + mad_agent_priv->agent.recv_handler(_agent_priv->agent, NULL,
>  mad_recv_wc);
>   deref_mad_agent(mad_agent_priv);
>   }
> @@ -2762,6 +2763,7 @@ static void local_completions(struct work_struct *work)
>  IB_MAD_SNOOP_RECVS);
>   recv_mad_agent->agent.recv_handler(
>   _mad_agent->agent,
> + >mad_send_wr->send_buf,
>   
> >mad_priv->header.recv_wc);
>   spin_lock_irqsave(_mad_agent->lock, flags);
>   atomic_dec(_mad_agent->refcount);
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index e364a42..1f91b6e 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -1669,14 +1669,15 @@ static void send_handler(struct ib_mad_agent *agent,
>  }
>  
>  static void recv_handler(struct ib_mad_agent 

Re: [PATCHv2 TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix

2016-01-04 Thread ira.weiny
On Mon, Jan 04, 2016 at 03:44:15PM -0500, Hal Rosenstock wrote:
> ib_mad_snoop_handler uses send_buf rather than send_wr
> 
> Signed-off-by: Hal Rosenstock 

Reviewed-by: Ira Weiny 

> ---
> Change since v1:
> Fixed typo in patch description
> 
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index ec9b44d..2b3573d 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
> *mad_agent,
>  /**
>   * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
>   * @mad_agent: MAD agent that snooped the MAD.
> - * @send_wr: Work request information on the sent MAD.
> + * @send_buf: send MAD data buffer.
>   * @mad_send_wc: Work completion information on the sent MAD.  Valid
>   *   only for snooping that occurs on a send completion.
>   *
> - * Clients snooping MADs should not modify data referenced by the @send_wr
> + * Clients snooping MADs should not modify data referenced by the @send_buf
>   * or @mad_send_wc.
>   */
>  typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] IB/mad: use CQ abstraction

2016-01-04 Thread ira.weiny
On Mon, Jan 04, 2016 at 02:15:59PM +0100, Christoph Hellwig wrote:
> Remove the local workqueue to process mad completions and use the CQ API
> instead.
> 
> Signed-off-by: Christoph Hellwig 

One minor nit below...


> ---
>  drivers/infiniband/core/mad.c  | 159 
> +
>  drivers/infiniband/core/mad_priv.h |   2 +-
>  2 files changed, 58 insertions(+), 103 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index cbe232a..286d1a9 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
> number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
> requests");
>  
> -/*
> - * Define a limit on the number of completions which will be processed by the
> - * worker thread in a single work item.  This ensures that other work items
> - * (potentially from other users) are processed fairly.
> - *
> - * The number of completions was derived from the default queue sizes above.
> - * We use a value which is double the larger of the 2 queues (receive @ 512)
> - * but keep it fixed such that an increase in that value does not introduce
> - * unfairness.
> - */
> -#define MAD_COMPLETION_PROC_LIMIT 1024
> -
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req 
> *mad_reg_req,
> u8 mgmt_class);
>  static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
>  struct ib_mad_agent_private *agent_priv);
> +static bool ib_mad_send_error(struct ib_mad_port_private *port_priv,
> +   struct ib_wc *wc);
> +static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
>  
>  /*
>   * Returns a ib_mad_port_private structure or NULL for a device/port
> @@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  }
>  
>  static void build_smp_wc(struct ib_qp *qp,
> -  u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
> +  void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num,

Sorry I did not catch this before but rather than void * wouldn't it be better
to use struct ib_cqe?

Regardless:

Reviewed-by: Ira Weiny 

>struct ib_wc *wc)
>  {
>   memset(wc, 0, sizeof *wc);
> - wc->wr_id = wr_id;
> + wc->wr_cqe = wr_cqe;
>   wc->status = IB_WC_SUCCESS;
>   wc->opcode = IB_WC_RECV;
>   wc->pkey_index = pkey_index;
> @@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct 
> ib_mad_agent_private *mad_agent_priv,
>   }
>  
>   build_smp_wc(mad_agent_priv->agent.qp,
> -  send_wr->wr.wr_id, drslid,
> +  send_wr->wr.wr_cqe, drslid,
>send_wr->pkey_index,
>send_wr->port_num, _wc);
>  
> @@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct 
> ib_mad_agent *mad_agent,
>  
>   mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
>  
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> +
> + mad_send_wr->send_wr.wr.wr_cqe = _send_wr->mad_list.cqe;
>   mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
>   mad_send_wr->send_wr.wr.num_sge = 2;
>   mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
> @@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
> *mad_send_wr)
>  
>   /* Set WR ID to find mad_send_wr upon completion */
>   qp_info = mad_send_wr->mad_agent_priv->qp_info;
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long)_send_wr->mad_list;
>   mad_send_wr->mad_list.mad_queue = _info->send_queue;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> + mad_send_wr->send_wr.wr.wr_cqe = _send_wr->mad_list.cqe;
>  
>   mad_agent = mad_send_wr->send_buf.mad_agent;
>   sge = mad_send_wr->sg_list;
> @@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
>   return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
>  }
>  
> -static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
> -  struct ib_wc *wc)
> +static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
>  {
> + struct ib_mad_port_private *port_priv = cq->cq_context;
> + struct ib_mad_list_head *mad_list =
> + container_of(wc->wr_cqe, struct ib_mad_list_head, cqe);
>   struct ib_mad_qp_info *qp_info;
>   struct ib_mad_private_header *mad_priv_hdr;
>   struct ib_mad_private *recv, *response = NULL;
> - struct ib_mad_list_head *mad_list;
>   struct ib_mad_agent_private *mad_agent;
>   

Re: [PATCH 1/3] IB Core: Create get_perf_mad function in sysfs.c

2016-01-03 Thread ira.weiny
On Thu, Dec 31, 2015 at 08:58:31AM +0100, Bart Van Assche wrote:
> On 12/17/2015 08:52 PM, Christoph Lameter wrote:
> >-in_mad->mad_hdr.attr_id   = cpu_to_be16(0x12); /* PortCounters */
> >+in_mad->mad_hdr.attr_id   = attr;
> 
> Hello Christoph,
> 
> sparse reports an endianness mismatch for this and similar assignments 
> (make M=drivers/infiniband/core C=2 CF=-D__CHECK_ENDIAN__). Can you have 
> a look at this ?

quick patch sent:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg31257.html

Ira

> 
> Thanks,
> 
> Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Sat, Jan 02, 2016 at 09:03:31AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 30, 2015 at 09:00:07PM -0500, ira.weiny wrote:
> > On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> > > Hi Ira,
> > > 
> > > please take a look at the patches I've attached - they are just WIP
> > > that hasn't been tested as I'm on a vacation without access to my
> > > IB setup until New Year's Eve.
> > 
> > I have them on a branch.
> > 
> > I'll try and do some testing over the weekend.
> 
> FYI, you probably need this fix from Bart:
> 
> http://marc.info/?l=linux-kernel=145138288102008=2

For some reason I did not...

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.
> 
> Patch 1 is I think a genuine bug fix caused by the madness (pun
> intendended) of the wr_id abuses.
> 
> Patch 2: passes the mad_send_buf explicitily to mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

I've tested all 3 patches together and things seem to be working just fine.
That said I have some comments below.

> From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:49:22 +0100
> Subject: IB/mad: pass send buf in wr_id in local_completions
> 
> The sa_query recv_handler expects the send_buf in wr_id, and all other recv
> handlers ignore wr_id.  It seems this is what we should pass, please confirm.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..e0859e5 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
>* before request
>*/
>   build_smp_wc(recv_mad_agent->agent.qp,
> -  (unsigned long) local->mad_send_wr,
> +  (unsigned long) 
> >mad_send_wr->send_buf,

This is not right.

local_completions is only called for locally processed SMP packets (not SA
packets).  SMPs use the wr_id assigned by ib_create_send_mad (which points to
the internal struct ib_mad_send_wr_private object.)

It seems like a better fix (to make the code more explicit) would be:

20:08:58 > git di
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e0859e515b47..2840f27a18a0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
 * before request
 *  */
build_smp_wc(recv_mad_agent->agent.qp,
-(unsigned long) 
>mad_send_wr->send_buf,
+local->mad_send_wr->send_wr.wr.wr_id,
 be16_to_cpu(IB_LID_PERMISSIVE),
 local->mad_send_wr->send_wr.pkey_index,
 recv_mad_agent->agent.port_num, );

But this is just removed later anyway.


All that said I agree there is some hackyness (or as you said "abuse" in the
wr_id) in that the SA is depending on this code in ib_mad.

2013 mad_recv_wc->wc->wr_id = (unsigned long) _send_wr->send_buf;
2014 mad_agent_priv->agent.recv_handler(_agent_priv->agent,
2015mad_recv_wc);

Who is in control of wr_id is confusing in this area of the code.  For that
reason I like your next patch which makes this explicit throughout.


>be16_to_cpu(IB_LID_PERMISSIVE),
>local->mad_send_wr->send_wr.pkey_index,
>recv_mad_agent->agent.port_num, );
> -- 
> 1.9.1
> 

> From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:54:02 +0100
> Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
> 
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ++-
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-03 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.
> 
> Patch 1 is I think a genuine bug fix caused by the madness (pun
> intendended) of the wr_id abuses.
> 
> Patch 2: passes the mad_send_buf explicitily to mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

[snip]

> From 0a367aa36e5410f41333d613b7587913b57eb75f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:55:32 +0100
> Subject: IB/mad: use CQ abstraction
> 
> Remove the local workqueue to process mad completions and use the CQ API
> instead.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c  | 154 
> +
>  drivers/infiniband/core/mad_priv.h |   2 +-
>  2 files changed, 55 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index f15fcd6..b04ad05 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
> number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
> requests");
>  
> -/*
> - * Define a limit on the number of completions which will be processed by the
> - * worker thread in a single work item.  This ensures that other work items
> - * (potentially from other users) are processed fairly.
> - *
> - * The number of completions was derived from the default queue sizes above.
> - * We use a value which is double the larger of the 2 queues (receive @ 512)
> - * but keep it fixed such that an increase in that value does not introduce
> - * unfairness.
> - */
> -#define MAD_COMPLETION_PROC_LIMIT 1024
> -
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req 
> *mad_reg_req,
> u8 mgmt_class);
>  static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
>  struct ib_mad_agent_private *agent_priv);
> +static bool mad_error_handler(struct ib_mad_port_private *port_priv,
> +   struct ib_wc *wc);
> +static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
>  
>  /*
>   * Returns a ib_mad_port_private structure or NULL for a device/port
> @@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  }
>  
>  static void build_smp_wc(struct ib_qp *qp,
> -  u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
> +  void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num,
>struct ib_wc *wc)
>  {
>   memset(wc, 0, sizeof *wc);
> - wc->wr_id = wr_id;
> + wc->wr_cqe = wr_cqe;
>   wc->status = IB_WC_SUCCESS;
>   wc->opcode = IB_WC_RECV;
>   wc->pkey_index = pkey_index;
> @@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct 
> ib_mad_agent_private *mad_agent_priv,
>   }
>  
>   build_smp_wc(mad_agent_priv->agent.qp,
> -  send_wr->wr.wr_id, drslid,
> +  send_wr->wr.wr_cqe, drslid,
>send_wr->pkey_index,
>send_wr->port_num, _wc);
>  
> @@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct 
> ib_mad_agent *mad_agent,
>  
>   mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
>  
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> +
> + mad_send_wr->send_wr.wr.wr_cqe = _send_wr->mad_list.cqe;
>   mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
>   mad_send_wr->send_wr.wr.num_sge = 2;
>   mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
> @@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
> *mad_send_wr)
>  
>   /* Set WR ID to find mad_send_wr upon completion */
>   qp_info = mad_send_wr->mad_agent_priv->qp_info;
> - mad_send_wr->send_wr.wr.wr_id = (unsigned long)_send_wr->mad_list;
>   mad_send_wr->mad_list.mad_queue = _info->send_queue;
> + mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
> + mad_send_wr->send_wr.wr.wr_cqe = _send_wr->mad_list.cqe;

I'm not sure if it is absolutely required to assign cqe.done in both
ib_create_send_mad and ib_send_mad.  But I don't think it hurts either.

>  
>   mad_agent = mad_send_wr->send_buf.mad_agent;
>   sge = mad_send_wr->sg_list;
> @@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
>   return handle_ib_smi(port_priv, qp_info, wc, port_num, 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-30 Thread ira.weiny
On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> Hi Ira,
> 
> please take a look at the patches I've attached - they are just WIP
> that hasn't been tested as I'm on a vacation without access to my
> IB setup until New Year's Eve.

I have them on a branch.

I'll try and do some testing over the weekend.

Ira

> 
> Patch 1 is I think a genuine bug fix caused by the madness (pun
> intendended) of the wr_id abuses.
> 
> Patch 2: passes the mad_send_buf explicitily to mad handlers to get rid
> of that mess.
> 
> Patch 3 is the CQ API conversion which becomes relatively simple once
> the prior issues above are sorted out.
> 

> From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:49:22 +0100
> Subject: IB/mad: pass send buf in wr_id in local_completions
> 
> The sa_query recv_handler expects the send_buf in wr_id, and all other recv
> handlers ignore wr_id.  It seems this is what we should pass, please confirm.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/mad.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index d4d2a61..e0859e5 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
>* before request
>*/
>   build_smp_wc(recv_mad_agent->agent.qp,
> -  (unsigned long) local->mad_send_wr,
> +  (unsigned long) 
> >mad_send_wr->send_buf,
>be16_to_cpu(IB_LID_PERMISSIVE),
>local->mad_send_wr->send_wr.pkey_index,
>recv_mad_agent->agent.port_num, );
> -- 
> 1.9.1
> 

> From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 30 Dec 2015 11:54:02 +0100
> Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
> 
> Stop abusing wr_id and just pass the parameter explicitly.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/infiniband/core/cm.c  |  1 +
>  drivers/infiniband/core/mad.c | 18 ++
>  drivers/infiniband/core/sa_query.c|  7 ++-
>  drivers/infiniband/core/user_mad.c|  1 +
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
>  include/rdma/ib_mad.h |  2 ++
>  6 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index e3a95d1..ad3726d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
> ib_event_type event)
>  EXPORT_SYMBOL(ib_cm_notify);
>  
>  static void cm_recv_handler(struct ib_mad_agent *mad_agent,
> + struct ib_mad_send_buf *send_buf,
>   struct ib_mad_recv_wc *mad_recv_wc)
>  {
>   struct cm_port *port = mad_agent->context;
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index e0859e5..f15fcd6 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
>  
>   atomic_inc(_snoop_priv->refcount);
>   spin_unlock_irqrestore(_info->snoop_lock, flags);
> - mad_snoop_priv->agent.recv_handler(_snoop_priv->agent,
> + mad_snoop_priv->agent.recv_handler(_snoop_priv->agent, NULL,
>  mad_recv_wc);
>   deref_snoop_agent(mad_snoop_priv);
>   spin_lock_irqsave(_info->snoop_lock, flags);
> @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   /* user rmpp is in effect
>* and this is an active RMPP MAD
>*/
> - mad_recv_wc->wc->wr_id = 0;
> - 
> mad_agent_priv->agent.recv_handler(_agent_priv->agent,
> -mad_recv_wc);
> + mad_agent_priv->agent.recv_handler(
> + _agent_priv->agent, NULL,
> + mad_recv_wc);
>   atomic_dec(_agent_priv->refcount);
>   } else {
>   /* not user rmpp, revert to normal behavior and
> @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
> ib_mad_agent_private *mad_agent_priv,
>   spin_unlock_irqrestore(_agent_priv->lock, 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-29 Thread ira.weiny
On Tue, Dec 29, 2015 at 11:51:19AM +0200, Sagi Grimberg wrote:
> 
> >Please just convert the mad handler to the new CQ API in
> >drivers/infiniband/core/cq.c.  If you have any question about it I'd be
> >glad to help you.
> 
> +1 on this suggestion.
> 
> We had these sorts of questions in our ULPs as well. The CQ API should
> take care of all that for you and leaves you to just handle the
> completions...

I saw your work and agree it would be nice but it will take some time to
convert and debug the MAD stack.  I'll try and find some time but it is
unlikely I will anytime soon.

We can hit this bug regularly with hfi1 but have not hit with qib or mlx4.  I
leave it up to Doug if he wants to take this fix before someone finds time to
convert the MAD stack.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread ira.weiny
On Mon, Dec 28, 2015 at 06:51:30PM +0200, Eli Cohen wrote:
> On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.we...@intel.com wrote:
> > From: Dean Luick 
> > 
> >  
> > @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct 
> > work_struct *work)
> >  {
> > struct ib_mad_port_private *port_priv;
> > struct ib_wc wc;
> > +   int count = 0;
> >  
> > port_priv = container_of(work, struct ib_mad_port_private, work);
> > ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> 
> I think you shoudld push the call to ib_req_notify_cq outside the
> while loop. You don't need to arm the CQ if you re-queued the work.
> Only when you have drained the CQ should you re-arm.

Will it hurt to rearm?  The way the code stands I think the worse that will
happen is an extra work item scheduled and an ib_poll_cq call.

I'm not quite sure what you mean about moving the ib_req_notify_cq outside of
the while loop.  It seems like to do what you say we would need another work
item which just does ib_poll_cq.  Is that what you meant?

Ira

> 
> > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> > work_struct *work)
> > }
> > } else
> > mad_error_handler(port_priv, );
> > +
> > +   if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > +   queue_work(port_priv->wq, _priv->work);
> > +   break;
> > +   }
> > }
> >  }
> >  
> > -- 
> > 1.8.2
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-28 Thread ira.weiny
On Tue, Dec 29, 2015 at 01:25:33AM +0200, Eli Cohen wrote:
> On Mon, Dec 28, 2015 at 06:05:46PM -0500, ira.weiny wrote:
> > 
> > Will it hurt to rearm?  The way the code stands I think the worse that will
> > happen is an extra work item scheduled and an ib_poll_cq call.
> 
> If you re-arm unconditionally you call for extra interrupts which you
> can do without. When you break the loop of processing completions since
> you exhausted the quota, you queue the work so you can continue
> processing completions in the next time slot of the work task. After
> queueing the work you should call "return" instead of "break".
> If you processed all the completions before reaching
> MAD_COMPLETION_PROC_LIMIT you will exit the while loop and then
> re-arming can take place.

I'm still confused.  Here is the code with the patch applied:


/* 
 * IB MAD completion callback 
 */ 
static void ib_mad_completion_handler(struct work_struct *work) 
{ 
struct ib_mad_port_private *port_priv; 
struct ib_wc wc; 
int count = 0; 

port_priv = container_of(work, struct ib_mad_port_private, work);
ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);

while (ib_poll_cq(port_priv->cq, 1, ) == 1) {
if (wc.status == IB_WC_SUCCESS) {
switch (wc.opcode) {
case IB_WC_SEND:
ib_mad_send_done_handler(port_priv, );
break;
case IB_WC_RECV:
ib_mad_recv_done_handler(port_priv, );
break;
default:
BUG_ON(1);
break;
}
} else
mad_error_handler(port_priv, );

if (++count > MAD_COMPLETION_PROC_LIMIT) {
queue_work(port_priv->wq, _priv->work);
break;
}
}
}


How is "return" any different than "break"?  Calling return will still result
in a rearm on the next work task.

Perhaps this code is wrong in the first place?  Should it call ib_req_notify_cq
after the while loop?  This code has been this way forever...

1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2568) 
ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2569)
1da177e4c3f41   (Linus Torvalds 2005-04-16 15:20:36 -0700   2570)   while 
(ib_poll_cq(port_priv->cq, 1, ) == 1) {


Ira


> 
> > 
> > I'm not quite sure what you mean about moving the ib_req_notify_cq outside 
> > of
> > the while loop.  It seems like to do what you say we would need another work
> > item which just does ib_poll_cq.  Is that what you meant?
> > 
> > Ira
> > 
> > > 
> > > > @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> > > > work_struct *work)
> > > > }
> > > > } else
> > > > mad_error_handler(port_priv, );
> > > > +
> > > > +   if (++count > MAD_COMPLETION_PROC_LIMIT) {
> > > > +   queue_work(port_priv->wq, _priv->work);
> > > > +   break;
> > > > +   }
> > > > }
> > > >  }
> > > >  
> > > > -- 
> > > > 1.8.2
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling

2015-12-28 Thread ira.weiny
On Mon, Dec 28, 2015 at 04:53:18PM -0500, Hal Rosenstock wrote:
> 
> Port number is not part of ClassPortInfo attribute but is
> still needed as a parameter when invoking process_mad.
> 
> To properly handle this attribute, port_num is added as a
> parameter to get_counter_table and get_perf_mad was changed
> not to store port_num in the attribute itself when it's
> querying the ClassPortInfo attribute.
> 
> This handles issue pointed out by Matan Barak 
> 
> Signed-off-by: Hal Rosenstock 

Reviewed-by: Ira Weiny 

> Acked-by: Matan Barak 
> ---
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 539040f..2daf832 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -438,7 +438,8 @@ static int get_perf_mad(struct ib_device *dev, int 
> port_num, int attr,
>   in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
>   in_mad->mad_hdr.attr_id   = attr;
>  
> - in_mad->data[41] = port_num;/* PortSelect field */
> + if (attr != IB_PMA_CLASS_PORT_INFO)
> + in_mad->data[41] = port_num;/* PortSelect field */
>  
>   if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
>port_num, NULL, NULL,
> @@ -714,11 +715,12 @@ err:
>   * Figure out which counter table to use depending on
>   * the device capabilities.
>   */
> -static struct attribute_group *get_counter_table(struct ib_device *dev)
> +static struct attribute_group *get_counter_table(struct ib_device *dev,
> +  int port_num)
>  {
>   struct ib_class_port_info cpi;
>  
> - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO,
>   , 40, sizeof(cpi)) >= 0) {
>  
>   if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH)
> @@ -776,7 +778,7 @@ static int add_port(struct ib_device *device, int 
> port_num,
>   goto err_put;
>   }
>  
> - p->pma_table = get_counter_table(device);
> + p->pma_table = get_counter_table(device, port_num);
>   ret = sysfs_create_group(>kobj, p->pma_table);
>   if (ret)
>   goto err_put_gid_attrs;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2015-12-24 Thread ira.weiny
On Tue, Dec 22, 2015 at 06:27:57PM -0800, gre...@linuxfoundation.org wrote:
> On Tue, Dec 22, 2015 at 02:15:08PM -0500, ira.weiny wrote:
> > On Mon, Dec 21, 2015 at 05:01:48PM -0800, gre...@linuxfoundation.org wrote:

[snip]

> > > 
> > > No, git is good :)
> > > 
> > > > How do we handle changes which affect both qib and hfi1?
> > > 
> > > I don't know, now this gets messy...
> > > 
> > 
> > Agreed and this is what we are worried about.
> > 
> > Can we do what Dan and Doug have proposed in the past and have Doug take 
> > over
> > the staging/rdma sub-tree?
> > 
> > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-November/081922.html
> > 
> > I think the upcoming merge window is a reasonable time for him to do that.
> 
> Ok, but keeping on top of all of the generic staging patches that come
> in is a tough thing to do, that's up to Doug, if he is ready for it...
> 

To help this process, once the change over happens, we will help to monitor
driverdev-devel for anything submitted to staging/rdma.  If something is
submitted which was not to Doug and linux-rdma we can handle alerting the
submitter to make sure it gets submitted to Doug as per the current MAINTAINERS
file.

Hope this helps,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Display extended counter set if available

2015-12-24 Thread ira.weiny
On Thu, Dec 24, 2015 at 06:22:14PM +0200, eran ben elisha wrote:
> On Mon, Dec 21, 2015 at 4:20 PM, Christoph Lameter  wrote:

[snip]

> >
> > +/*
> > + * Figure out which counter table to use depending on
> > + * the device capabilities.
> > + */
> > +static struct attribute_group *get_counter_table(struct ib_device *dev)
> > +{
> > +   struct ib_class_port_info cpi;
> > +
> > +   if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO,
> 
> Why 0, need to pass port num.
> See proposal Matan and myself sent.
> 

Passing a port num to a ClassPortInfo query makes no sense?

Your proposal sets a field which is wrong (I think it sets part of TrapGID if
my math is correct) in the ClassPortInfo query as per the spec.

I think the fix needs to be somewhere else.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Generic InfiniBand transport done in software

2015-12-23 Thread ira.weiny
On Tue, Dec 22, 2015 at 02:38:10PM +0200, Moni Shoua wrote:
> Hi,
> 
> In the past months the need for a kernel module that implements the InfiniBand
> transport in software and unify all the InfiniBand software drivers has
> been raised. Since then, nobody has submitted any design proposal that satisfy
> the initial thoughts and can serve various back-ends.

I'm not sure I understand what you mean.  Denny has posted several high level
emails similar to this one and has asked for public feedback.  We have been
addressing all the feedback as it has come in and we continue to work toward
this common layer.

To that end we have been asking for you to identify any place you see an
incompatibility with the rdmavt layer for SoftRoCE.  Patches and comments to
rdmavt are encouraged and we have already incorporated patches from outside
Intel.

We are very sensitive to the fact that other drivers will want to use this
layer in the future.  But I don't think this layer is set in stone.  To
that end, any functionality which is specific to SoftRoCE should be added when
SoftRoCE is submitted to the kernel.  To date I have not seen any show stopper
objections except Sagi's comment on the lack of IB_WR_LOCAL_INV which we will
drop in the next submission.

Also, please be aware that we are being very careful with this work to not
sacrifice the performance of either qib or hfi1.  There are a couple of items
you mention below which seem to indicate you would like a more "pure"
separation of the driver.  I hope you understand that any work in this area
which affects our performance must be accounted for and may not result in as
"pure" a separation as you may like.  If that is a show stopper for SoftRoCE
lets work through the specific examples.

More inline.

> 
> The following is a RFC that presents a solution made of a single
> generic InfiniBand
> driver and many hardware specific back-end drivers. The RFC defines
> the requirements
> and the interfaces that have to be part of any generic InfiniBand driver.
> A generic InfiniBand driver that is not compliant with this RFC wouldn't be 
> able
> to serve different back-ends and therefore would miss its target.
> 
> 
> 
> A. Introduction
> 
> In Linux kernel, the responsibility to implement the InfiniBand protocol is
> roughly divided between 2 elements. The first are the core drivers which 
> expose
> an abstract verbs interface to the upper layer as well as interfaces to some
> common IB services like MAD, SA and CM. The second are vendor drivers and
> hardware which implement the abstract verbs interface.
> 
> A high level view of the model is in Figure A1
> 
>  +-+
>  | |
>  |IB core  |
>  |drivers  |
>  | |
>  +++
>   | Common
> 
>   | Vendor
>  +++
>  | |
>  |  Hardware   |
>  |  drivers|
>  | |
>  +++
>  | |
>  |  Hardware   |
>  | |
>  +-+
> 
> A1 - IB implementation model in Linux kernel
> 
> In the vendor part of the model, the devision of work between software and
> hardware is undefined and is usually one of the two below
> 
> - Context and logic are  managed in software. Hardware role is limited to
>   lower layer protocols (depending on the link layer) and maybe some offloads
> - Context and logic are managed in hardware while software role is to create
>   or destroy a context in the hardware and gets notified when hardware reports
>   about a completions tasks.
> 
> The following examples demonstrates the difference between the approaches 
> above.
> 
> - Send flow: application calls post_send() with QP and a WR. In the software
>   based approach the QP context is retrieved, the WR is parsed and a proper IB
>   packet is formed to be put in hardware buffers. In hardware based approach 
> the
>   driver puts the WR in hardware buffers with a handle to the QP and hardware
>   does the rest.

I'm glad you brought this up but I'm a bit confused.  For anything that sends
more than a single packet I think this is an oversimplification.[1]  Do you
perhaps mean WQE rather than packet?

If you are referring to a WQE we were thinking the same thing but have not
gotten the send code posted yet.

[1] For example a 1MB RDMA write will require many packets.


> - Receive flow: a data packet is received and put in hardware buffers (assume
>   that the destination is a RC QP). In software based approach the packet is
>   handed to the driver. The driver retrieves the context by QPN, 

Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-23 Thread ira.weiny
Doug,

With your email troubles I just wanted to make sure you did not lose track of
this patch.

https://patchwork.kernel.org/patch/7822511/

Thanks,
Ira

On Thu, Dec 10, 2015 at 04:52:30PM -0500, ira.we...@intel.com wrote:
> From: Dean Luick 
> 
> It was found that when a process was rapidly sending MADs other processes 
> could
> be hung in their unregister calls.
> 
> This would happen when process A was injecting packets fast enough that the
> single threaded workqueue was never exiting ib_mad_completion_handler.
> Therefore when process B called flush_workqueue via the unregister call it
> would hang until process A stopped sending MADs.
> 
> The fix is to periodically reschedule ib_mad_completion_handler after
> processing a large number of completions.  The number of completions chosen 
> was
> decided based on the defaults for the recv queue size.  However, it was kept
> fixed such that increasing those queue sizes would not adversely affect
> fairness in the future.
> 
> Reviewed-by: Ira Weiny 
> Signed-off-by: Dean Luick 
> ---
>  drivers/infiniband/core/mad.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 2281de122038..d4d2a618fd66 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -61,6 +61,18 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
> number of work requests
>  module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
>  MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
> requests");
>  
> +/*
> + * Define a limit on the number of completions which will be processed by the
> + * worker thread in a single work item.  This ensures that other work items
> + * (potentially from other users) are processed fairly.
> + *
> + * The number of completions was derived from the default queue sizes above.
> + * We use a value which is double the larger of the 2 queues (receive @ 512)
> + * but keep it fixed such that an increase in that value does not introduce
> + * unfairness.
> + */
> +#define MAD_COMPLETION_PROC_LIMIT 1024
> +
>  static struct list_head ib_mad_port_list;
>  static u32 ib_mad_client_id = 0;
>  
> @@ -2555,6 +2567,7 @@ static void ib_mad_completion_handler(struct 
> work_struct *work)
>  {
>   struct ib_mad_port_private *port_priv;
>   struct ib_wc wc;
> + int count = 0;
>  
>   port_priv = container_of(work, struct ib_mad_port_private, work);
>   ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP);
> @@ -2574,6 +2587,11 @@ static void ib_mad_completion_handler(struct 
> work_struct *work)
>   }
>   } else
>   mad_error_handler(port_priv, );
> +
> + if (++count > MAD_COMPLETION_PROC_LIMIT) {
> + queue_work(port_priv->wq, _priv->work);
> + break;
> + }
>   }
>  }
>  
> -- 
> 1.8.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] IB multicast cleanup patches V2

2015-12-23 Thread ira.weiny
On Wed, Dec 23, 2015 at 01:11:11PM -0500, Doug Ledford wrote:
> On 12/21/2015 09:42 AM, Christoph Lameter wrote:
> > V1->V2
> >  - Add Reviewed by's for first patch from Ira Weiny
> >  - Change name of ipoib_check_mcast_sendonly() to
> > ipoib_check_and_add_mcast_sendonly() as requested by Ira
> > 
> > This patchset cleans up the code a bit after the last round of multicast
> > patches related to the sendonly join logic. Some of the bits of code
> > landed in ipoib_main.c instead of ipoib_multicast.c.
> > 
> > - Move the multicastbits into that file so that everything is neatly 
> > together
> > - Reduce the number of functions exported from ipoib_multicast.c
> > 
> > This patchset can be retrieved from a git repo on kernel.org via
> > 
> > git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/rdma.git 
> > cleanup
> > 
> 
> Thanks, applied (with Ira's name fixed).

Thanks!  :-D

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2015-12-22 Thread ira.weiny
On Mon, Dec 21, 2015 at 05:01:48PM -0800, gre...@linuxfoundation.org wrote:
> On Mon, Dec 21, 2015 at 07:19:43PM -0500, ira.weiny wrote:
> > On Mon, Dec 21, 2015 at 02:02:35PM -0800, gre...@linuxfoundation.org wrote:
> > > On Mon, Dec 21, 2015 at 01:12:14AM -0500, ira.weiny wrote:
> > > > Greg, Doug,
> > > > 
> > > > As mentioned below, these patches depend on the new rdmavt library 
> > > > submitted to
> > > > Doug on linux-rdma.
> > > > 
> > > > We continue to identify (and rework) patches by our other developers 
> > > > which can
> > > > be submitted without conflicts with this series.  Furthermore, We have, 
> > > > as much
> > > > as possible, placed fixes directly into rdmavt such that those changes 
> > > > can be
> > > > dropped from hfi1.  But at this point, we need to know if and where 
> > > > these are
> > > > going to land so that we can start reworking as appropriate.
> > > > 
> > > > Therefore, I would like to discuss plans to get hfi1 under the same 
> > > > maintainer
> > > > to work through this transitional period.
> > > > 
> > > > Basically, At what point should we stop submitting patches to Greg and 
> > > > start
> > > > submitting to Doug?
> > > > 
> > > > Should we consider the merge window itself as the swap over point and 
> > > > submit
> > > > changes to Doug at that point?  If so, should we continue to submit 
> > > > what we can
> > > > to Greg until then (and continue rebase'ing the series below on that 
> > > > work)?  Or
> > > > given Gregs backlog, should we stop submitting to Greg sometime prior 
> > > > to the
> > > > merge window?
> > > > 
> > > > That brings up my final question, at the point of swap over I assume 
> > > > anything
> > > > not accepted by Greg should be considered rejected and we need to 
> > > > resubmit to
> > > > Doug?
> > > 
> > > If Doug accepts the library changes, let me know that public git commit
> > > and I can pull it into the staging-next branch and you can continue to
> > > send me staging patches that way.
> > 
> > Won't this cause a conflict during the merge window?
> 
> No, git is good :)
> 
> > How do we handle changes which affect both qib and hfi1?
> 
> I don't know, now this gets messy...
> 

Agreed and this is what we are worried about.

Can we do what Dan and Doug have proposed in the past and have Doug take over
the staging/rdma sub-tree?

http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-November/081922.html

I think the upcoming merge window is a reasonable time for him to do that.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Display extended counter set if available

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 01:31:31PM -0600, Christoph Lameter wrote:
> On Mon, 21 Dec 2015, Hal Rosenstock wrote:
> 
> > > Don't we need to change all the sysfs_remove_groups to use 
> > > get_counter_table as
> > > well?
> >
> > Looks like it to me too. Good catch.
> 
> Fix follows:
> 
> From: Christoph Lameter 
> Subject: Fix sysfs entry removal by storing the table format in  pma_table
> 
> Store the table being used in the ib_port structure and use it when sysfs
> entries have to be removed.
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Ira Weiny 

> 
> Index: linux/drivers/infiniband/core/sysfs.c
> ===
> --- linux.orig/drivers/infiniband/core/sysfs.c
> +++ linux/drivers/infiniband/core/sysfs.c
> @@ -47,6 +47,7 @@ struct ib_port {
>   struct attribute_group gid_group;
>   struct attribute_group pkey_group;
>   u8 port_num;
> + struct attribute_group *pma_table;
>  };
> 
>  struct port_attribute {
> @@ -651,7 +652,8 @@ static int add_port(struct ib_device *de
>   return ret;
>   }
> 
> - ret = sysfs_create_group(>kobj, get_counter_table(device));
> + p->pma_table = get_counter_table(device);
> + ret = sysfs_create_group(>kobj, p->pma_table);
>   if (ret)
>   goto err_put;
> 
> @@ -710,7 +712,7 @@ err_free_gid:
>   p->gid_group.attrs = NULL;
> 
>  err_remove_pma:
> - sysfs_remove_group(>kobj, _group);
> + sysfs_remove_group(>kobj, p->pma_table);
> 
>  err_put:
>   kobject_put(>kobj);
> @@ -923,7 +925,7 @@ static void free_port_list_attributes(st
>   list_for_each_entry_safe(p, t, >port_list, entry) {
>   struct ib_port *port = container_of(p, struct ib_port, kobj);
>   list_del(>entry);
> - sysfs_remove_group(p, _group);
> + sysfs_remove_group(p, port->pma_table);
>   sysfs_remove_group(p, >pkey_group);
>   sysfs_remove_group(p, >gid_group);
>   kobject_put(p);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 02:02:35PM -0800, gre...@linuxfoundation.org wrote:
> On Mon, Dec 21, 2015 at 01:12:14AM -0500, ira.weiny wrote:
> > Greg, Doug,
> > 
> > As mentioned below, these patches depend on the new rdmavt library 
> > submitted to
> > Doug on linux-rdma.
> > 
> > We continue to identify (and rework) patches by our other developers which 
> > can
> > be submitted without conflicts with this series.  Furthermore, We have, as 
> > much
> > as possible, placed fixes directly into rdmavt such that those changes can 
> > be
> > dropped from hfi1.  But at this point, we need to know if and where these 
> > are
> > going to land so that we can start reworking as appropriate.
> > 
> > Therefore, I would like to discuss plans to get hfi1 under the same 
> > maintainer
> > to work through this transitional period.
> > 
> > Basically, At what point should we stop submitting patches to Greg and start
> > submitting to Doug?
> > 
> > Should we consider the merge window itself as the swap over point and submit
> > changes to Doug at that point?  If so, should we continue to submit what we 
> > can
> > to Greg until then (and continue rebase'ing the series below on that work)? 
> >  Or
> > given Gregs backlog, should we stop submitting to Greg sometime prior to the
> > merge window?
> > 
> > That brings up my final question, at the point of swap over I assume 
> > anything
> > not accepted by Greg should be considered rejected and we need to resubmit 
> > to
> > Doug?
> 
> If Doug accepts the library changes, let me know that public git commit
> and I can pull it into the staging-next branch and you can continue to
> send me staging patches that way.

Won't this cause a conflict during the merge window?

How do we handle changes which affect both qib and hfi1?

Ira

> 
> That's the easiest thing to do usually.
> 
> thanks,
> 
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Clean up SDMA engine code

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 04:13:49PM -0800, gre...@linuxfoundation.org wrote:
> On Mon, Dec 21, 2015 at 06:48:03PM -0500, ira.weiny wrote:
> > On Tue, Dec 08, 2015 at 05:10:08PM -0500, ira.we...@intel.com wrote:
> > > From: Ira Weiny <ira.we...@intel.com>
> > > 
> > > Various improvements to the SDMA engine code.
> > 
> > Greg,
> > 
> > Thanks for reviewing and accepting our patches to staging-testing.  I 
> > apologize
> > for the conflicts we had between the 3 of us submitting.  However, in
> > attempting to rework an internal branch to ensure this does not happen 
> > again I
> > believe there were more conflicts than their should have been due to patches
> > being accepted out of order.
> > 
> > For example, I found the following error in your staging tree below.
> > 
> > This series you applied in the following order which causes a build failure 
> > on
> > the middle commit -- a0d4069.
> > 
> > 483119a staging/rdma/hfi1: Unconditionally clean-up SDMA queues
> > def8228 staging/rdma/hfi1: Convert to use get_user_pages_fast
> > a0d4069 staging/rdma/hfi1: Add page lock limit check for SDMA requests
> > faa98b8 staging/rdma/hfi1: Clean-up unnecessary goto statements
> > 6a5464f staging/rdma/hfi1: Detect SDMA transmission error early
> > 
> > The order as submitted was:
> > 
> > staging/rdma/hfi1: Convert to use get_user_pages_fast
> > staging/rdma/hfi1: Unconditionally clean-up SDMA queues
> > staging/rdma/hfi1: Clean-up unnecessary goto statements
> > staging/rdma/hfi1: Detect SDMA transmission error early
> > staging/rdma/hfi1: Add page lock limit check for SDMA requests
> > 
> > 
> > 
> > Do I need to resolve this somehow?  Or is this something you resolve while 
> > the
> > patches are in staging-testing?
> > 
> > Is there something we need to do in the cover letter of a patch series to
> > ensure order?  Perhaps my cover letter implied these were not ordered?  If 
> > so,
> > I again apologize.
> 
> Did you number your patches?

Yes, sent with git-send-email.

http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-December/thread.html#82509

> That's the only way to ensure that they
> are applied in the correct order, that's what I sort on to apply them.
> If you don't order them, I randomly guess, or just reject them...
> 
> All seems to build now, right?

Yes all builds now.  I just did not know if as part of testing an incremental
build check would then reject the patch.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Clean up SDMA engine code

2015-12-21 Thread ira.weiny
On Tue, Dec 08, 2015 at 05:10:08PM -0500, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Various improvements to the SDMA engine code.

Greg,

Thanks for reviewing and accepting our patches to staging-testing.  I apologize
for the conflicts we had between the 3 of us submitting.  However, in
attempting to rework an internal branch to ensure this does not happen again I
believe there were more conflicts than their should have been due to patches
being accepted out of order.

For example, I found the following error in your staging tree below.

This series you applied in the following order which causes a build failure on
the middle commit -- a0d4069.

483119a staging/rdma/hfi1: Unconditionally clean-up SDMA queues
def8228 staging/rdma/hfi1: Convert to use get_user_pages_fast
a0d4069 staging/rdma/hfi1: Add page lock limit check for SDMA requests
faa98b8 staging/rdma/hfi1: Clean-up unnecessary goto statements
6a5464f staging/rdma/hfi1: Detect SDMA transmission error early

The order as submitted was:

staging/rdma/hfi1: Convert to use get_user_pages_fast
staging/rdma/hfi1: Unconditionally clean-up SDMA queues
staging/rdma/hfi1: Clean-up unnecessary goto statements
staging/rdma/hfi1: Detect SDMA transmission error early
staging/rdma/hfi1: Add page lock limit check for SDMA requests



Do I need to resolve this somehow?  Or is this something you resolve while the
patches are in staging-testing?

Is there something we need to do in the cover letter of a patch series to
ensure order?  Perhaps my cover letter implied these were not ordered?  If so,
I again apologize.

Thanks,
Ira

> 
> ---
> Changes from V1:
>   Fix off by one error in the last patch
> 
> Mitko Haralanov (5):
>   staging/rdma/hfi1: Convert to use get_user_pages_fast
>   staging/rdma/hfi1: Unconditionally clean-up SDMA queues
>   staging/rdma/hfi1: Clean-up unnecessary goto statements
>   staging/rdma/hfi1: Detect SDMA transmission error early
>   staging/rdma/hfi1: Add page lock limit check for SDMA requests
> 
>  drivers/staging/rdma/hfi1/file_ops.c   |  11 +-
>  drivers/staging/rdma/hfi1/hfi.h|   4 +-
>  drivers/staging/rdma/hfi1/user_pages.c |  97 +++---
>  drivers/staging/rdma/hfi1/user_sdma.c  | 319 
> +++--
>  drivers/staging/rdma/hfi1/user_sdma.h  |   2 +
>  5 files changed, 222 insertions(+), 211 deletions(-)
> 
> -- 
> 1.8.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Display extended counter set if available

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 08:20:29AM -0600, Christoph Lameter wrote:
> V2->V3: Add check for NOIETF mode and create special table
>   for that case.
> 
> Check if the extended counters are available and if so
> create the proper extended and additional counters.
> 
> Reviewed-by: Hal Rosenstock 
> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/core/sysfs.c | 104 
> +++-
>  include/rdma/ib_pma.h   |   1 +
>  2 files changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 34dcc23..b179fca 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -320,6 +320,13 @@ struct port_table_attribute port_pma_attr_##_name = {
> \
>   .attr_id = IB_PMA_PORT_COUNTERS ,   \
>  }
>  
> +#define PORT_PMA_ATTR_EXT(_name, _width, _offset)\
> +struct port_table_attribute port_pma_attr_ext_##_name = {\
> + .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> + .index = (_offset) | ((_width) << 16),  \
> + .attr_id = IB_PMA_PORT_COUNTERS_EXT ,   \
> +}
> +
>  /*
>   * Get a Perfmgmt MAD block of data.
>   * Returns error code or the number of bytes retrieved.
> @@ -400,6 +407,11 @@ static ssize_t show_pma_counter(struct ib_port *p, 
> struct port_attribute *attr,
>   ret = sprintf(buf, "%u\n",
> be32_to_cpup((__be32 *)data));
>   break;
> + case 64:
> + ret = sprintf(buf, "%llu\n",
> + be64_to_cpup((__be64 *)data));
> + break;
> +
>   default:
>   ret = 0;
>   }
> @@ -424,6 +436,18 @@ static PORT_PMA_ATTR(port_rcv_data   , 
> 13, 32, 224);
>  static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256);
>  static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288);
>  
> +/*
> + * Counters added by extended set
> + */
> +static PORT_PMA_ATTR_EXT(port_xmit_data  , 64,  64);
> +static PORT_PMA_ATTR_EXT(port_rcv_data   , 64, 128);
> +static PORT_PMA_ATTR_EXT(port_xmit_packets   , 64, 192);
> +static PORT_PMA_ATTR_EXT(port_rcv_packets, 64, 256);
> +static PORT_PMA_ATTR_EXT(unicast_xmit_packets, 64, 320);
> +static PORT_PMA_ATTR_EXT(unicast_rcv_packets , 64, 384);
> +static PORT_PMA_ATTR_EXT(multicast_xmit_packets  , 64, 448);
> +static PORT_PMA_ATTR_EXT(multicast_rcv_packets   , 64, 512);
> +
>  static struct attribute *pma_attrs[] = {
>   _pma_attr_symbol_error.attr.attr,
>   _pma_attr_link_error_recovery.attr.attr,
> @@ -444,11 +468,65 @@ static struct attribute *pma_attrs[] = {
>   NULL
>  };
>  
> +static struct attribute *pma_attrs_ext[] = {
> + _pma_attr_symbol_error.attr.attr,
> + _pma_attr_link_error_recovery.attr.attr,
> + _pma_attr_link_downed.attr.attr,
> + _pma_attr_port_rcv_errors.attr.attr,
> + _pma_attr_port_rcv_remote_physical_errors.attr.attr,
> + _pma_attr_port_rcv_switch_relay_errors.attr.attr,
> + _pma_attr_port_xmit_discards.attr.attr,
> + _pma_attr_port_xmit_constraint_errors.attr.attr,
> + _pma_attr_port_rcv_constraint_errors.attr.attr,
> + _pma_attr_local_link_integrity_errors.attr.attr,
> + _pma_attr_excessive_buffer_overrun_errors.attr.attr,
> + _pma_attr_VL15_dropped.attr.attr,
> + _pma_attr_ext_port_xmit_data.attr.attr,
> + _pma_attr_ext_port_rcv_data.attr.attr,
> + _pma_attr_ext_port_xmit_packets.attr.attr,
> + _pma_attr_ext_port_rcv_packets.attr.attr,
> + _pma_attr_ext_unicast_rcv_packets.attr.attr,
> + _pma_attr_ext_unicast_xmit_packets.attr.attr,
> + _pma_attr_ext_multicast_rcv_packets.attr.attr,
> + _pma_attr_ext_multicast_xmit_packets.attr.attr,
> + NULL
> +};
> +
> +static struct attribute *pma_attrs_noietf[] = {
> + _pma_attr_symbol_error.attr.attr,
> + _pma_attr_link_error_recovery.attr.attr,
> + _pma_attr_link_downed.attr.attr,
> + _pma_attr_port_rcv_errors.attr.attr,
> + _pma_attr_port_rcv_remote_physical_errors.attr.attr,
> + _pma_attr_port_rcv_switch_relay_errors.attr.attr,
> + _pma_attr_port_xmit_discards.attr.attr,
> + _pma_attr_port_xmit_constraint_errors.attr.attr,
> + _pma_attr_port_rcv_constraint_errors.attr.attr,
> + _pma_attr_local_link_integrity_errors.attr.attr,
> + _pma_attr_excessive_buffer_overrun_errors.attr.attr,
> + _pma_attr_VL15_dropped.attr.attr,
> + _pma_attr_ext_port_xmit_data.attr.attr,
> + _pma_attr_ext_port_rcv_data.attr.attr,
> + _pma_attr_ext_port_xmit_packets.attr.attr,
> + _pma_attr_ext_port_rcv_packets.attr.attr,
> + NULL
> +};
> +
>  static struct attribute_group pma_group = {
>   .name  = "counters",

Re: [PATCH V1 1/3] IB/core: Align coding style of ib_device_cap_flags structure

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 12:03:46AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 21, 2015 at 08:37:26AM +0200, Leon Romanovsky wrote:
> > You are right and it is a preferred way for me too, however the
> > downside of such change will be one of two:
> > 1. Change this structure only => we will have style mix of BITs and
> > shifts in the same file. IMHO it looks awful.
> > 2. Change the whole file => the work with "git blame" will be less
> > straightforward.
> 
> Honestly, the BIT macros are horribly, and anyone who thinks it's useful
> really should read a book on computer architectured and one on C.

It would be nice if we were not having to do this for staging then.  Also
perhaps it should be removed from checkpatch --strict?

I'm not a big fan of everything checkpatch does, this being one of them, but
Leon was trying to do the right thing here.

Where are the guidelines for when one can ignore checkpatch and when they can
not?  It would be nice to know when we can "be developers" vs "being robots to
some tool".

I await Dougs guidance.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Create get_perf_mad function in sysfs.c

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 08:20:27AM -0600, Christoph Lameter wrote:
> Create a new function to retrieve performance management
> data from the existing code in get_pma_counter().
> 
> Reviewed-by: Hal Rosenstock 

Reviewed-by: Ira Weiny 

> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/core/sysfs.c | 62 
> ++---
>  1 file changed, 40 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index b1f37d4..acefe85 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -317,21 +317,21 @@ struct port_table_attribute port_pma_attr_##_name = {   
> \
>   .index = (_offset) | ((_width) << 16) | ((_counter) << 24)  \
>  }
>  
> -static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute 
> *attr,
> - char *buf)
> +/*
> + * Get a Perfmgmt MAD block of data.
> + * Returns error code or the number of bytes retrieved.
> + */
> +static int get_perf_mad(struct ib_device *dev, int port_num, int attr,
> + void *data, int offset, size_t size)
>  {
> - struct port_table_attribute *tab_attr =
> - container_of(attr, struct port_table_attribute, attr);
> - int offset = tab_attr->index & 0x;
> - int width  = (tab_attr->index >> 16) & 0xff;
> - struct ib_mad *in_mad  = NULL;
> - struct ib_mad *out_mad = NULL;
> + struct ib_mad *in_mad;
> + struct ib_mad *out_mad;
>   size_t mad_size = sizeof(*out_mad);
>   u16 out_mad_pkey_index = 0;
>   ssize_t ret;
>  
> - if (!p->ibdev->process_mad)
> - return sprintf(buf, "N/A (no PMA)\n");
> + if (!dev->process_mad)
> + return -ENOSYS;
>  
>   in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
>   out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
> @@ -344,12 +344,12 @@ static ssize_t show_pma_counter(struct ib_port *p, 
> struct port_attribute *attr,
>   in_mad->mad_hdr.mgmt_class= IB_MGMT_CLASS_PERF_MGMT;
>   in_mad->mad_hdr.class_version = 1;
>   in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
> - in_mad->mad_hdr.attr_id   = cpu_to_be16(0x12); /* PortCounters */
> + in_mad->mad_hdr.attr_id   = attr;
>  
> - in_mad->data[41] = p->port_num; /* PortSelect field */
> + in_mad->data[41] = port_num;/* PortSelect field */
>  
> - if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
> -  p->port_num, NULL, NULL,
> + if ((dev->process_mad(dev, IB_MAD_IGNORE_MKEY,
> +  port_num, NULL, NULL,
>(const struct ib_mad_hdr *)in_mad, mad_size,
>(struct ib_mad_hdr *)out_mad, _size,
>_mad_pkey_index) &
> @@ -358,31 +358,49 @@ static ssize_t show_pma_counter(struct ib_port *p, 
> struct port_attribute *attr,
>   ret = -EINVAL;
>   goto out;
>   }
> + memcpy(data, out_mad->data + offset, size);
> + ret = size;
> +out:
> + kfree(in_mad);
> + kfree(out_mad);
> + return ret;
> +}
> +
> +static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute 
> *attr,
> + char *buf)
> +{
> + struct port_table_attribute *tab_attr =
> + container_of(attr, struct port_table_attribute, attr);
> + int offset = tab_attr->index & 0x;
> + int width  = (tab_attr->index >> 16) & 0xff;
> + ssize_t ret;
> + u8 data[8];
> +
> + ret = get_perf_mad(p->ibdev, p->port_num, cpu_to_be16(0x12), ,
> + 40 + offset / 8, sizeof(data));
> + if (ret < 0)
> + return sprintf(buf, "N/A (no PMA)\n");
>  
>   switch (width) {
>   case 4:
> - ret = sprintf(buf, "%u\n", (out_mad->data[40 + offset / 8] >>
> + ret = sprintf(buf, "%u\n", (*data >>
>   (4 - (offset % 8))) & 0xf);
>   break;
>   case 8:
> - ret = sprintf(buf, "%u\n", out_mad->data[40 + offset / 8]);
> + ret = sprintf(buf, "%u\n", *data);
>   break;
>   case 16:
>   ret = sprintf(buf, "%u\n",
> -   be16_to_cpup((__be16 *)(out_mad->data + 40 + 
> offset / 8)));
> +   be16_to_cpup((__be16 *)data));
>   break;
>   case 32:
>   ret = sprintf(buf, "%u\n",
> -   be32_to_cpup((__be32 *)(out_mad->data + 40 + 
> offset / 8)));
> +   be32_to_cpup((__be32 *)data));
>   break;
>   default:
>   ret = 0;
>   }
>  
> -out:
> - kfree(in_mad);
> - kfree(out_mad);
> -
>   return ret;
>  }
>  
> -- 
> 2.5.0
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo 

Re: [PATCH 2/3] Specify attribute_id in port_table_attribute

2015-12-21 Thread ira.weiny
On Mon, Dec 21, 2015 at 08:20:28AM -0600, Christoph Lameter wrote:
> Add the attr_id on port_table_attribute since we will have to add
> a different port_table_attribute for the extended attribute soon.
> 
> Reviewed-by: Hal Rosenstock 

Reviewed-by: Ira Weiny 

> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/core/sysfs.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index acefe85..34dcc23 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -39,6 +39,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  struct ib_port {
>   struct kobject kobj;
> @@ -65,6 +66,7 @@ struct port_table_attribute {
>   struct port_attribute   attr;
>   charname[8];
>   int index;
> + int attr_id;
>  };
>  
>  static ssize_t port_attr_show(struct kobject *kobj,
> @@ -314,7 +316,8 @@ static ssize_t show_port_pkey(struct ib_port *p, struct 
> port_attribute *attr,
>  #define PORT_PMA_ATTR(_name, _counter, _width, _offset)  
> \
>  struct port_table_attribute port_pma_attr_##_name = {
> \
>   .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> - .index = (_offset) | ((_width) << 16) | ((_counter) << 24)  \
> + .index = (_offset) | ((_width) << 16) | ((_counter) << 24), \
> + .attr_id = IB_PMA_PORT_COUNTERS ,   \
>  }
>  
>  /*
> @@ -376,7 +379,7 @@ static ssize_t show_pma_counter(struct ib_port *p, struct 
> port_attribute *attr,
>   ssize_t ret;
>   u8 data[8];
>  
> - ret = get_perf_mad(p->ibdev, p->port_num, cpu_to_be16(0x12), ,
> + ret = get_perf_mad(p->ibdev, p->port_num, tab_attr->attr_id, ,
>   40 + offset / 8, sizeof(data));
>   if (ret < 0)
>   return sprintf(buf, "N/A (no PMA)\n");
> -- 
> 2.5.0
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2015-12-20 Thread ira.weiny
Greg, Doug,

As mentioned below, these patches depend on the new rdmavt library submitted to
Doug on linux-rdma.

We continue to identify (and rework) patches by our other developers which can
be submitted without conflicts with this series.  Furthermore, We have, as much
as possible, placed fixes directly into rdmavt such that those changes can be
dropped from hfi1.  But at this point, we need to know if and where these are
going to land so that we can start reworking as appropriate.

Therefore, I would like to discuss plans to get hfi1 under the same maintainer
to work through this transitional period.

Basically, At what point should we stop submitting patches to Greg and start
submitting to Doug?

Should we consider the merge window itself as the swap over point and submit
changes to Doug at that point?  If so, should we continue to submit what we can
to Greg until then (and continue rebase'ing the series below on that work)?  Or
given Gregs backlog, should we stop submitting to Greg sometime prior to the
merge window?

That brings up my final question, at the point of swap over I assume anything
not accepted by Greg should be considered rejected and we need to resubmit to
Doug?

Thanks in advance for any guidance,
Ira


On Mon, Dec 14, 2015 at 12:27:49PM -0500, Dalessandro, Dennis wrote:
> This patch series is being submitted as a Request For Comment only. It depends
> on code submitted to the rdma subsystem [1].
> 
> This work is the first submission aimed to satisfy the TODO item for removing
> duplicated code in hfi1. At the time of submission hfi1 and qib contained alot
> of duplicated verbs processing code. The qib driver is having similar changes
> made to use rdmavt. This will result in a common code base that both drivers 
> and
> future drivers such as soft-roce can use.
> 
> Note that due to the ongoing submission of hfi1 improvement patches, there 
> will
> likely be a number of conflicts which will still need to be resolved.
> 
> We also are still faced with the issue of separate trees for this work as was
> discussed previously [2]. The result of that conversation was to keep the
> drivers in separate trees until the 4.5 merge window. We are hoping that after
> this merge window a single maintainer can take control of hfi1, qib, and 
> rdmavt
> so that these patches can move forward and be applied.
> 
> For now though we would like to get feedback on these patches with more to
> follow.
> 
> [1] https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg30074.html
> [2] https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg29360.html
> 
> ---
> 
> Dennis Dalessandro (15):
>   IB/hfi1: Begin to use rdmavt for verbs
>   IB/hfi1: Add basic rdmavt capability flags for hfi1
>   IB/hfi1: Consolidate dma ops for hfi1
>   IB/hfi1: Use rdmavt protection domain
>   IB/hfi1: Remove MR data structures from hfi1
>   IB/hfi1: Remove driver specific members from hfi1 qp type
>   IB/hfi1: Add device specific info prints
>   IB/hfi1: Use correct rdmavt header files after move.
>   IB/hfi1: Use address handle in rdmavt and remove from hfi1
>   IB/hfi1: Implement hfi1 support for AH notification
>   IB/hfi1: Remove hfi1 MR and hfi1 specific qp type
>   IB/hfi1: Remove srq from hfi1
>   IB/hfi1: Remove ibport and use rdmavt version
>   IB/hfi1: Remove mmap from hfi1
>   IB/hfi1: Use rdmavt pkey verbs function
> 
> 
>  drivers/staging/rdma/hfi1/Kconfig   |2 
>  drivers/staging/rdma/hfi1/Makefile  |4 
>  drivers/staging/rdma/hfi1/chip.c|   36 +-
>  drivers/staging/rdma/hfi1/common.h  |2 
>  drivers/staging/rdma/hfi1/cq.c  |   20 +
>  drivers/staging/rdma/hfi1/diag.c|   13 -
>  drivers/staging/rdma/hfi1/driver.c  |   31 +-
>  drivers/staging/rdma/hfi1/hfi.h |   27 +-
>  drivers/staging/rdma/hfi1/init.c|5 
>  drivers/staging/rdma/hfi1/intr.c|2 
>  drivers/staging/rdma/hfi1/keys.c|  356 -
>  drivers/staging/rdma/hfi1/mad.c |  163 +-
>  drivers/staging/rdma/hfi1/mmap.c|  192 ---
>  drivers/staging/rdma/hfi1/mr.c  |  522 --
>  drivers/staging/rdma/hfi1/pio.c |   10 -
>  drivers/staging/rdma/hfi1/qp.c  |  214 +++-
>  drivers/staging/rdma/hfi1/qp.h  |   44 +--
>  drivers/staging/rdma/hfi1/rc.c  |  155 +
>  drivers/staging/rdma/hfi1/ruc.c |  161 +
>  drivers/staging/rdma/hfi1/sdma.h|8 
>  drivers/staging/rdma/hfi1/srq.c |   58 ++-
>  drivers/staging/rdma/hfi1/sysfs.c   |   18 +
>  drivers/staging/rdma/hfi1/trace.h   |   22 +
>  drivers/staging/rdma/hfi1/uc.c  |   19 +
>  drivers/staging/rdma/hfi1/ud.c  |   91 +++--
>  drivers/staging/rdma/hfi1/verbs.c   |  526 
> +++
>  drivers/staging/rdma/hfi1/verbs.h   |  531 
> 

Re: [PATCH V1 1/3] IB/core: Align coding style of ib_device_cap_flags structure

2015-12-20 Thread ira.weiny
On Sun, Dec 20, 2015 at 12:16:09PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Modify enum ib_device_cap_flags such that other patches which add new
> enum values pass strict checkpatch.pl checks.
> 
> Reviewed-by: Sagi Grimberg 
> Signed-off-by: Leon Romanovsky 
> ---
>  include/rdma/ib_verbs.h | 60 
> -
>  1 file changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 9a68a19..bcf40ad 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -105,24 +105,24 @@ enum rdma_link_layer {
>  };
>  
>  enum ib_device_cap_flags {
> - IB_DEVICE_RESIZE_MAX_WR = 1,
> - IB_DEVICE_BAD_PKEY_CNTR = (1<<1),
> - IB_DEVICE_BAD_QKEY_CNTR = (1<<2),
> - IB_DEVICE_RAW_MULTI = (1<<3),
> - IB_DEVICE_AUTO_PATH_MIG = (1<<4),
> - IB_DEVICE_CHANGE_PHY_PORT   = (1<<5),
> - IB_DEVICE_UD_AV_PORT_ENFORCE= (1<<6),
> - IB_DEVICE_CURR_QP_STATE_MOD = (1<<7),
> - IB_DEVICE_SHUTDOWN_PORT = (1<<8),
> - IB_DEVICE_INIT_TYPE = (1<<9),
> - IB_DEVICE_PORT_ACTIVE_EVENT = (1<<10),
> - IB_DEVICE_SYS_IMAGE_GUID= (1<<11),
> - IB_DEVICE_RC_RNR_NAK_GEN= (1<<12),
> - IB_DEVICE_SRQ_RESIZE= (1<<13),
> - IB_DEVICE_N_NOTIFY_CQ   = (1<<14),
> - IB_DEVICE_LOCAL_DMA_LKEY= (1<<15),
> - IB_DEVICE_RESERVED  = (1<<16), /* old SEND_W_INV */
> - IB_DEVICE_MEM_WINDOW= (1<<17),
> + IB_DEVICE_RESIZE_MAX_WR = (1 << 0),

NIT: Shouldn't we just use the BIT macro?

IB_DEVICE_RESIZE_MAX_WR = BIT(0),

Ira

> + IB_DEVICE_BAD_PKEY_CNTR = (1 << 1),
> + IB_DEVICE_BAD_QKEY_CNTR = (1 << 2),
> + IB_DEVICE_RAW_MULTI = (1 << 3),
> + IB_DEVICE_AUTO_PATH_MIG = (1 << 4),
> + IB_DEVICE_CHANGE_PHY_PORT   = (1 << 5),
> + IB_DEVICE_UD_AV_PORT_ENFORCE= (1 << 6),
> + IB_DEVICE_CURR_QP_STATE_MOD = (1 << 7),
> + IB_DEVICE_SHUTDOWN_PORT = (1 << 8),
> + IB_DEVICE_INIT_TYPE = (1 << 9),
> + IB_DEVICE_PORT_ACTIVE_EVENT = (1 << 10),
> + IB_DEVICE_SYS_IMAGE_GUID= (1 << 11),
> + IB_DEVICE_RC_RNR_NAK_GEN= (1 << 12),
> + IB_DEVICE_SRQ_RESIZE= (1 << 13),
> + IB_DEVICE_N_NOTIFY_CQ   = (1 << 14),
> + IB_DEVICE_LOCAL_DMA_LKEY= (1 << 15),
> + IB_DEVICE_RESERVED  = (1 << 16), /* old SEND_W_INV */
> + IB_DEVICE_MEM_WINDOW= (1 << 17),
>   /*
>* Devices should set IB_DEVICE_UD_IP_SUM if they support
>* insertion of UDP and TCP checksum on outgoing UD IPoIB
> @@ -130,18 +130,18 @@ enum ib_device_cap_flags {
>* incoming messages.  Setting this flag implies that the
>* IPoIB driver may set NETIF_F_IP_CSUM for datagram mode.
>*/
> - IB_DEVICE_UD_IP_CSUM= (1<<18),
> - IB_DEVICE_UD_TSO= (1<<19),
> - IB_DEVICE_XRC   = (1<<20),
> - IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21),
> - IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
> - IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23),
> - IB_DEVICE_MEM_WINDOW_TYPE_2B= (1<<24),
> - IB_DEVICE_RC_IP_CSUM= (1<<25),
> - IB_DEVICE_RAW_IP_CSUM   = (1<<26),
> - IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
> - IB_DEVICE_SIGNATURE_HANDOVER= (1<<30),
> - IB_DEVICE_ON_DEMAND_PAGING  = (1<<31),
> + IB_DEVICE_UD_IP_CSUM= (1 << 18),
> + IB_DEVICE_UD_TSO= (1 << 19),
> + IB_DEVICE_XRC   = (1 << 20),
> + IB_DEVICE_MEM_MGT_EXTENSIONS= (1 << 21),
> + IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1 << 22),
> + IB_DEVICE_MEM_WINDOW_TYPE_2A= (1 << 23),
> + IB_DEVICE_MEM_WINDOW_TYPE_2B= (1 << 24),
> + IB_DEVICE_RC_IP_CSUM= (1 << 25),
> + IB_DEVICE_RAW_IP_CSUM   = (1 << 26),
> + IB_DEVICE_MANAGED_FLOW_STEERING = (1 << 29),
> + IB_DEVICE_SIGNATURE_HANDOVER= (1 << 30),
> + IB_DEVICE_ON_DEMAND_PAGING  = (1 << 31),
>  };
>  
>  enum ib_signature_prot_cap {
> -- 
> 1.7.12.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-next 1/6] IB/core: Save the device attributes on the device structure

2015-12-18 Thread ira.weiny
On Fri, Dec 18, 2015 at 11:01:24AM +0200, Or Gerlitz wrote:
> On Fri, Dec 18, 2015 at 7:16 AM, ira.weiny <ira.we...@intel.com> wrote:
> 
> > More than anything what I hate is all the places that allocate struct
> > ib_device_attr just to free it after the query call.
> 
> did you care to look on patches 2-6, this is exactly what they are doing.

Yes I did.  And I don't see how this statement is a problem as that is what
both your's and Christophs patches do.

If anything this was a statement in support of your series.  I'm sorry you
don't see it that way.

However, if we are getting rid of query_device and the last remaining call is
in this first patch why not go all the way and remove that call site as well?

Which is why I also said:

"We discussed this patch ages ago and decided against it and FWIW It does not
hurt my feelings at all to drop it."

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/14] staging/rdma/hfi1: Start adding building blocks for TID caching

2015-12-17 Thread ira.weiny
On Thu, Dec 17, 2015 at 02:59:52PM +0300, Dan Carpenter wrote:
> On Thu, Dec 17, 2015 at 02:00:18AM -0500, ira.we...@intel.com wrote:
> > +}
> > +
> > +static int program_rcvarray(struct file *fp, unsigned long vaddr,
> > +   struct tid_group *grp,
> > +   struct tid_pageset *sets,
> > +   unsigned start, u16 count, struct page **pages,
> > +   u32 *tidlist, unsigned *tididx, unsigned *pmapped)
> > +{
> 
> It's not clear what a zero return from this function means.  Could we
> add some documentation?

Added the following.

/**
 * program_rcvarray() - program an RcvArray group with receive buffers
 * @fp: file pointer
 * @vaddr: starting user virtual address
 * @grp: RcvArray group
 * @sets: array of struct tid_pageset holding information on physically
 *contiguous chunks from the user buffer
 * @start: starting index into sets array
 * @count: number of struct tid_pageset's to program
 * @pages: an array of struct page * for the user buffer
 * @tidlist: the array of u32 elements when the information about the
 *   programmed RcvArray entries is to be encoded.
 * @tididx: starting offset into tidlist
 * @pmapped: (output parameter) number of pages programmed into the RcvArray
 *   entries.
 *
 * This function will program up to 'count' number of RcvArray entries from the
 * group 'grp'. To make best use of write-combining writes, the function will
 * perform writes to the unused RcvArray entries which will be ignored by the
 * HW. Each RcvArray entry will be programmed with a physically contiguous
 * buffer chunk from the user's virtual buffer.
 *
 * Return:
 *  -EINVAL if the requested count is larger than the size of the group,
 *  -ENOMEM or -EFAULT on error from set_rcvarray_entry(), or
 *  number of RcvArray entries programmed.
 */


Ira

> 
> regards,
> dan carpenter
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/14] staging/rdma/hfi1: Add TID entry program function body

2015-12-17 Thread ira.weiny
On Thu, Dec 17, 2015 at 03:06:45PM +0300, Dan Carpenter wrote:
> On Thu, Dec 17, 2015 at 02:00:23AM -0500, ira.we...@intel.com wrote:
> > +   mutex_lock(>exp_lock);
> > +   /*
> > +* The first step is to program the RcvArray entries which are complete
> > +* groups.
> > +*/
> > +   while (ngroups && uctxt->tid_group_list.count) {
> > +   struct tid_group *grp =
> > +   tid_group_pop(>tid_group_list);
> > +
> > +   ret = program_rcvarray(fp, vaddr, grp, pagesets,
> > +  pageidx, dd->rcv_entries.group_size,
> > +  pages, tidlist, , );
> > +   /*
> > +* If there was a failure to program the RcvArray
> > +* entries for the entire group, reset the grp fields
> > +* and add the grp back to the free group list.
> > +*/
> > +   if (ret <= 0) {
> > +   tid_group_add_tail(grp, >tid_group_list);
> > +   hfi1_cdbg(TID,
> > + "Failed to program RcvArray group %d", ret);
> > +   goto unlock;
> 
> We print a failure message but we still return success when ret == 0.

Not programming anything (return of 0) is allowed.  So it should not be an
error here.  User space is required to handle this case.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/14] staging/rdma/hfi1: Start adding building blocks for TID caching

2015-12-17 Thread ira.weiny
On Thu, Dec 17, 2015 at 02:23:39PM +0300, Dan Carpenter wrote:
> On Thu, Dec 17, 2015 at 02:00:18AM -0500, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > +static int unprogram_rcvarray(struct file *fp, u32 tidinfo,
> > + struct tid_group **grp)
> > +{
> > +   struct hfi1_filedata *fd = fp->private_data;
> > +   struct hfi1_ctxtdata *uctxt = fd->uctxt;
> > +   struct hfi1_devdata *dd = uctxt->dd;
> > +   struct mmu_rb_node *node;
> > +   u8 tidctrl = EXP_TID_GET(tidinfo, CTRL);
> > +   u32 tidbase = uctxt->expected_base,
> > +   tididx = EXP_TID_GET(tidinfo, IDX) << 1, rcventry;
> > +
> > +   if (tididx > uctxt->expected_count) {
> 
> Should this be >= ?  I don't think it makes that much difference since
> we're not using it as an offset.

Yes it looks like it should be.

I'm working on a V2 now.

Ira

> 
> > +   dd_dev_err(dd, "Invalid RcvArray entry (%u) index for ctxt 
> > %u\n",
> > +  tididx, uctxt->ctxt);
> > +   return -EINVAL;
> > +   }
> > +
> > +   if (tidctrl == 0x3)
> > +   return -EINVAL;
> > +
> > +   rcventry = tidbase + tididx + (tidctrl - 1);
> > +
> > +   spin_lock(>rb_lock);
> > +   node = mmu_rb_search_by_entry(>tid_rb_root, rcventry);
> > +   if (!node) {
> > +   spin_unlock(>rb_lock);
> > +   return -EBADF;
> > +   }
> > +   rb_erase(>rbnode, >tid_rb_root);
> > +   spin_unlock(>rb_lock);
> > +   if (grp)
> > +   *grp = node->grp;
> > +   clear_tid_node(fd, fd->subctxt, node);
> > +   return 0;
> > +}
> 
> regards,
> dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-next 1/6] IB/core: Save the device attributes on the device structure

2015-12-17 Thread ira.weiny
On Thu, Dec 17, 2015 at 07:31:20PM +0100, Bart Van Assche wrote:
> On 12/17/2015 06:41 PM, Jason Gunthorpe wrote:
> >On Thu, Dec 17, 2015 at 03:44:19PM +0200, Sagi Grimberg wrote:
> >>
> >>>+  ret = ib_query_device(device, >attrs);
> >>>+  if (ret) {
> >>>+  printk(KERN_WARNING "Couldn't query the device 
> >>>attributes\n");
> >>>+  goto out;
> >>>+  }
> >>>+
> >>
> >>I thought we're all for removing the call altogether aren't we?
> >>
> >>I'd say just call device->query_device() instead.
> >
> >Christoph's patch even got rid of device->query_device(), which, IHMO,
> >I prefer to see over this. It re-enforces that these values are
> >constants and drivers cannot change them on the fly.
> 
> I also would like to see the query_device() implementations to be 
> removed from the hw drivers.

As do I.  More than anything what I hate is all the places that allocate struct
ib_device_attr just to free it after the query call.

We discussed this patch ages ago and decided against it and FWIW It does not
hurt my feelings at all to drop it.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path

2015-12-15 Thread ira.weiny
On Tue, Dec 15, 2015 at 10:30:22AM -0700, Jason Gunthorpe wrote:
> On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote:
> > The hop_limit is only suggest that the package allowed to be
> > routed, not have to, correct?
> 
> If the hop limit is >= 2 (?) then the GRH is mandatory.

Yes >= 2.

"Setting this value to 0 or 1 will ensure that the packet 
will not be forwarded beyond the local subnet."

-- IBTA 8.3.6 Hop Limit

Ira

> The
> SM will return this information in the PathRecord if it determines a
> GRH is required. The whole stack follows this protocol.
> 
> The GRH is optional for in-subnet communications.
> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/2] staging/rdma/hfi1: set Gen 3 half-swing for integrated devices.

2015-12-14 Thread ira.weiny
Any further feedback on this series?

Ira

On Tue, Dec 01, 2015 at 02:47:55PM -0500, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> This was a single patch before.  The change to dev_dbg required a precursor
> patch to add the dd_dev_dbg which is consistent with the other dev_* macros
> which automatically use struct hfi1_devdata.
> 
> Dean Luick (1):
>   staging/rdma/hfi1: set Gen3 half-swing for integrated devices
> 
> Ira Weiny (1):
>   staging/rdma/hf1: add dd_dev_dbg
> 
>  drivers/staging/rdma/hfi1/chip_registers.h | 11 
>  drivers/staging/rdma/hfi1/hfi.h|  4 ++
>  drivers/staging/rdma/hfi1/pcie.c   | 82 
> --
>  3 files changed, 93 insertions(+), 4 deletions(-)
> 
> -- 
> 1.8.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] ibportstate.c: Fix unsigned comparison warnings

2015-12-12 Thread ira.weiny
On Tue, Nov 10, 2015 at 01:23:01PM +0200, Hal Rosenstock wrote:
> 
> src/ibportstate.c:450: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:454: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:458: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:462: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:483: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:500: warning: comparison of unsigned expression < 0 is 
> always false
> src/ibportstate.c:504: warning: comparison of unsigned expression < 0 is 
> always false
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/src/ibportstate.c b/src/ibportstate.c
> index 47e9133..cb47aa9 100644
> --- a/src/ibportstate.c
> +++ b/src/ibportstate.c
> @@ -447,40 +447,40 @@ int main(int argc, char **argv)
>   val = strtoull(argv[i], 0, 0);
>   switch (j) {
>   case SPEED:
> - if (val < 0 || val > 15)
> + if (val > 15)
>   IBEXIT("invalid speed value %ld", val);
>   break;
>   case ESPEED:
> - if (val < 0 || val > 31)
> + if (val > 31)
>   IBEXIT("invalid extended speed value 
> %ld", val);
>   break;
>   case FDR10SPEED:
> - if (val < 0 || val > 1)
> + if (val > 1)
>   IBEXIT("invalid fdr10 speed value %ld", 
> val);
>   break;
>   case WIDTH:
> - if (val < 0 || (val > 15 && val != 255))
> + if ((val > 15 && val != 255))
>   IBEXIT("invalid width value %ld", val);
>   break;
>   case VLS:
> - if (val <= 0 || val > 5)
> + if (val == 0 || val > 5)
>   IBEXIT("invalid vls value %ld", val);
>   break;
>   case MTU:
> - if (val <= 0 || val > 5)
> + if (val == 0 || val > 5)
>   IBEXIT("invalid mtu value %ld", val);
>   break;
>   case LID:
> - if (val <= 0 || val >= 0xC000)
> + if (val == 0 || val >= 0xC000)
>   IBEXIT("invalid lid value 0x%lx", val);
>   break;
>   case SMLID:
> - if (val <= 0 || val >= 0xC000)
> + if (val == 0 || val >= 0xC000)
>   IBEXIT("invalid smlid value 0x%lx",
>   val);
>   break;
>   case LMC:
> - if (val < 0 || val > 7)
> + if (val > 7)
>   IBEXIT("invalid lmc value %ld", val);
>   break;
>   case MKEY:
> @@ -497,11 +497,11 @@ int main(int argc, char **argv)
>   /* All 64-bit values are legal */
>   break;
>   case MKEYLEASE:
> - if (val < 0 || val > 0x)
> + if (val > 0x)
>   IBEXIT("invalid mkey lease time %ld", 
> val);
>   break;
>   case MKEYPROT:
> - if (val < 0 || val > 3)
> + if (val > 3)
>   IBEXIT("invalid mkey protection bit 
> setting %ld", val);
>   }
>   *port_args[j].val = val;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] ibsendtrap.c: Eliminate unused parameter from send_trap

2015-12-12 Thread ira.weiny
On Tue, Nov 10, 2015 at 01:23:39PM +0200, Hal Rosenstock wrote:
> 
> src/ibsendtrap.c: In function ?send_trap?:
> src/ibsendtrap.c:124: warning: unused parameter ?name?
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/src/ibsendtrap.c b/src/ibsendtrap.c
> index 46fd6e7..659f2d2 100644
> --- a/src/ibsendtrap.c
> +++ b/src/ibsendtrap.c
> @@ -121,8 +121,7 @@ static void build_trap129(ib_mad_notice_attr_t * n, 
> ib_portid_t * port)
>   n->data_details.ntc_129_131.port_num = (uint8_t) error_port;
>  }
>  
> -static int send_trap(const char *name,
> -  void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
> +static int send_trap(void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
>  {
>   ib_portid_t sm_port;
>   ib_portid_t selfportid;
> @@ -169,7 +168,7 @@ int process_send_trap(char *trap_name)
>  
>   for (i = 0; traps[i].trap_name; i++)
>   if (strcmp(traps[i].trap_name, trap_name) == 0)
> - return send_trap(trap_name, traps[i].build_func);
> + return send_trap(traps[i].build_func);
>   ibdiag_show_usage();
>   return 1;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] Remove unused pisize parameter from dump_portinfo in ibdiag_common

2015-12-12 Thread ira.weiny
On Tue, Nov 10, 2015 at 01:22:51PM +0200, Hal Rosenstock wrote:
> 
> src/ibdiag_common.c: In function ?dump_portinfo?:
> src/ibdiag_common.c:856: warning: unused parameter ?pisize?
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/include/ibdiag_common.h b/include/ibdiag_common.h
> index af5b3c4..21b0522 100644
> --- a/include/ibdiag_common.h
> +++ b/include/ibdiag_common.h
> @@ -150,7 +150,7 @@ int vsnprint_field(char *buf, size_t n, enum MAD_FIELDS 
> f, int spacing,
>  const char *format, va_list va_args);
>  int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, int spacing,
> const char *format, ...);
> -void dump_portinfo(void *pi, int pisize, int tabs);
> +void dump_portinfo(void *pi, int tabs);
>  
>  /**
>   * Some common command line parsing
> diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
> index 5ec0167..3ebdbb9 100644
> --- a/src/ibdiag_common.c
> +++ b/src/ibdiag_common.c
> @@ -853,7 +853,7 @@ int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, 
> int spacing,
>   return ret;
>  }
>  
> -void dump_portinfo(void *pi, int pisize, int tabs)
> +void dump_portinfo(void *pi, int tabs)
>  {
>   int field, i;
>   char val[64];
> diff --git a/src/saquery.c b/src/saquery.c
> index cc8d8dc..bd70b7b 100644
> --- a/src/saquery.c
> +++ b/src/saquery.c
> @@ -304,7 +304,7 @@ static void dump_one_portinfo_record(void *data, struct 
> query_params *p)
>  "\t\tOptions.0x%x\n"
>  "\tPortInfo dump:\n",
>  cl_ntoh16(pir->lid), pir->port_num, pir->options);
> - dump_portinfo(pi, sizeof(*pi), 2);
> + dump_portinfo(pi, 2);
>  }
>  
>  static void dump_one_mcmember_record(void *data, struct query_params *p)
> diff --git a/src/smpquery.c b/src/smpquery.c
> index 187ef61..2bd7132 100644
> --- a/src/smpquery.c
> +++ b/src/smpquery.c
> @@ -161,7 +161,7 @@ static char *port_info(ib_portid_t * dest, char **argv, 
> int argc)
>   return "port info query failed";
>  
>   printf("# Port info: %s port %d\n", portid2str(dest), orig_portnum);
> - dump_portinfo(data, sizeof data, 0);
> + dump_portinfo(data, 0);
>   return 0;
>  }
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 12:16:54PM -0600, Christoph Lameter wrote:
> Code cleanup to move multicast specific code that checks for
> a sendonly join to ipoib_multicast.c. This allows the removal
> of the export of __ipoib_mcast_find().
> 
> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h   |  3 ++-
>  drivers/infiniband/ulp/ipoib/ipoib_main.c  | 13 +
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 -
>  3 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 989c409..a13f48c 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -549,7 +549,8 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
>  int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
>  union ib_gid *mgid, int set_qkey);
>  void ipoib_mcast_remove_list(struct net_device *dev, struct list_head 
> *remove_list);
> -struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid);
> +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> + struct list_head *remove_list);

I think I would rather see this called something like

ipoib_add_to_list_sendonly

Or something...

Calling it iboib_check* sounds like it should return a bool.

>  
>  int ipoib_init_qp(struct net_device *dev);
>  int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 483ff20..6b16428 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1150,7 +1150,6 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> *priv)
>   unsigned long flags;
>   int i;
>   LIST_HEAD(remove_list);
> - struct ipoib_mcast *mcast;
>   struct net_device *dev = priv->dev;
>  
>   if (test_bit(IPOIB_STOP_NEIGH_GC, >flags))
> @@ -1179,18 +1178,8 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> *priv)
> 
> lockdep_is_held(>lock))) != NULL) {
>   /* was the neigh idle for two GC periods */
>   if (time_after(neigh_obsolete, neigh->alive)) {
> - u8 *mgid = neigh->daddr + 4;
>  
> - /* Is this multicast ? */
> - if (*mgid == 0xff) {
> - mcast = __ipoib_mcast_find(dev, mgid);
> -
> - if (mcast && 
> test_bit(IPOIB_MCAST_FLAG_SENDONLY, >flags)) {
> - list_del(>list);
> - rb_erase(>rb_node, 
> >multicast_tree);
> - list_add_tail(>list, 
> _list);
> - }
> - }
> + ipoib_check_mcast_sendonly(priv, neigh->daddr + 
> 4, _list);
>  
>   rcu_assign_pointer(*np,
>  
> rcu_dereference_protected(neigh->hnext,
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 8acb420a..1158819 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -153,7 +153,7 @@ static struct ipoib_mcast *ipoib_mcast_alloc(struct 
> net_device *dev,
>   return mcast;
>  }
>  
> -struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid)
> +static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void 
> *mgid)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>   struct rb_node *n = priv->multicast_tree.rb_node;
> @@ -704,6 +704,25 @@ static int ipoib_mcast_leave(struct net_device *dev, 
> struct ipoib_mcast *mcast)
>   return 0;
>  }
>  
> +/*
> + * Check if the multicast group is sendonly. If so remove it from the maps
> + * and add to the remove list
> + */
> +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> + struct list_head *remove_list)
> +{
> + /* Is this multicast ? */
> + if (*mgid == 0xff) {

Odd to see a mgid variable which is only u8?

How about "gid_prefix"?

Ira

> + struct ipoib_mcast *mcast = __ipoib_mcast_find(priv->dev, mgid);
> +
> + if (mcast && test_bit(IPOIB_MCAST_FLAG_SENDONLY, 
> >flags)) {
> + list_del(>list);
> + rb_erase(>rb_node, >multicast_tree);
> + list_add_tail(>list, remove_list);
> + }
> + }
> +}
> +
>  void ipoib_mcast_remove_list(struct net_device *dev, struct list_head 
> 

Re: [PATCH 2/2] ipoib mcast sendonly join: Move multicast specific code out of ipoib_main.c.

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 06:21:43PM -0500, ira. weiny wrote:
> On Fri, Dec 11, 2015 at 12:16:54PM -0600, Christoph Lameter wrote:
> > Code cleanup to move multicast specific code that checks for
> > a sendonly join to ipoib_multicast.c. This allows the removal
> > of the export of __ipoib_mcast_find().
> > 
> > Signed-off-by: Christoph Lameter 
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib.h   |  3 ++-
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c  | 13 +
> >  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21 -
> >  3 files changed, 23 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
> > b/drivers/infiniband/ulp/ipoib/ipoib.h
> > index 989c409..a13f48c 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > @@ -549,7 +549,8 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
> >  int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
> >union ib_gid *mgid, int set_qkey);
> >  void ipoib_mcast_remove_list(struct net_device *dev, struct list_head 
> > *remove_list);
> > -struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid);
> > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> > +   struct list_head *remove_list);
> 
> I think I would rather see this called something like
> 
> ipoib_add_to_list_sendonly
> 
> Or something...
> 
> Calling it iboib_check* sounds like it should return a bool.
> 
> >  
> >  int ipoib_init_qp(struct net_device *dev);
> >  int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca);
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 483ff20..6b16428 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1150,7 +1150,6 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> > *priv)
> > unsigned long flags;
> > int i;
> > LIST_HEAD(remove_list);
> > -   struct ipoib_mcast *mcast;
> > struct net_device *dev = priv->dev;
> >  
> > if (test_bit(IPOIB_STOP_NEIGH_GC, >flags))
> > @@ -1179,18 +1178,8 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> > *priv)
> >   
> > lockdep_is_held(>lock))) != NULL) {
> > /* was the neigh idle for two GC periods */
> > if (time_after(neigh_obsolete, neigh->alive)) {
> > -   u8 *mgid = neigh->daddr + 4;
> >  
> > -   /* Is this multicast ? */
> > -   if (*mgid == 0xff) {
> > -   mcast = __ipoib_mcast_find(dev, mgid);
> > -
> > -   if (mcast && 
> > test_bit(IPOIB_MCAST_FLAG_SENDONLY, >flags)) {
> > -   list_del(>list);
> > -   rb_erase(>rb_node, 
> > >multicast_tree);
> > -   list_add_tail(>list, 
> > _list);
> > -   }
> > -   }
> > +   ipoib_check_mcast_sendonly(priv, neigh->daddr + 
> > 4, _list);
> >  
> > rcu_assign_pointer(*np,
> >
> > rcu_dereference_protected(neigh->hnext,
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> > b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > index 8acb420a..1158819 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > @@ -153,7 +153,7 @@ static struct ipoib_mcast *ipoib_mcast_alloc(struct 
> > net_device *dev,
> > return mcast;
> >  }
> >  
> > -struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid)
> > +static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void 
> > *mgid)
> >  {
> > struct ipoib_dev_priv *priv = netdev_priv(dev);
> > struct rb_node *n = priv->multicast_tree.rb_node;
> > @@ -704,6 +704,25 @@ static int ipoib_mcast_leave(struct net_device *dev, 
> > struct ipoib_mcast *mcast)
> > return 0;
> >  }
> >  
> > +/*
> > + * Check if the multicast group is sendonly. If so remove it from the maps
> > + * and add to the remove list
> > + */
> > +void ipoib_check_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
> > +   struct list_head *remove_list)
> > +{
> > +   /* Is this multicast ? */
> > +   if (*mgid == 0xff) {
> 
> Odd to see a mgid variable which is only u8?
> 
> How about "gid_prefix"?

Nevermind on this one, I misread this, it's a pointer to the full gid...

Ira

> 
> Ira
> 
> > +   struct ipoib_mcast *mcast = __ipoib_mcast_find(priv->dev, mgid);
> > +
> > +   if (mcast && 

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 12:25:35PM -0600, Christoph Lameter wrote:
> Display the additional 64 bit counters available through the extended
> set and replace the existing 32 bit counters if there is a 64 bit
> alternative available.
> 
> Note: This requires universal support of extended counters in
> the devices. If there are still devices around that do not
> support extended counters then we will have to add some fallback
> technique here.

Looks like ocrdma will break here.

I'm not sure about mthca.

qib, mlx4 are fine.  mlx5 should be as well I would think (I don't have that
hardware.)

hfi1 did not process these MADs previously as all the hardware counters are 64
bits.  But with this patch series we would add it.

ehca, amso1100, and ipath are all gone so they don't matter.

I can whip up a patch for hfi1 and we have to wait for Doug to take over that
driver anyway to make sure that the patch would apply.  So I think you can
ignore it.

ocrdma seems like it could be a quick patch pre this one.

Ira

> 
> Signed-off-by: Christoph Lameter 
> ---
>  drivers/infiniband/core/sysfs.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 0083a4f..f7f2954 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -406,10 +406,14 @@ static PORT_PMA_ATTR(port_rcv_constraint_errors ,  
> 8,  8, 136, IB_PMA_PORT_C
>  static PORT_PMA_ATTR(local_link_integrity_errors,  9,  4, 152, 
> IB_PMA_PORT_COUNTERS);
>  static PORT_PMA_ATTR(excessive_buffer_overrun_errors, 10,  4, 156, 
> IB_PMA_PORT_COUNTERS);
>  static PORT_PMA_ATTR(VL15_dropped, 11, 16, 176, 
> IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_xmit_data  , 12, 32, 192, 
> IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_rcv_data   , 13, 32, 224, 
> IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256, 
> IB_PMA_PORT_COUNTERS);
> -static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_data  ,  0, 64,  64, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_rcv_data   ,  0, 64, 128, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_xmit_packets   ,  0, 64, 192, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(port_rcv_packets,  0, 64, 256, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(unicast_xmit_packets,  0, 64, 320, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(unicast_rcv_packets ,  0, 64, 384, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(multicast_xmit_packets  ,  0, 64, 448, 
> IB_PMA_PORT_COUNTERS_EXT);
> +static PORT_PMA_ATTR(multicast_rcv_packets   ,  0, 64, 512, 
> IB_PMA_PORT_COUNTERS_EXT);
>  
>  static struct attribute *pma_attrs[] = {
>   _pma_attr_symbol_error.attr.attr,
> @@ -428,6 +432,10 @@ static struct attribute *pma_attrs[] = {
>   _pma_attr_port_rcv_data.attr.attr,
>   _pma_attr_port_xmit_packets.attr.attr,
>   _pma_attr_port_rcv_packets.attr.attr,
> + _pma_attr_unicast_rcv_packets.attr.attr,
> + _pma_attr_unicast_xmit_packets.attr.attr,
> + _pma_attr_multicast_rcv_packets.attr.attr,
> + _pma_attr_multicast_xmit_packets.attr.attr,
>   NULL
>  };
>  
> -- 
> 2.5.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB core: Allow specification of attr_id in PORT_PMA_ATTR macro

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 12:25:33PM -0600, Christoph Lameter wrote:
> This is necessary to support the extended attributes which involves
> a different attribute id.
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Ira Weiny 

> ---
>  drivers/infiniband/core/sysfs.c | 41 
> ++---
>  1 file changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index b1f37d4..1c8716f 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -39,6 +39,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  struct ib_port {
>   struct kobject kobj;
> @@ -65,6 +66,7 @@ struct port_table_attribute {
>   struct port_attribute   attr;
>   charname[8];
>   int index;
> + int attr_id;
>  };
>  
>  static ssize_t port_attr_show(struct kobject *kobj,
> @@ -311,10 +313,11 @@ static ssize_t show_port_pkey(struct ib_port *p, struct 
> port_attribute *attr,
>   return sprintf(buf, "0x%04x\n", pkey);
>  }
>  
> -#define PORT_PMA_ATTR(_name, _counter, _width, _offset)  
> \
> +#define PORT_PMA_ATTR(_name, _counter, _width, _offset, _attr_id)\
>  struct port_table_attribute port_pma_attr_##_name = {
> \
>   .attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),\
> - .index = (_offset) | ((_width) << 16) | ((_counter) << 24)  \
> + .index = (_offset) | ((_width) << 16) | ((_counter) << 24), \
> + .attr_id = _attr_id ,   \
>  }
>  
>  static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute 
> *attr,
> @@ -344,7 +347,7 @@ static ssize_t show_pma_counter(struct ib_port *p, struct 
> port_attribute *attr,
>   in_mad->mad_hdr.mgmt_class= IB_MGMT_CLASS_PERF_MGMT;
>   in_mad->mad_hdr.class_version = 1;
>   in_mad->mad_hdr.method= IB_MGMT_METHOD_GET;
> - in_mad->mad_hdr.attr_id   = cpu_to_be16(0x12); /* PortCounters */
> + in_mad->mad_hdr.attr_id   = tab_attr->attr_id;
>  
>   in_mad->data[41] = p->port_num; /* PortSelect field */
>  
> @@ -386,22 +389,22 @@ out:
>   return ret;
>  }
>  
> -static PORT_PMA_ATTR(symbol_error,  0, 16,  32);
> -static PORT_PMA_ATTR(link_error_recovery ,  1,  8,  48);
> -static PORT_PMA_ATTR(link_downed ,  2,  8,  56);
> -static PORT_PMA_ATTR(port_rcv_errors ,  3, 16,  64);
> -static PORT_PMA_ATTR(port_rcv_remote_physical_errors,  4, 16,  80);
> -static PORT_PMA_ATTR(port_rcv_switch_relay_errors   ,  5, 16,  96);
> -static PORT_PMA_ATTR(port_xmit_discards  ,  6, 16, 112);
> -static PORT_PMA_ATTR(port_xmit_constraint_errors,  7,  8, 128);
> -static PORT_PMA_ATTR(port_rcv_constraint_errors  ,  8,  8, 136);
> -static PORT_PMA_ATTR(local_link_integrity_errors,  9,  4, 152);
> -static PORT_PMA_ATTR(excessive_buffer_overrun_errors, 10,  4, 156);
> -static PORT_PMA_ATTR(VL15_dropped, 11, 16, 176);
> -static PORT_PMA_ATTR(port_xmit_data  , 12, 32, 192);
> -static PORT_PMA_ATTR(port_rcv_data   , 13, 32, 224);
> -static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256);
> -static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288);
> +static PORT_PMA_ATTR(symbol_error,  0, 16,  32, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(link_error_recovery ,  1,  8,  48, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(link_downed ,  2,  8,  56, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_errors ,  3, 16,  64, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_remote_physical_errors,  4, 16,  80, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_switch_relay_errors   ,  5, 16,  96, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_discards  ,  6, 16, 112, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_constraint_errors,  7,  8, 128, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_constraint_errors  ,  8,  8, 136, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(local_link_integrity_errors,  9,  4, 152, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(excessive_buffer_overrun_errors, 10,  4, 156, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(VL15_dropped, 11, 16, 176, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_data  , 12, 32, 192, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_data   , 13, 32, 224, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_xmit_packets   , 14, 32, 256, 
> IB_PMA_PORT_COUNTERS);
> +static PORT_PMA_ATTR(port_rcv_packets, 15, 32, 288, 
> 

Re: [PATCH 2/3] IB core: Support 64 bit values in the port counters

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 12:25:34PM -0600, Christoph Lameter wrote:
> Add a branch to display 64 bit values
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Ira Weiny 

> ---
>  drivers/infiniband/core/sysfs.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
> index 1c8716f..0083a4f 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -378,6 +378,11 @@ static ssize_t show_pma_counter(struct ib_port *p, 
> struct port_attribute *attr,
>   ret = sprintf(buf, "%u\n",
> be32_to_cpup((__be32 *)(out_mad->data + 40 + 
> offset / 8)));
>   break;
> + case 64:
> + ret = sprintf(buf, "%llu\n",
> + be64_to_cpup((__be64 *)(out_mad->data + 40 + 
> offset / 8)));
> + break;
> +
>   default:
>   ret = 0;
>   }
> -- 
> 2.5.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] ipoib mcast sendonly join: Isolate common list remove code

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 12:16:53PM -0600, Christoph Lameter wrote:
> Code cleanup to remove multicast specific code from ipoib_main.c
> 
> The removal of a list of multicast groups occurs in three places.
> Create a new function ipoib_mcast_remove_list(). Use this new
> function in ipoib_main.c too.
> That in turn allows the dropping of two functions that were
> exported from ipoib_multicast.c for expiration of mc groups.
> 
> Signed-off-by: Christoph Lameter 

Reviewed-by: Ira Weiny 

> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h   |  3 +--
>  drivers/infiniband/ulp/ipoib/ipoib_main.c  |  7 ++-
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 24 ++--
>  3 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 3ede103..989c409 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -495,7 +495,6 @@ void ipoib_dev_cleanup(struct net_device *dev);
>  void ipoib_mcast_join_task(struct work_struct *work);
>  void ipoib_mcast_carrier_on_task(struct work_struct *work);
>  void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff 
> *skb);
> -void ipoib_mcast_free(struct ipoib_mcast *mc);
>  
>  void ipoib_mcast_restart_task(struct work_struct *work);
>  int ipoib_mcast_start_thread(struct net_device *dev);
> @@ -549,7 +548,7 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
>  
>  int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
>  union ib_gid *mgid, int set_qkey);
> -int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast);
> +void ipoib_mcast_remove_list(struct net_device *dev, struct list_head 
> *remove_list);
>  struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid);
>  
>  int ipoib_init_qp(struct net_device *dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 7d32818..483ff20 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1150,7 +1150,7 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> *priv)
>   unsigned long flags;
>   int i;
>   LIST_HEAD(remove_list);
> - struct ipoib_mcast *mcast, *tmcast;
> + struct ipoib_mcast *mcast;
>   struct net_device *dev = priv->dev;
>  
>   if (test_bit(IPOIB_STOP_NEIGH_GC, >flags))
> @@ -1207,10 +1207,7 @@ static void __ipoib_reap_neigh(struct ipoib_dev_priv 
> *priv)
>  
>  out_unlock:
>   spin_unlock_irqrestore(>lock, flags);
> - list_for_each_entry_safe(mcast, tmcast, _list, list) {
> - ipoib_mcast_leave(dev, mcast);
> - ipoib_mcast_free(mcast);
> - }
> + ipoib_mcast_remove_list(dev, _list);
>  }
>  
>  static void ipoib_reap_neigh(struct work_struct *work)
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index f357ca6..8acb420a 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -106,7 +106,7 @@ static void __ipoib_mcast_schedule_join_thread(struct 
> ipoib_dev_priv *priv,
>   queue_delayed_work(priv->wq, >mcast_task, 0);
>  }
>  
> -void ipoib_mcast_free(struct ipoib_mcast *mcast)
> +static void ipoib_mcast_free(struct ipoib_mcast *mcast)
>  {
>   struct net_device *dev = mcast->dev;
>   int tx_dropped = 0;
> @@ -677,7 +677,7 @@ int ipoib_mcast_stop_thread(struct net_device *dev)
>   return 0;
>  }
>  
> -int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
> +static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast 
> *mcast)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>   int ret = 0;
> @@ -704,6 +704,16 @@ int ipoib_mcast_leave(struct net_device *dev, struct 
> ipoib_mcast *mcast)
>   return 0;
>  }
>  
> +void ipoib_mcast_remove_list(struct net_device *dev, struct list_head 
> *remove_list)
> +{
> + struct ipoib_mcast *mcast, *tmcast;
> +
> + list_for_each_entry_safe(mcast, tmcast, remove_list, list) {
> + ipoib_mcast_leave(dev, mcast);
> + ipoib_mcast_free(mcast);
> + }
> +}
> +
>  void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
> @@ -810,10 +820,7 @@ void ipoib_mcast_dev_flush(struct net_device *dev)
>   if (test_bit(IPOIB_MCAST_FLAG_BUSY, >flags))
>   wait_for_completion(>done);
>  
> - list_for_each_entry_safe(mcast, tmcast, _list, list) {
> - ipoib_mcast_leave(dev, mcast);
> - ipoib_mcast_free(mcast);
> - }
> + ipoib_mcast_remove_list(dev, _list);
>  }
>  
>  static int ipoib_mcast_addr_is_valid(const u8 *addr, const u8 *broadcast)
> @@ 

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 05:47:15PM -0700, Jason Gunthorpe wrote:
> On Fri, Dec 11, 2015 at 07:23:13PM -0500, ira.weiny wrote:
> > On Fri, Dec 11, 2015 at 05:00:47PM -0700, Jason Gunthorpe wrote:
> > > 
> > > FWIW, I also hate the sysfs counters that reflect the PMA, these would
> > > be much better are free running, wrapping, non-resetting counters
> > > unrelated to the PMA. Something that doesn't zero after the SM samples
> > > it. Sounds like qib, hif, rdmavt can trivially fix this, and should, IMHO.
> > 
> > To be fair with a 64bit counter these are not going to get reset very often.
> 
> It resets when ever the SM sends a reset packet, so 'whenever'

It's been a while since I have looked at OpenSMs PM but a 64 bit counter does
not get reset very often.

For OPA with 64bit counters the same is true.

> 
> > Furthermore, I don't think we can change the behavior now.
> 
> Sure we can, the restting is really a bug, to the point that nothing
> can actually use the sysfs counters reliably.

Well...  That would be changing the behavior that everyone is expecting.  I
know your argument; these are useless and no one is using them.  But that is
not the point.  They were there as a check to see the PMA counters directly.
Any change now is breaking the existing ABI.

Don't get me wrong.  I'm not against keeping running counters for devices.
But, I would rather see a more flexible interface.  Something like netlink
perhaps.

And I will again mention that both IB and OPA have PMs which keep running
counters.

For now Christophs series is a welcome improvement.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-11 Thread ira.weiny
On Fri, Dec 11, 2015 at 05:00:47PM -0700, Jason Gunthorpe wrote:
> 
> FWIW, I also hate the sysfs counters that reflect the PMA, these would
> be much better are free running, wrapping, non-resetting counters
> unrelated to the PMA. Something that doesn't zero after the SM samples
> it. Sounds like qib, hif, rdmavt can trivially fix this, and should, IMHO.

To be fair with a 64bit counter these are not going to get reset very often.

Furthermore, I don't think we can change the behavior now.

Frankly for IB/OPA one should be using the central PM.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/37] IB/rdmavt: Add protection domain to rdmavt.

2015-12-10 Thread ira.weiny
On Thu, Dec 10, 2015 at 11:49:40AM -0500, Dalessandro, Dennis wrote:
> On Tue, Dec 08, 2015 at 02:52:36PM -0500, ira. weiny wrote:
> >On Tue, Dec 08, 2015 at 01:17:48PM -0600, Sean Hefty wrote:
> >>> > > +struct ib_pd *rvt_alloc_pd(struct ib_device *ibdev,
> >>> > > +struct ib_ucontext *context,
> >>> > > +struct ib_udata *udata)
> >>> > > +{
> >>> > > + struct rvt_dev_info *dev = ib_to_rvt(ibdev);
> >>> > > + struct rvt_pd *pd;
> >>> > > + struct ib_pd *ret;
> >>> > > +
> >>> > > + pd = kmalloc(sizeof(*pd), GFP_KERNEL);
> >>> > > + if (!pd) {
> >>> > > + ret = ERR_PTR(-ENOMEM);
> >>> > > + goto bail;
> >>> > > + }
> >>> > > + /*
> >>> > > +  * This is actually totally arbitrary.  Some correctness 
> >>tests
> >>> > > +  * assume there's a maximum number of PDs that can be 
> >>allocated.
> >>> > > +  * We don't actually have this limit, but we fail the test if
> >>> > > +  * we allow allocations of more than we report for this 
> >>value.
> >>> > > +  */
> >>> >
> >>> > Why not trap this in user space, rather than forcing the kernel to
> >>> support some test program?
> >>> >
> >>> 
> >>> What do you mean "trap this in user space"?  This code is not supporting
> >>> any
> >>> particular test program.
> >>> 
> >>> If users try and allocate more protection domains then are advertised 
> >>then
> >>> an
> >>> error should be returned.
> >>> 
> >>> Perhaps the comment should be removed if it is confusing?
> >>
> >>There is no theoretical limit on the number of PDs, or likely most other
> >>resources.  So why define and enforce an arbitrary limit?  The 
> >>justification
> >>given was that some test program would fail.  If there's a concern about 
> >>that
> >>test program passing, then enforce the limit in user space.
> >
> >I did not interpret this as being driven by a test but rather by the IBTA 
> >spec.
> >
> >This may be pedantic but, wouldn't it be confusing from a user POV to see 
> >an
> >advertised limit but then not be constrained by it?  I'm not opposed to 
> >setting
> >this limit arbitrarily high, but I still think the implementation should
> >enforce that limit.
> >
> >>
> >>I didn't find the comment confusing.  Just the choice to let a test app 
> >>drive the design.
> >
> >Perhaps "confusing" is the wrong word.  But I did not interpreted the 
> >comment
> >as saying that the implementation was driven but a test program.
> 
> How about I reword the comment to say that while we could continue 
> allocating PDs being only constrained by system resources, the spec says 
> there is a limit so we enforce that?

I'm ok with that, Sean?

Ira

> 
> -Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-09 Thread ira.weiny
On Wed, Dec 09, 2015 at 10:42:35AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 08, 2015 at 07:52:03PM -0500, ira.weiny wrote:
> > Searching patchworks...
> > 
> > I'm a bit worried about the size of the patch and I would like to see it 
> > split
> > up for review.  But I agree Christophs method is better long term.
> 
> I'd be happy to split it up if I could see a way to split it.  So if
> anyone has an idea you're welcome!

Well this is a ~3300 line patch which is pretty hard to review in total.

> 
> > Christoph do you have this on github somewhere?  Perhaps it is split but I'm
> > not finding in on patchworks?
> 
> No need for github, we have much better (and older) git hosting sites :)
> 
> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr

Another nice side effect of this patch is to get rid of all the struct
ib_device_attr allocations which are littered all over the ULPs.

For the core, srp, ipoib, qib, hfi1 bits.  Generally the rest looks fine I just
did not have time to really go through it line by line.

Reviewed-by: Ira Weiny <ira.we...@intel.com>

Doug this is going to conflict with the rdmavt work.  So if you take this could
you respond on the list.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/10] IB/qib: Use rdmavt lid defines in qib

2015-12-08 Thread ira.weiny
On Mon, Dec 07, 2015 at 01:54:39PM -0700, Jason Gunthorpe wrote:
> On Mon, Dec 07, 2015 at 03:49:12PM -0500, Dennis Dalessandro wrote:
> > /* A multicast address requires a GRH (see ch. 8.4.1). */
> > -   if (ah_attr->dlid >= QIB_MULTICAST_LID_BASE &&
> > -   ah_attr->dlid != QIB_PERMISSIVE_LID &&
> > +   if (ah_attr->dlid >= be16_to_cpu(IB_MULTICAST_LID_BASE) &&
> > +   ah_attr->dlid != be16_to_cpu(IB_LID_PERMISSIVE) &&
> 
> Uh cpu_to_be16 please..

But, the defines are big endian and the dlid here is cpu endian.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/37] IB/rdmavt: Add protection domain to rdmavt.

2015-12-08 Thread ira.weiny
On Mon, Dec 07, 2015 at 03:13:16PM -0600, Sean Hefty wrote:
> > +struct ib_pd *rvt_alloc_pd(struct ib_device *ibdev,
> > +  struct ib_ucontext *context,
> > +  struct ib_udata *udata)
> > +{
> > +   struct rvt_dev_info *dev = ib_to_rvt(ibdev);
> > +   struct rvt_pd *pd;
> > +   struct ib_pd *ret;
> > +
> > +   pd = kmalloc(sizeof(*pd), GFP_KERNEL);
> > +   if (!pd) {
> > +   ret = ERR_PTR(-ENOMEM);
> > +   goto bail;
> > +   }
> > +   /*
> > +* This is actually totally arbitrary.  Some correctness tests
> > +* assume there's a maximum number of PDs that can be allocated.
> > +* We don't actually have this limit, but we fail the test if
> > +* we allow allocations of more than we report for this value.
> > +*/
> 
> Why not trap this in user space, rather than forcing the kernel to support 
> some test program?
> 

What do you mean "trap this in user space"?  This code is not supporting any
particular test program.

If users try and allocate more protection domains then are advertised then an
error should be returned.

Perhaps the comment should be removed if it is confusing?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/37] IB/rdmavt: Add protection domain to rdmavt.

2015-12-08 Thread ira.weiny
On Tue, Dec 08, 2015 at 01:17:48PM -0600, Sean Hefty wrote:
> > > > +struct ib_pd *rvt_alloc_pd(struct ib_device *ibdev,
> > > > +  struct ib_ucontext *context,
> > > > +  struct ib_udata *udata)
> > > > +{
> > > > +   struct rvt_dev_info *dev = ib_to_rvt(ibdev);
> > > > +   struct rvt_pd *pd;
> > > > +   struct ib_pd *ret;
> > > > +
> > > > +   pd = kmalloc(sizeof(*pd), GFP_KERNEL);
> > > > +   if (!pd) {
> > > > +   ret = ERR_PTR(-ENOMEM);
> > > > +   goto bail;
> > > > +   }
> > > > +   /*
> > > > +* This is actually totally arbitrary.  Some correctness tests
> > > > +* assume there's a maximum number of PDs that can be allocated.
> > > > +* We don't actually have this limit, but we fail the test if
> > > > +* we allow allocations of more than we report for this value.
> > > > +*/
> > >
> > > Why not trap this in user space, rather than forcing the kernel to
> > support some test program?
> > >
> > 
> > What do you mean "trap this in user space"?  This code is not supporting
> > any
> > particular test program.
> > 
> > If users try and allocate more protection domains then are advertised then
> > an
> > error should be returned.
> > 
> > Perhaps the comment should be removed if it is confusing?
> 
> There is no theoretical limit on the number of PDs, or likely most other
> resources.  So why define and enforce an arbitrary limit?  The justification
> given was that some test program would fail.  If there's a concern about that
> test program passing, then enforce the limit in user space.

I did not interpret this as being driven by a test but rather by the IBTA spec.

This may be pedantic but, wouldn't it be confusing from a user POV to see an
advertised limit but then not be constrained by it?  I'm not opposed to setting
this limit arbitrarily high, but I still think the implementation should
enforce that limit.

> 
> I didn't find the comment confusing.  Just the choice to let a test app drive 
> the design.

Perhaps "confusing" is the wrong word.  But I did not interpreted the comment
as saying that the implementation was driven but a test program.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-08 Thread ira.weiny
On Tue, Dec 08, 2015 at 03:02:44PM -0800, Christoph Hellwig wrote:
> On Tue, Dec 08, 2015 at 03:59:40PM -0700, Jason Gunthorpe wrote:
> > Or, can we please stop this bikeshedding. Christoph's patch is done,
> > well tested and does a lot more clean up that this hacky three liner.
> > 
> > It is a good patch, and although patchworks doesn't have my remarks
> > from an earlier revision I still think it should go forward. 
> 
> While I'd prefer the version Or points to over not having anything
> at all I'd much prefer sorting it properly and make the RDMA code
> behave like all other Linux subsystems.
> 
> Jason, can you give me a formal ACK'ed by and I'll rebase it on top
> of the current 4.4 queue so we could start the 4.5 window with it.

Searching patchworks...

I'm a bit worried about the size of the patch and I would like to see it split
up for review.  But I agree Christophs method is better long term.

Christoph do you have this on github somewhere?  Perhaps it is split but I'm
not finding in on patchworks?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Fix 4.4 IB merge window regressions

2015-11-30 Thread ira.weiny
On Mon, Nov 30, 2015 at 09:34:15AM -0500, Mike Marciniszyn wrote:
> This two patch series fixes regressions for qib and hfi1
> introduced in the 4.4 merge window.
> 
> These are critical for 4.4 and for rebasing the qib/hfi1
> refactoring.

Doug, Greg,

It should also be noted that without this change to uverbs_cmd the hfi1 driver
is at least partially (ib_write_bw test) broken in staging-next.

Greg, if Linus pulls these from Doug into rc4 will you pull them into
staging-next?

Ira

> 
> ---
> 
> Ira Weiny (1):
>   IB/qib: Fix qib_mr structure
> 
> Mike Marciniszyn (1):
>   IB/core: correct issue with sge copyin corrupting wr
> 
> 
>  drivers/infiniband/core/uverbs_cmd.c  |   15 ++-
>  drivers/infiniband/hw/qib/qib_verbs.h |2 +-
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> -- 
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] staging/rdma/hfi1: Method to toggle "fast ECN" detection

2015-11-22 Thread ira.weiny
On Thu, Nov 19, 2015 at 04:54:44PM -0800, Greg KH wrote:
> On Mon, Nov 09, 2015 at 06:34:44PM -0500, ira.we...@intel.com wrote:
> > From: Vennila Megavannan 
> > 
> > Add a module paramter to toggle prescan/Fast ECN Detection and remove the
> > Kconfig option which used to control this.
> 
> Ick, no, not a module parameter, that's horrid (hint, it isn't a
> per-device option...)

This is a good point.  Previous to this patch we had a compile time option
which would have affected all devices and I think we just continued that.  I do
like the idea of making this per port.  I will respin the patch.

However, I want to be clear on your hint.  Are you saying that sysfs would be a
better place to put such a flag?

> 
> Why can't you do this dynamically?

ECN is always on.  The key is the reaction time of the individual port.
Attempting to turn this on and off would affect both the reaction time and the
processing time in a negative way.

> Why would anyone ever want to make this "slow"?

This is a tuning nob for over all fabric performance not individual node
performance.  ECN controls congestion spreading through the network as is
explained in this paper.

http://infocom2003.ieee-infocom.org/papers/28_01.PDF

As is shown in figure 1 and 2 of that paper congestion at node Bc is affecting
traffic at node Bv.  The default reaction time of ECN is likely to be
sufficient for most users based on our experience so far.

However, should a particular network see system wide degradation, this option
can be turned on to increase the reaction time at node Bc.  However, node Bc is
already overloaded so the trade off is likely acceptable.

Unfortunately, it is hard to predict when a user may need this option as we
don't have the resources to build extreme scale fabrics for testing.  Nor do we
know all users workloads or the fabric topologies those workloads may be
running on.

We developed the code which this option enables based on lab experiments.  So
we need this to be an option available to users.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/7] staging/rdma/hfi1: diag.c Correct code style issues

2015-11-22 Thread ira.weiny
On Tue, Nov 17, 2015 at 02:27:46PM -0800, gre...@linuxfoundation.org wrote:
> On Tue, Nov 17, 2015 at 03:30:24PM -0500, ira.weiny wrote:
> > On Tue, Nov 17, 2015 at 10:17:26PM +0530, Sudip Mukherjee wrote:
> > > On Mon, Nov 16, 2015 at 05:32:34PM -0500, ira.we...@intel.com wrote:
> > > > From: Jubin John <jubin.j...@intel.com>
> > > > 
> > > > Correct the checks on diag.c with the latest checkpatch
> > > > 
> > > > Reviewed-by: Dennis Dalessandro <dennis.dalessan...@intel.com>
> > > > Reviewed-by: Mike Marciniszyn <mike.marcinis...@intel.com>
> > > > Signed-off-by: Jubin John <jubin.j...@intel.com>
> > > > Signed-off-by: Ira Weiny <ira.we...@intel.com>
> > > > ---
> > > 
> > > Different types of changes in a single patch.
> > 
> > Greg,
> > 
> > Do you want me to resubmit the entire series for this?
> 
> Why wouldn't you?  :)
> 

I meant instead of just replacing this single patch with a new "pre" series.

It is easier to just resend the entire series.  I can do that once we settle
whos tree these patches are going to land in.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: HFI1 code duplication todo

2015-11-21 Thread ira.weiny
On Sat, Nov 21, 2015 at 01:25:23AM +0300, Dan Carpenter wrote:
> On Fri, Nov 20, 2015 at 10:41:16AM -0500, Doug Ledford wrote:
> > So, as to the hfi1/qib/rxe transport library.  The qib driver is in the
> > rdma tree, and we aren't going to move it to staging just because it
> > depends on something in staging, so we need to start adding the library
> > in the core tree to avoid dependency issues.  As for the hfi1 driver,
> > I'm of the opinion that putting it in staging because of the code
> > duplication issue was probably not the right thing to do.  It's
> > benefited from the extra eyes on it to help make it a better driver, but
> > I think I'm ready to move it to the main RDMA tree and start working on
> > it from there where there won't be any cross tree dependency issues.
> 
> You could leave it staging but manage it from the same git tree.  It
> avoids the cross tree dependencies.  This is what we do for
> drivers/staging/media/, all that stuff goes through the linux-media
> tree.

My concern is that we tried this before and it did not work.

https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg27842.html
https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg27858.html

Shall we try it again?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: HFI1 code duplication todo

2015-11-20 Thread ira.weiny
On Fri, Nov 20, 2015 at 10:41:16AM -0500, Doug Ledford wrote:
> On 11/19/2015 05:23 PM, Dennis Dalessandro wrote:
> > On Thu, Nov 12, 2015 at 04:13:18PM -0500, Dennis Dalessandro wrote:
> >> The major todo for the hfi1 driver in staging is getting rid of the
> >> verbs code duplication between ipath, qib, and now hfi1. The ipath
> >> driver has been deprecated and is going to be deleted soon. So that
> >> leaves qib and hfi. To address this we have proposed rdmavt which will
> >> be a common kmod that provides software RDMA verbs support. qib and
> >> hfi1 will both use this as well as other forthcoming drivers such as
> >> soft-roce. See this thread for some details:
> >> http://www.spinics.net/lists/linux-rdma/msg29922.html.
> >>
> >> Since that initial RFC we have been accumulating patches in a GitHub
> >> repo for an early look at the development. At some point soon we want
> >> to actually start posting the patches to the mailing list. This is
> >> where it gets tricky.  The code basically not only adds a new driver
> >> but it modifies two existing ones, heavily. To make it more murky one
> >> driver is in staging the other is in the usual drivers/infiniband tree.
> >>
> >> The question is, how do we go about this logistically due to the 2
> >> drivers being in separate sub-trees?
> >>
> >> Greg, Doug,
> >> As the maintainers of the two trees involved we'd kind of like to get
> >> your thoughts on this.
> > 
> > Hi Greg and Doug,
> > 
> > Just wanted to ping you guys again in case my mail last week slipped
> > through the cracks. We are at the point now where we have some patches
> > we can start posting. Looking for some logistical guidance.
> 
> Sorry, I've been out for a while now with the birth of my second child.
>  Things have settled down enough around here that I should be able to
> get things going again (at least well enough to be more responsive to
> email anyway).

Congratulations!

> 
> So, as to the hfi1/qib/rxe transport library.  The qib driver is in the
> rdma tree, and we aren't going to move it to staging just because it
> depends on something in staging, so we need to start adding the library
> in the core tree to avoid dependency issues.  As for the hfi1 driver,
> I'm of the opinion that putting it in staging because of the code
> duplication issue was probably not the right thing to do.  It's
> benefited from the extra eyes on it to help make it a better driver, but
> I think I'm ready to move it to the main RDMA tree and start working on
> it from there where there won't be any cross tree dependency issues.
> 
> To that end, I've opened my 4.4-rc branch and deleted the three
> deprecated drivers from staging and moved hfi1 to the rdma tree.  I've
> sent an email to Linus to see if he's ok taking those changes, and if
> so, I'll get them submitted and then open up my for-4.5 branch early to
> be able to start taking for-next patches.

I agree this is the correct way to go.

However, we have patches which are either in staging-next, staging-testing, or
in flight.

How should we handle those?  I propose.

Greg can ignore the patches in flight.  We have reworks on them anyway.  Those
which just went in to staging-testing can get dropped and we will resubmit
them.

For those which are in staging-next we can submit revert patches to avoid
merge conflicts and resubmit those as well.  This is the item I am not sure
about?

Or is there a better more standard way of handling this?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/7] staging/rdma/hfi1: diag.c Correct code style issues

2015-11-17 Thread ira.weiny
On Tue, Nov 17, 2015 at 10:17:26PM +0530, Sudip Mukherjee wrote:
> On Mon, Nov 16, 2015 at 05:32:34PM -0500, ira.we...@intel.com wrote:
> > From: Jubin John 
> > 
> > Correct the checks on diag.c with the latest checkpatch
> > 
> > Reviewed-by: Dennis Dalessandro 
> > Reviewed-by: Mike Marciniszyn 
> > Signed-off-by: Jubin John 
> > Signed-off-by: Ira Weiny 
> > ---
> 
> Different types of changes in a single patch.

Greg,

Do you want me to resubmit the entire series for this?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-12 Thread ira.weiny
On Thu, Nov 12, 2015 at 01:48:50PM +0200, Hal Rosenstock wrote:
> 
> Receipt of CM MAD with response method for other than ClassPortInfo
> attribute is invalid.
> 
> CM attributes other than ClassPortInfo use send method only
> and GetResp is valid for ClassPortInfo attribute.
> Note also that the CM ClassPortInfo is not currently supported.
> 
> The SRP initiator does not maintain a timeout policy for CM connect
> requests relies on the CM layer to do that. The result was that
> the SRP initiator hung as the connect request never completed.
> 
> A new SRP target has been observed to respond to Send CM REQ
> with GetResp of CM REQ with bad status. This is non conformant
> with IBA spec but exposes a vulnerability in the current MAD/CM
> code which will respond to the incoming GetResp of CM REQ as if
> it was a valid incoming Send of CM REQ rather than tossing
> this on the floor. It also causes the MAD layer not to
> retransmit the original REQ even though it has not received a REP.
> 
> Reviewed-by: Sagi Grimberg 
> Signed-off-by: Hal Rosenstock 
> ---
>  drivers/infiniband/core/mad.c |8 
>  include/rdma/ib_mad.h |2 ++
>  2 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 8d8af7a..e2d425f 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -1811,6 +1811,14 @@ static int validate_mad(const struct ib_mad_hdr 
> *mad_hdr,
>   if (qp_num == 0)
>   valid = 1;
>   } else {
> + /* CM attributes other than ClassPortInfo only use Send method 
> */
> + if (mad_hdr->mgmt_class == IB_MGMT_CLASS_CM) {
> + if (mad_hdr->attr_id != IB_MGMT_CLASSPORTINFO_ATTR_ID) {
> + if (mad_hdr->method != IB_MGMT_METHOD_SEND)
> + goto out;
> + } else if (mad_hdr->method != IB_MGMT_METHOD_GET_RESP)
> + goto out;
> + }

Doesn't this invalidate a CM Get(ClassPortInfo) mad?  Is that the intention
because it is not supported in the CM?

For the future it seems like these types of checks should be done at the class
level.  But I'm not advocating that right now.

Ira

>   /* Filter GSI packets sent to QP0 */
>   if (qp_num != 0)
>   valid = 1;
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index 188df91..ec9b44d 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -237,6 +237,8 @@ struct ib_vendor_mad {
>   u8  data[IB_MGMT_VENDOR_DATA];
>  };
>  
> +#define IB_MGMT_CLASSPORTINFO_ATTR_IDcpu_to_be16(0x0001)
> +
>  struct ib_class_port_info {
>   u8  base_version;
>   u8  class_version;
> -- 
> 1.7.8.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] staging/rdma/hfi1: remove unneeded goto done

2015-11-11 Thread ira.weiny
On Wed, Nov 11, 2015 at 12:01:08PM +0300, Dan Carpenter wrote:
> On Wed, Nov 11, 2015 at 12:43:05AM -0500, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > This goto done is followed by an if (ret) break in the outer switch clause. 
> >  It
> > is unnecessary.
> > 
> > Signed-off-by: Dennis Dalessandro 
> > Signed-off-by: Ira Weiny 
> > ---
> >  drivers/staging/rdma/hfi1/diag.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/staging/rdma/hfi1/diag.c 
> > b/drivers/staging/rdma/hfi1/diag.c
> > index 2bb857b2a103..556a47591989 100644
> > --- a/drivers/staging/rdma/hfi1/diag.c
> > +++ b/drivers/staging/rdma/hfi1/diag.c
> > @@ -1053,7 +1053,6 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
> > cmd, unsigned long arg)
> > break;
> > default:
> > ret = -EINVAL;
> > -   goto done;
> > }
> > ret = set_link_state(ppd, dev_state);
> > break;
> 
> Wut?  No it's not.

Sorry, you are correct at this point in the series.

I got over zealous with my splitting of this patch.  This needs to be squashed
into 

staging/rdma/hfi1: Return immediately on error

Which returns here.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging/rdma/hfi1: set Gen3 half-swing for integrated devices

2015-11-10 Thread ira.weiny
On Tue, Nov 10, 2015 at 12:59:29PM +0300, Dan Carpenter wrote:
> Gar...  No.  Please please get rid of the PC() macro.  It makes the code
> impossible to understand because instead of hitting CTRL-[ you have
> decode it and then manually type out
> 
> :cs find g CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT
> 
> which is the length of a typical college essay.  I meant just put a
> comment like this:
> 
> /*
>  * In the hardware spec these are prefixed with:
>  * CCE_PCIE_CTRL_...
>  * But it is too long to use in code.
>  */
> #define XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK 0x1ull
> 
> Or probably even better:
> 
> #define CCE_PCIE_CTRL (CCE + 0x00C0)
> #define LANE_BUNDLE_MASK  0x3ull /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK */
> #define LANE_BUNDLE_SHIFT 0  /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 
> */
> #define LANE_DELAY_MASK   0xFull /* 
> CCE_PCIE_CTRL_PCIE_LANE_DELAY_MASK */
> #define LANE_DELAY_SHIFT  2  /* CCE_PCIE_CTRL_PCIE_LANE_DELAY_SHIFT */
> #define MARGIN_OVERWRITE_SHIFT8  /* 
> CCE_PCIE_CTRL_XMT_MARGIN_OVERWRITE_ENABLE_SHIFT */
> #define MARGIN_SHIFT  9  /* CCE_PCIE_CTRL_XMT_MARGIN_SHIFT */
> #define MARGIN_G1G2_OVERWRITE_MASK 0x1ull /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK */
> #define MARGIN_G1G2_OVERWRITE_SHIFT 12/* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT */
> #define MARGIN_G1G2_MASK  0x7ull /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK */
> #define MARGIN_G1G2_SHIFT 13 /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT */
> 
> Those lines go over the 80 character limit but it's fine.

My apologies for not understanding what you meant.  I took your meaning to be
that we had to honor the checkpatch checks so while the PC macro was
undesirable it was ok if I just made some comments...

FWIW I don't like the PC macro either.  But we have a tool which is generating
these names to be identical to the hardware spec.  And we really want to
preserve those as a reference back to the spec.  Creating additional names which
are in the code is a bit cumbersome but what if we do something like this:


...
#define CCE_PCIE_CTRL (CCE + 0x00C0)
#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK 0x3ull
#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 0
...




#define LANE_BUNDLE_MASKCCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK
#define LANE_BUNDLE_SHIFT   CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT
...


?



An alternative would be to define some helper functions such as:

static inline u64 extract_xmt_margin_g1g2(u64 reg)
{
return (reg >> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT)
& CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK;
}

...


...
xmt_margin = extract_xmt_margin_g1g2(pcie_ctrl);
...

I prefer the second option as it preserves the register names right in the
code.  So you can reference the hardware spec without looking anything up in a
header file.

I again apologize for misunderstanding your previous meaning.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/9] staging/rdma/hfi1: Add function stubs for TID caching

2015-11-10 Thread ira.weiny
> > 
> > Ok, patch 7-9 of this series do not depend on this patch nor number 6.  I 
> > will
> > resend those 3 while I figure out what to do about these 2:
> > 
> > staging/rdma/hfi1: Add function stubs for TID caching
> > staging/rdma/hfi1: Implement Expected Receive TID caching
> > 
> > Frankly this was an attempt to reduce the size of "Implement Expected 
> > Receive
> > TID caching".  I obviously did something wrong.
> > 
> > I really don't know that I can split these up any more without causing 
> > issues
> > with bisecting the code.
> 
> I strongly doubt that you created this new feature all "at once", it
> took a set of steps to get to your final destination.  So show that
> work, like your math professor told you...
> 

The original author and I have been going through the code to see what we can
do.  We have identified a couple of other pieces which can be split.

One question.  Is it ok to have functionality which is added which is unused in
a preliminary patch?  I believe this is ok as long as the code compiles but I
just wanted to make sure.  While there are different operations added in this
patch it is broken to not use them as a set.  So we need to have a series which
implement the pieces with a final patch which exposes the set of operations.

Is this acceptable?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] IB/hfi1: might sleep under spinlock in hfi1_ioctl()

2015-11-10 Thread ira.weiny
On Fri, Oct 30, 2015 at 07:58:18PM -0400, ira. weiny wrote:
> On Sat, Oct 31, 2015 at 12:32:29AM +0300, Alexey Khoroshilov wrote:
> > Hello,
> > 
> > hfi1_ioctl() contains many calls to might sleep functions with
> > dd->hfi1_snoop.snoop_lock spinlock held (for example, access_ok,
> > copy_from_user, kzalloc(GFP_KERNEL), etc.).
> > 
> > Should dd->hfi1_snoop.snoop_lock be acquired just before updating state?
> 
> I believe you are correct.
> 
> I am currently in the process of pushing fixes to the staging tree.
> 
> We have a patch which fixes this queued up but it depends on at least one 
> other
> patch in my queue.
> 
> I will do my best to get this submitted soon.

I have just posted a series which addresses this problem as well as doing
general clean up on hfi1_ioctl.  The specific fix is contained in this patch.

[PATCH 7/8] staging/rdma/hfi1: Reduce snoop locking scope in IOCTL handler.

Thanks for the report,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 6/9] staging/rdma/hfi1: Implement Expected Receive TID caching

2015-11-09 Thread ira.weiny
On Fri, Nov 06, 2015 at 05:03:28PM -0800, Greg KH wrote:
> On Fri, Oct 30, 2015 at 06:58:45PM -0400, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > 
> > Expected receives work by user-space libraries (PSM) calling into the driver
> > with information about the user's receive buffer and have the driver DMA-map
> > that buffer and program the HFI to receive data directly into it.
> > 
> > This is an expensive operation as it requires the driver to pin the pages 
> > which
> > the user's buffer maps to, DMA-map them, and then program the HFI.
> > 
> > When the receive is complete, user-space libraries have to call into the 
> > driver
> > again so the buffer is removed from the HFI, un-mapped, and the pages 
> > unpinned.
> > 
> > All of these operations are expensive, considering that a lot of 
> > applications
> > (especially micro-benchmarks) use the same buffer over and over.
> > 
> > In order to get better performance for user-space applications, it is highly
> > beneficial that they don't continuously call into the driver to register and
> > unregister the same buffer. Rather, they can register the buffer and cache 
> > it
> > for future work. The buffer can be unregistered when it is freed by the 
> > user.
> > 
> > This change implements such buffer caching by making use of the kernel's MMU
> > notifier API. User-space libraries call into the driver only when the need 
> > to
> > register a new buffer.
> > 
> > Once a buffer is registered, it stays programmed into the HFI until the 
> > kernel
> > notifies the driver that the buffer has been freed by the user. At that 
> > time,
> > the user-space library is notified and it can do the necessary work to 
> > remove
> > the buffer from its cache.
> > 
> > Buffers which have been invalidated by the kernel are not automatically 
> > removed
> > from the HFI and do not have their pages unpinned. Buffers are only 
> > completely
> > removed when the user-space libraries call into the driver to free them.  
> > This
> > is done to ensure that any ongoing transfers into that buffer are complete.
> > This is important when a buffer is not completely freed but rather it is
> > shrunk. The user-space library could still have uncompleted transfers into 
> > the
> > remaining buffer.
> > 
> > With this feature, it is important that systems are setup with reasonable
> > limits for the amount of lockable memory.  Keeping the limit at "unlimited" 
> > (as
> > we've done up to this point), may result in jobs being killed by the 
> > kernel's
> > OOM due to them taking up excessive amounts of memory.
> > 
> > Reviewed-by: Arthur Kepner 
> > Reviewed-by: Dennis Dalessandro 
> > Signed-off-by: Mitko Haralanov 
> > Signed-off-by: Ira Weiny 
> > 
> > ---
> > Changes from V3:
> > Reworked based on the removal of the file pointer macros
> > Split out some prep patches and code clean up
> > 
> > Changes from V2:
> > Fix random Kconfig 0-day build error
> > Fix leak of random memory to user space caught by Dan Carpenter
> > Separate out pointer bug fix into a previous patch
> > Change error checks in case statement per Dan's comments
> > 
> >  drivers/staging/rdma/hfi1/file_ops.c | 469 ++---
> >  drivers/staging/rdma/hfi1/hfi.h  |  43 +-
> >  drivers/staging/rdma/hfi1/init.c |   5 +-
> >  drivers/staging/rdma/hfi1/trace.h| 132 +++--
> >  drivers/staging/rdma/hfi1/user_exp_rcv.c | 874 
> > ++-
> >  drivers/staging/rdma/hfi1/user_pages.c   | 110 +---
> >  drivers/staging/rdma/hfi1/user_sdma.c|  13 +
> >  include/uapi/rdma/hfi/hfi1_user.h|  14 +-
> >  8 files changed, 1069 insertions(+), 591 deletions(-)
> 
> This is still a really big patch, any chance you can break it up into
> smaller, reviewable parts?  I see you add different operations, perhaps
> break it up into one patch per logical thing?
> 

I understand, however, as you have seen with my attempt to break this up there
are issues if we do so.

I need to clean up the previous patch which was an attempt to split this one
up.  But at this point I would really like to preserve the functionality we
have here.  Breaking this up beyond this point is going to be difficult to do
and really will not allow for bisecting the code across this feature being
in vs out.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/9] staging/rdma/hfi1: Add function stubs for TID caching

2015-11-09 Thread ira.weiny
On Fri, Nov 06, 2015 at 05:02:35PM -0800, Greg KH wrote:
> On Fri, Oct 30, 2015 at 06:58:44PM -0400, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > 
> > Add mmu notify helper functions and TID caching function stubs in 
> > preparation
> > for the TID caching implementation.
> > 
> > TID caching makes use of the MMU notifier to allow the driver to respond to 
> > the
> > user freeing memory which is allocated to the HFI.
> > 
> > This patch implements the basic MMU notifier functions to insert, find and
> > remove buffer pages from memory based on the mmu_notifier being invoked.
> > 
> > In addition it places stubs in place for the main entry points by follow on
> > code.
> > 
> > Follow up patches will complete the implementation of the interaction with 
> > user
> > space and makes use of these functions.
> 
> This patch adds warnings to the build, which isn't ok, so I've stopped
> here in this series and just applied the first 4.  Please fix up and
> resend the remaining ones.

Ok, patch 7-9 of this series do not depend on this patch nor number 6.  I will
resend those 3 while I figure out what to do about these 2:

staging/rdma/hfi1: Add function stubs for TID caching
staging/rdma/hfi1: Implement Expected Receive TID caching

Frankly this was an attempt to reduce the size of "Implement Expected Receive
TID caching".  I obviously did something wrong.

I really don't know that I can split these up any more without causing issues
with bisecting the code.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] staging: rdma: hfi1 : Prefer using the BIT macro

2015-11-05 Thread ira.weiny
On Thu, Nov 05, 2015 at 05:28:03PM +0530, Sunny Kumar wrote:
> This patch replaces bit shifting on 1 with the BIT(x) macro
> 
> Signed-off-by: Sunny Kumar 

Also, NAK as has been covered in other responses.

However, I wanted to add, similar to the hfi1_ioctl fix, we have follow on
checkpatch patches which address this.

Ira

> ---
>  drivers/staging/rdma/hfi1/user_sdma.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/staging/rdma/hfi1/user_sdma.c 
> b/drivers/staging/rdma/hfi1/user_sdma.c
> index 36c838d..95a6d99 100644
> --- a/drivers/staging/rdma/hfi1/user_sdma.c
> +++ b/drivers/staging/rdma/hfi1/user_sdma.c
> @@ -146,8 +146,8 @@ MODULE_PARM_DESC(sdma_comp_size, "Size of User SDMA 
> completion ring. Default: 12
>  #define KDETH_OM_MAX_SIZE  (1 << ((KDETH_OM_LARGE / KDETH_OM_SMALL) + 1))
>  
>  /* Last packet in the request */
> -#define TXREQ_FLAGS_REQ_LAST_PKT   (1 << 0)
> -#define TXREQ_FLAGS_IOVEC_LAST_PKT (1 << 0)
> +#define TXREQ_FLAGS_REQ_LAST_PKT   BIT(1 << 0)
> +#define TXREQ_FLAGS_IOVEC_LAST_PKT BIT(1 << 0)
>  
>  #define SDMA_REQ_IN_USE 0
>  #define SDMA_REQ_FOR_THREAD 1
> @@ -156,9 +156,9 @@ MODULE_PARM_DESC(sdma_comp_size, "Size of User SDMA 
> completion ring. Default: 12
>  #define SDMA_REQ_HAS_ERROR  4
>  #define SDMA_REQ_DONE_ERROR 5
>  
> -#define SDMA_PKT_Q_INACTIVE (1 << 0)
> -#define SDMA_PKT_Q_ACTIVE   (1 << 1)
> -#define SDMA_PKT_Q_DEFERRED (1 << 2)
> +#define SDMA_PKT_Q_INACTIVE BIT(1 << 0)
> +#define SDMA_PKT_Q_ACTIVE   BIT(1 << 1)
> +#define SDMA_PKT_Q_DEFERRED BIT(1 << 2)
>  
>  /*
>   * Maximum retry attempts to submit a TX request
> -- 
> 2.1.4
> 
> ___
> devel mailing list
> de...@linuxdriverproject.org
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] staging/rdma/hfi1: Method to toggle "fast ECN" detection

2015-11-05 Thread ira.weiny
On Thu, Nov 05, 2015 at 10:56:27AM +0300, Dan Carpenter wrote:
> On Wed, Nov 04, 2015 at 09:10:11PM -0500, ira.we...@intel.com wrote:
> > From: Vennila Megavannan 
> > 
> > Add a module paramter to toggle prescan/Fast ECN Detection.
> > 
> > In addition change the PRESCAN_RXQ Kconfig default to "yes".
> > 
> > Reviewed-by: Arthur Kepner
> > Reviewed-by: Mike Marciniszyn
> > Signed-off-by: Vennila Megavannan
> > Signed-off-by: Ira Weiny 
> 
> Hm...  In the original code we had the config entry but the code to
> support this was actually disabled.  Now we are turning on the config
> entry by default but the module parameter defaults to disabled.  I think
> the documentation is a bit tricky because it should say that actually
> enabling the config is not enough, it's still disabled.

We tried to make it clear but obviously missed the mark.

> 
> Do we really need the config to be there?  Distros are going to enable it.
> Who is it who wants to disable the config?

The config is maintained to be able to fully remove the code to obtain the
maximum receive performance _if_ the customer knows that they will never need
the prescanning.  This is the _ultimate_ in performance on this path.  We
anticipate some HPC customers who build their own kernels may want this option.

>
> Also is it better to
> default to off or on for this code, what are the upsides and downsides
> of that choice?
> 

I'll update the commit message.  But will also explain here.

Enabling the code with the config option introduces a _slight_ performance
impact but allows the user to determine if this congestion control fix should
be active or not at run time.  You are correct that most distros will enable
the config options (default Yes).  Therefore this becomes a system admin
control item for the fabric being configured.  After testing we found that this
was the best compromise for most systems in terms of performance and
flexibility.  This is because prescanning is not required most of the time.
Should a customer find that their topology and/or workload requires prescanning
they can then enable this option at run time.  This will result in a
significant reduction of performance at the node level but may for those
customers result in better overall fabric/cluster performance.

So in conclusion we have 3 performance levels and the combination in this patch
puts us at the "middle one".

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/1] IB/sa: Put netlink request into the request list before sending

2015-10-30 Thread ira.weiny
On Fri, Oct 30, 2015 at 08:23:45AM -0400, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> It was found by Saurabh Sengar that the netlink code tried to allocate
> memory with GFP_KERNEL while holding a spinlock. While it is possible
> to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
> to get rid of the spinlock while sending the packet. However, in order
> to protect against a race condition that a quick response may be received
> before the request is put on the request list, we need to put the request
> on the list first.
> 
> Signed-off-by: Kaike Wan 

Reviewed-by: Ira Weiny 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] IB/hfi1: might sleep under spinlock in hfi1_ioctl()

2015-10-30 Thread ira.weiny
On Sat, Oct 31, 2015 at 12:32:29AM +0300, Alexey Khoroshilov wrote:
> Hello,
> 
> hfi1_ioctl() contains many calls to might sleep functions with
> dd->hfi1_snoop.snoop_lock spinlock held (for example, access_ok,
> copy_from_user, kzalloc(GFP_KERNEL), etc.).
> 
> Should dd->hfi1_snoop.snoop_lock be acquired just before updating state?

I believe you are correct.

I am currently in the process of pushing fixes to the staging tree.

We have a patch which fixes this queued up but it depends on at least one other
patch in my queue.

I will do my best to get this submitted soon.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] IB/sa: Put netlink request into the request list before sending

2015-10-28 Thread ira.weiny
On Wed, Oct 28, 2015 at 09:44:27AM -0400, kaike@intel.com wrote:
> From: Kaike Wan 
> 
> It was found by Saurabh Sengar that the netlink code tried to allocate
> memory with GFP_KERNEL while holding a spinlock. While it is possible
> to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
> to get rid of the spinlock while sending the packet. However, in order
> to protect against a race condition that a quick response may be received
> before the request is put on the request list, we need to put the request
> on the list first.
> 
> Signed-off-by: Kaike Wan 
> ---
>  drivers/infiniband/core/sa_query.c |   22 --
>  1 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/core/sa_query.c 
> b/drivers/infiniband/core/sa_query.c
> index edcf568..240b57c 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -562,23 +562,25 @@ static int ib_nl_make_request(struct ib_sa_query *query)
>   INIT_LIST_HEAD(>list);
>   query->seq = (u32)atomic_inc_return(_nl_sa_request_seq);
>  
> + /* Put the request on the list first.*/
>   spin_lock_irqsave(_nl_request_lock, flags);
> + delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> + query->timeout = delay + jiffies;
> + list_add_tail(>list, _nl_request_list);
> + spin_unlock_irqrestore(_nl_request_lock, flags);
> +
>   ret = ib_nl_send_msg(query);

What about using the gfp_mask through this stack?

I think you need to split ib_nl_send_msg into "create message" and "send
message".  Then don't add the message to the list unless it is ready to go.
Then you can get rid of the code below which is removing it on error.

> + spin_lock_irqsave(_nl_request_lock, flags);

Do we still need the spin lock?

>   if (ret <= 0) {
>   ret = -EIO;
> - goto request_out;
> + /* Remove the request */
> + list_del(>list);

Don't need to do this if we split ib_nl_send_msg.

Ira

>   } else {
>   ret = 0;
> + /* Start the timeout if this is the only request */
> + if (ib_nl_request_list.next == >list)
> + queue_delayed_work(ib_nl_wq, _nl_timed_work, delay);
>   }
> -
> - delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
> - query->timeout = delay + jiffies;
> - list_add_tail(>list, _nl_request_list);
> - /* Start the timeout if this is the only request */
> - if (ib_nl_request_list.next == >list)
> - queue_delayed_work(ib_nl_wq, _nl_timed_work, delay);
> -
> -request_out:
>   spin_unlock_irqrestore(_nl_request_lock, flags);
>  
>   return ret;
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 12/23] staging/rdma/hfi1: Macro code clean up

2015-10-28 Thread ira.weiny
> > Can I add the removal of these macros to the TODO list and get this patch
> > accepted in the interm?
> 
> Nope, sorry, why would I accept a known-problem patch?  Would you do
> such a thing?
> 
> > Many of the patches I am queueing up to submit as well as one in this 
> > series do
> > not apply cleanly without this change.  It will be much easier if I can get
> > everything applied and then do a global clean up of these macros after the
> > fact.
> 
> But you would have no incentive to do that if I take this patch now :)
> 

Understood.
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/sa: replace GFP_KERNEL with GFP_ATOMIC

2015-10-27 Thread ira.weiny
On Tue, Oct 27, 2015 at 09:17:40PM +0530, Saurabh Sengar wrote:
> replace GFP_KERNEL with GFP_ATOMIC, as code while holding a spinlock
> should be atomic
> GFP_KERNEL may sleep and can cause deadlock, where as GFP_ATOMIC may
> fail but certainly avoids deadlock

Great catch.  Thanks!

However, gfp_t is passed to send_mad and we should pass that down and use it.

Compile tested only, suggestion below,
Ira


14:09:12 > git di
diff --git a/drivers/infiniband/core/sa_query.c
b/drivers/infiniband/core/sa_query.c
index 8c014b33d8e0..54d454042b28 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -512,7 +512,7 @@ static int ib_nl_get_path_rec_attrs_len(ib_sa_comp_mask 
comp_mask)
return len;
 }
 
-static int ib_nl_send_msg(struct ib_sa_query *query)
+static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 {
struct sk_buff *skb = NULL;
struct nlmsghdr *nlh;
@@ -526,7 +526,7 @@ static int ib_nl_send_msg(struct ib_sa_query *query)
if (len <= 0)
return -EMSGSIZE;
 
-   skb = nlmsg_new(len, GFP_KERNEL);
+   skb = nlmsg_new(len, gfp_mask);
if (!skb)
return -ENOMEM;
 
@@ -544,7 +544,7 @@ static int ib_nl_send_msg(struct ib_sa_query *query)
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
 
-   ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, GFP_KERNEL);
+   ret = ibnl_multicast(skb, nlh, RDMA_NL_GROUP_LS, gfp_mask);
if (!ret)
ret = len;
else
@@ -553,7 +553,7 @@ static int ib_nl_send_msg(struct ib_sa_query *query)
return ret;
 }
 
-static int ib_nl_make_request(struct ib_sa_query *query)
+static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 {
unsigned long flags;
unsigned long delay;
@@ -563,7 +563,7 @@ static int ib_nl_make_request(struct ib_sa_query *query)
query->seq = (u32)atomic_inc_return(_nl_sa_request_seq);
 
spin_lock_irqsave(_nl_request_lock, flags);
-   ret = ib_nl_send_msg(query);
+   ret = ib_nl_send_msg(query, gfp_mask);
if (ret <= 0) {
ret = -EIO;
goto request_out;
@@ -1105,7 +1105,7 @@ static int send_mad(struct ib_sa_query *query, int 
timeout_ms, gfp_t gfp_mask)
 
if (query->flags & IB_SA_ENABLE_LOCAL_SERVICE) {
if (!ibnl_chk_listeners(RDMA_NL_GROUP_LS)) {
-   if (!ib_nl_make_request(query))
+   if (!ib_nl_make_request(query, gfp_mask))
return id;
}
ib_sa_disable_local_svc(query);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/sa: replace GFP_KERNEL with GFP_ATOMIC

2015-10-27 Thread ira.weiny
On Tue, Oct 27, 2015 at 12:16:52PM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 27, 2015 at 02:12:36PM -0400, ira.weiny wrote:
> > On Tue, Oct 27, 2015 at 09:17:40PM +0530, Saurabh Sengar wrote:
> > > replace GFP_KERNEL with GFP_ATOMIC, as code while holding a spinlock
> > > should be atomic
> > > GFP_KERNEL may sleep and can cause deadlock, where as GFP_ATOMIC may
> > > fail but certainly avoids deadlock
> > 
> > Great catch.  Thanks!
> > 
> > However, gfp_t is passed to send_mad and we should pass that down and use 
> > it.
> 
> > spin_lock_irqsave(_nl_request_lock, flags);
> > -   ret = ib_nl_send_msg(query);
> > +   ret = ib_nl_send_msg(query, gfp_mask);
> 
> A spin lock is guarenteed held around ib_nl_send_msg, so it's
> allocations have to be atomic, can't use gfp_mask here..
> 
> I do wonder if it is a good idea to call ib_nl_send_msg with a spinlock
> held though.. Would be nice to see that go away.

Ah, yea my bad.

Ira

> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 12/23] staging/rdma/hfi1: Macro code clean up

2015-10-27 Thread ira.weiny
On Tue, Oct 27, 2015 at 05:19:10PM +0900, Greg KH wrote:
> On Mon, Oct 26, 2015 at 10:28:38AM -0400, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > 
> > Clean up the context and sdma macros and move them to a more logical place 
> > in
> > hfi.h
> > 
> > Signed-off-by: Mitko Haralanov 
> > Signed-off-by: Ira Weiny 
> > ---
> >  drivers/staging/rdma/hfi1/hfi.h | 22 ++
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/staging/rdma/hfi1/hfi.h 
> > b/drivers/staging/rdma/hfi1/hfi.h
> > index a35213e9b500..41ad9a30149b 100644
> > --- a/drivers/staging/rdma/hfi1/hfi.h
> > +++ b/drivers/staging/rdma/hfi1/hfi.h
> > @@ -1104,6 +1104,16 @@ struct hfi1_filedata {
> > int rec_cpu_num;
> >  };
> >  
> > +/* for use in system calls, where we want to know device type, etc. */
> > +#define fp_to_fd(fp) ((struct hfi1_filedata *)(fp)->private_data)
> > +#define ctxt_fp(fp) (fp_to_fd((fp))->uctxt)
> > +#define subctxt_fp(fp) (fp_to_fd((fp))->subctxt)
> > +#define tidcursor_fp(fp) (fp_to_fd((fp))->tidcursor)
> > +#define user_sdma_pkt_fp(fp) (fp_to_fd((fp))->pq)
> > +#define user_sdma_comp_fp(fp) (fp_to_fd((fp))->cq)
> > +#define notifier_fp(fp) (fp_to_fd((fp))->mn)
> > +#define rb_fp(fp) (fp_to_fd((fp))->tid_rb_root)
> 
> Ick, no, don't do this, just spell it all out (odds are you will see tht
> you can make the code simpler...)  If you don't know what "cq" or "pq"
> are, then name them properly.
> 
> These need to be all removed.

Ok.

Can I add the removal of these macros to the TODO list and get this patch
accepted in the interm?

Many of the patches I am queueing up to submit as well as one in this series do
not apply cleanly without this change.  It will be much easier if I can get
everything applied and then do a global clean up of these macros after the
fact.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 23/23] staging/rdma/hfi1: Update driver version string to 0.9-294

2015-10-27 Thread ira.weiny
On Tue, Oct 27, 2015 at 05:46:41PM +0900, Greg KH wrote:
> On Mon, Oct 26, 2015 at 10:28:49AM -0400, ira.we...@intel.com wrote:
> > From: Jubin John 
> > 
> > Signed-off-by: Jubin John 
> > Signed-off-by: Ira Weiny 
> > ---
> >  drivers/staging/rdma/hfi1/common.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/staging/rdma/hfi1/common.h 
> > b/drivers/staging/rdma/hfi1/common.h
> > index 7809093eb55e..5dd92720faae 100644
> > --- a/drivers/staging/rdma/hfi1/common.h
> > +++ b/drivers/staging/rdma/hfi1/common.h
> > @@ -205,7 +205,7 @@
> >   * to the driver itself, not the software interfaces it supports.
> >   */
> >  #ifndef HFI1_DRIVER_VERSION_BASE
> > -#define HFI1_DRIVER_VERSION_BASE "0.9-248"
> > +#define HFI1_DRIVER_VERSION_BASE "0.9-294"
> 
> Patches like this make no sense at all, please drop it and only use the
> kernel version.

What do you mean by "only use the kernel version"?  Do you mean

#define HFI1_DRIVER_VERSION_BASE UTS_RELEASE
 
Or just remove the macro entirely?

>
> Trust me, it's going to get messy really fast (hint, it
> already did...)

Did I base this on the wrong tree?  Not sure how this could have messed you up.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/22] staging/rdma/hfi1: Fix regression in send performance

2015-10-25 Thread ira.weiny
On Wed, Oct 21, 2015 at 04:18:06PM +0300, Dan Carpenter wrote:
> On Mon, Oct 19, 2015 at 10:11:16PM -0400, ira.we...@intel.com wrote:
> > From: Mike Marciniszyn 
> > 
> > This additional call is a regression from qib.  For small messages the 
> > progress
> > routine always builds one and clears out the ahg state when the queue has 
> > gone
> > to empty which is the predominant case for small messages.
> > 
> > Inline the routine to mitigate the call and move the routine to qp.h for 
> > scope
> > reasons.
> > 
> > Reviewed-by: Dennis Dalessandro 
> > Signed-off-by: Mike Marciniszyn 
> > Signed-off-by: Ira Weiny 
> > ---
> 
> The whole point of this patch is to do:
> 
> > -   if (qp->s_sde)
> > +   if (qp->s_sde && qp->s_ahgidx >= 0)
> 
> It was not at all clear from the changelog and it was difficult to tell
> because we moved code around as well.

I've cleaned up the commit message.

v3 should be on its way after I make sure I have addressed all your comments on
all the patches.

Ira

> 
> regards,
> dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/22] staging/rdma/hfi1: Allow tuning of SDMA interrupt rate

2015-10-22 Thread ira.weiny
On Thu, Oct 22, 2015 at 01:54:21PM +0300, Dan Carpenter wrote:
> What values work well for this parameter?

64.

For v3, I've squashed a change which was farther down my list of changes
which set this after some testing.

Thanks,
Ira

> 
> regards,
> dan carpenter
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 14/22] staging/rdma/hfi1: Implement Expected Receive TID caching

2015-10-22 Thread ira.weiny
On Thu, Oct 22, 2015 at 01:41:58PM +0300, Dan Carpenter wrote:
> On Mon, Oct 19, 2015 at 10:11:29PM -0400, ira.we...@intel.com wrote:
> > +   case HFI1_CMD_TID_INVAL_READ:
> > +   ret = hfi1_user_exp_rcv_invalid(fp, );
> > +   if (!ret) {
> > +   addr = (unsigned long)cmd.addr +
> > +   offsetof(struct hfi1_tid_info, tidcnt);
> > +   if (copy_to_user((void __user *)addr, ,
> > +sizeof(tinfo.tidcnt)))
> > +   ret = -EFAULT;
> > +   }
> > +   break;
> 
> This switch statement uses success handling throughtout instead of
> error handling.  It's better if you write it like this:
> 
>   case HFI1_CMD_TID_INVAL_READ:
>   ret = hfi1_user_exp_rcv_invalid(fp, );
>   if (ret)
>   break;
> 
>   addr = (unsigned long)cmd.addr +
>  offsetof(struct hfi1_tid_info, tidcnt);
>   if (copy_to_user((void __user *)addr, ,
>sizeof(tinfo.tidcnt)))
>   ret = -EFAULT;
>   break;

This follows the rest of the style of the case statement in this function.  We
prefer to leave this as is for a number of reasons.

1) This is consistent with the coding style elsewhere in this driver.
2) It is functionally equivalent.
3) I have a long list of patches which need to be processed and this may cause
   later merge conflicts.

The main reason is number 1.  We want to remain consistent within this driver.

> 
> 
> The casting is sort of ugly...  It would be better to make address a
> pointer.  It does cut down on the lines of code but at least the cast is
> all done at once and really "addr" is actually a pointer.
> 
>   case HFI1_CMD_TID_INVAL_READ:
>   ret = hfi1_user_exp_rcv_invalid(fp, );
>   if (ret)
>   break;
> 
>   addr = (void __user *)(unsigned long)cmd.addr +
>  offsetof(struct hfi1_tid_info, tidcnt);
>   if (copy_to_user(addr, , sizeof(tinfo.tidcnt)))
>   ret = -EFAULT;
>   break;

I think this is more a matter of style.

> 
> 
> > case HFI1_CMD_TID_FREE:
> > -   ret = exp_tid_free(fp, );
> > +   ret = hfi1_user_exp_rcv_clear(fp, );
> > +   addr = (unsigned long)cmd.addr +
> > +   offsetof(struct hfi1_tid_info, tidcnt);
> > +   if (copy_to_user((void __user *)addr, ,
> > +sizeof(tinfo.tidcnt)))
> > +   ret = -EFAULT;
> > break;
> 
> This is an information leak.  We should break if
> hfi1_user_exp_rcv_clear() fails, but instead we copy uninitialized
> variables to the user.

That is a bug, thanks.

Fixed in v3  (Although, I did use the same positive error check style to be
consistent within this function.)

> 
> 
> 
> > case HFI1_CMD_RECV_CTRL:
> > ret = manage_rcvq(uctxt, subctxt_fp(fp), (int)user_val);
> > @@ -607,9 +603,9 @@ static int hfi1_file_mmap(struct file *fp, struct 
> > vm_area_struct *vma)
> >  * Use the page where this context's flags are. User level
> >  * knows where it's own bitmap is within the page.
> >  */
> > -   memaddr = ((unsigned long)dd->events +
> > -  ((uctxt->ctxt - dd->first_user_ctxt) *
> > -   HFI1_MAX_SHARED_CTXTS)) & PAGE_MASK;
> > +   memaddr = (unsigned long)(dd->events +
> > + ((uctxt->ctxt - dd->first_user_ctxt) *
> > +  HFI1_MAX_SHARED_CTXTS)) & PAGE_MASK;
> 
> I am too lazy to investigate the types of all these variables but I'm
> instead going to assert that moving the cast to later is an unrelated
> white space change.  Don't mix white space changes in with a behavior
> change patch because it makes it hard to review.

Actually this is a bug fix.  dd->events is a pointer which needs to be indexed
prior to the cast.  I'll split this into a separate patch in v3.

Thanks!
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/23] Update driver to 0.9-294

2015-10-20 Thread ira.weiny
On Tue, Oct 20, 2015 at 03:45:47PM +0300, Moni Shoua wrote:
> >
> > Perhaps I did not chose my words carefully enough.
> >
> > The largest issue on the TODO list is the refactoring of the code to be
> > shared between the hfi1 and qib driver.  While the hardware between hfi1
> > and qib is similar and thus the initial code looked similar, our
> > performance tuning on hfi1 has revealed that some changes will be required
> > to the hfi1 code to fully utilize the hardware.  We would like to get these
> > updates in to make sure that the refactoring work is done properly for the
> > 2 hardware types.
>
> Please keep in mind that there are 3 HW types (our SoftRoCE driver)
> that needs to be part of the framework.

Understood, however, unlike SoftRoCE, qib and hfi currently share a lot of code
to drive the hardware.

The underlying reason for the TODO item "Remove software processing of IB
protocol..." is because we have a large amount of duplicated code between these
drivers.  _Some_ of which, at a high level, is sharable with SoftRoCE.

These patches (and more to follow), further differentiate the 2 drivers along
hardware lines.  Accepting these patches will help us make sure that we don't
create some common code between qib and hfi which, because of our testing we
now know, needs to be separated out.

This is a separate issue from the higher level code sharing which will be done
with SoftRoCE.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] ibcacheedit.c: Eliminate unused parameter passed to update_switchportguids

2015-10-13 Thread ira.weiny
On Tue, Oct 13, 2015 at 08:38:02AM -0400, Hal Rosenstock wrote:
> 
> src/ibcacheedit.c: In function ?update_switchportguids?:
> src/ibcacheedit.c:168: warning: unused parameter ?guid?
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/src/ibcacheedit.c b/src/ibcacheedit.c
> index 983c0ab..b520839 100644
> --- a/src/ibcacheedit.c
> +++ b/src/ibcacheedit.c
> @@ -165,7 +165,7 @@ static int process_opt(void *context, int ch, char 
> *optarg)
>   return 0;
>  }
>  
> -static void update_switchportguids(ibnd_node_t *node, uint64_t guid)
> +static void update_switchportguids(ibnd_node_t *node)
>  {
>   ibnd_port_t *port;
>   int p;
> @@ -191,7 +191,7 @@ static void replace_node_guid(ibnd_node_t *node, void 
> *user_data)
>* switches, so update port guids too
>*/
>   if (node->type == IB_NODE_SWITCH)
> - update_switchportguids(node, guids->after);
> + update_switchportguids(node);
>  
>   guids->found++;
>   }
> @@ -229,7 +229,7 @@ static void replace_portguid(ibnd_node_t *node, void 
> *user_data)
>*/
>   if (node->guid == guids->before) {
>   node->guid = guids->after;
> - update_switchportguids(node, guids->after);
> + update_switchportguids(node);
>   guids->found++;
>   }
>   }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] ibqueryerrors.c: Removed unused passed parameters to print_port_config and query_and_dump

2015-10-13 Thread ira.weiny
On Tue, Oct 13, 2015 at 08:38:20AM -0400, Hal Rosenstock wrote:
> 
> src/ibqueryerrors.c: In function ?print_port_config?:
> src/ibqueryerrors.c:176: warning: unused parameter ?node_name?
> src/ibqueryerrors.c: In function ?query_and_dump?:
> src/ibqueryerrors.c:367: warning: unused parameter ?node?
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/src/ibqueryerrors.c b/src/ibqueryerrors.c
> index 40a8ad1..304dc15 100644
> --- a/src/ibqueryerrors.c
> +++ b/src/ibqueryerrors.c
> @@ -173,7 +173,7 @@ static int exceeds_threshold(int field, unsigned val)
>   return (val > thres);
>  }
>  
> -static void print_port_config(char *node_name, ibnd_node_t * node, int 
> portnum)
> +static void print_port_config(ibnd_node_t * node, int portnum)
>  {
>   char width[64], speed[64], state[64], physstate[64];
>   char remote_str[256];
> @@ -364,7 +364,7 @@ Exit:
>  }
>  
>  static int query_and_dump(char *buf, size_t size, ib_portid_t * portid,
> -   ibnd_node_t * node, char *node_name, int portnum,
> +   char *node_name, int portnum,
> const char *attr_name, uint16_t attr_id,
> int start_field, int end_field)
>  {
> @@ -418,7 +418,7 @@ static int print_results(ib_portid_t * portid, char 
> *node_name,
>   /* If there are PortXmitDiscards, get details (if 
> supported) */
>   if (i == IB_PC_XMT_DISCARDS_F && details) {
>   n += query_and_dump(str + n, sizeof(buf) - n, 
> portid,
> - node, node_name, portnum,
> + node_name, portnum,
>   "PortXmitDiscardDetails",
>   
> IB_GSI_PORT_XMIT_DISCARD_DETAILS,
>   IB_PC_RCV_LOCAL_PHY_ERR_F,
> @@ -426,7 +426,7 @@ static int print_results(ib_portid_t * portid, char 
> *node_name,
>   /* If there are PortRcvErrors, get details (if 
> supported) */
>   } else if (i == IB_PC_ERR_RCV_F && details) {
>   n += query_and_dump(str + n, sizeof(buf) - n, 
> portid,
> - node, node_name, portnum,
> + node_name, portnum,
>   "PortRcvErrorDetails",
>   
> IB_GSI_PORT_RCV_ERROR_DETAILS,
>   IB_PC_XMT_INACT_DISC_F,
> @@ -499,7 +499,7 @@ static int print_results(ib_portid_t * portid, char 
> *node_name,
>   printf("   GUID 0x%" PRIx64 " port %d:%s\n",
>  node->ports[portnum]->guid, portnum, str);
>   if (port_config)
> - print_port_config(node_name, node, portnum);
> + print_port_config(node, portnum);
>   summary.bad_ports++;
>   }
>   }
> @@ -596,7 +596,7 @@ static int print_data_cnts(ib_portid_t * portid, uint16_t 
> cap_mask,
>   printf("\n");
>  
>   if (portnum != 0xFF && port_config)
> - print_port_config(node_name, node, portnum);
> + print_port_config(node, portnum);
>  
>   return (0);
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH TRIVIAL infiniband-diags] ibdiag_common.c: Move static to beginning of get_build_version declaration

2015-10-12 Thread ira.weiny
On Mon, Oct 12, 2015 at 08:31:23AM -0400, Hal Rosenstock wrote:
> 
> Eliminate compiler warning:
> src/ibdiag_common.c:85: warning: ?static? is not at beginning of declaration
> 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
> diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
> index e09623d..5424845 100644
> --- a/src/ibdiag_common.c
> +++ b/src/ibdiag_common.c
> @@ -82,7 +82,7 @@ static const char **prog_examples;
>  static struct option *long_opts = NULL;
>  static const struct ibdiag_opt *opts_map[256];
>  
> -const static char *get_build_version(void)
> +static const char *get_build_version(void)
>  {
>   return "BUILD VERSION: " IBDIAG_VERSION " Build date: " __DATE__ " "
>   __TIME__;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] perfquery.c: Fix smp_query_via return value checks

2015-10-12 Thread ira.weiny
On Mon, Oct 12, 2015 at 08:30:31AM -0400, Hal Rosenstock wrote:
> 
> smp_query_via returns pointer so < 0 comparison is wrong:
> src/perfquery.c: In function ?is_rsfec_mode_active?:
> src/perfquery.c:481: warning: ordered comparison of pointer with integer zero
> src/perfquery.c: In function ?main?:
> src/perfquery.c:919: warning: ordered comparison of pointer with integer zero
> src/perfquery.c:928: warning: ordered comparison of pointer with integer zero
>  
> Reported-by: David Binderman  
> Signed-off-by: Hal Rosenstock 

There is also a call in ibdiag_common.c which is wrong:

diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
index 54248455bac4..5ec0167f87de 100644
--- a/src/ibdiag_common.c
+++ b/src/ibdiag_common.c
@@ -507,7 +507,7 @@ int is_port_info_extended_supported(ib_portid_t * dest, int
port,
uint32_t cap_mask;
uint16_t cap_mask2;
 
-   if (smp_query_via(data, dest, IB_ATTR_PORT_INFO, port, 0, srcport) < 0)
+   if (!smp_query_via(data, dest, IB_ATTR_PORT_INFO, port, 0, srcport))
IBEXIT("port info query failed");
 
mad_decode_field(data, IB_PORT_CAPMASK_F, _mask);


I went ahead and added this chunk to this patch and accepted.

Thanks,
Ira


> ---
> Fix for OFA Bugzilla #2572
> 
> diff --git a/src/perfquery.c b/src/perfquery.c
> index 9e3a307..948ce52 100644
> --- a/src/perfquery.c
> +++ b/src/perfquery.c
> @@ -477,8 +477,8 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
> int port,
>   return 0;
>   }
>  
> - if (smp_query_via(data, portid, IB_ATTR_PORT_INFO_EXT, port, 0,
> -   srcport) < 0)
> + if (!smp_query_via(data, portid, IB_ATTR_PORT_INFO_EXT, port, 0,
> +srcport))
>   IBEXIT("smp query portinfo extended failed");
>  
>   mad_decode_field(data, IB_PORT_EXT_CAPMASK_F, _capmask);
> @@ -915,8 +915,8 @@ int main(int argc, char **argv)
>  
>  
>   if (all_ports_loop || (loop_ports && (all_ports || port == ALL_PORTS))) 
> {
> - if (smp_query_via(data, , IB_ATTR_NODE_INFO, 0, 0,
> -   srcport) < 0)
> + if (!smp_query_via(data, , IB_ATTR_NODE_INFO, 0, 0,
> +srcport))
>   IBEXIT("smp query nodeinfo failed");
>   node_type = mad_get_field(data, 0, IB_NODE_TYPE_F);
>   mad_decode_field(data, IB_NODE_NPORTS_F, _ports);
> @@ -924,8 +924,8 @@ int main(int argc, char **argv)
>   IBEXIT("smp query nodeinfo: num ports invalid");
>  
>   if (node_type == IB_NODE_SWITCH) {
> - if (smp_query_via(data, , IB_ATTR_SWITCH_INFO,
> -   0, 0, srcport) < 0)
> + if (!smp_query_via(data, , IB_ATTR_SWITCH_INFO,
> +0, 0, srcport))
>   IBEXIT("smp query nodeinfo failed");
>   enhancedport0 =
>   mad_get_field(data, 0, IB_SW_ENHANCED_PORT0_F);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] infiniband-diags 1.6.6 release

2015-10-12 Thread ira.weiny
There is a new release of infiniband-diags at:

https://www.openfabrics.org/downloads/management/infiniband-diags-1.6.6.tar.gz

md5sum:

b855ca3b98afefc2ad6a2de378ab71dd  infiniband-diags-1.6.6.tar.gz


Dependencies:

1) libibmad >= 1.3.12
2) libibumad >= 1.3.7
3) opensm-libs >= 3.3.10
4) ib_umad kernel module
5) glib2


Release notes v1.6.5 => 1.6.6

   1) bug fixes


Authors since 1.6.5

Ana Guerrero López (1):
  rdma-ndd: fix compiler warnings.

Dan Ben Yosef (2):
  perfquery -T (print Extended Speed Counters) times out on nodes supporting
  libibnetdisc: Avoid pushing same pointer to the hash table

Hal Rosenstock (5):
  perfquery.c: Change format of capability mask in IBWARN for consistency
  ibdiag_sa.c: In sa_get_handle, handle umad_open_port and umad_register fai
  iblinkinfo.c: Close additional file descriptor in advance
  ibdiag_common.c: Move static to beginning of get_build_version declaration
  perfquery.c: Fix smp_query_via return value checks

Ira Weiny (3):
  infiniband-diags/rdma-ndd: Fix issues with install
  infiniband-diags/rdma-ndd: add --pidfile option
  infiniband-diags: rdma-ndd: remove udev logging when not supported

Michal Schmidt (2):
  rdma-ndd: never use udev_get_sys_path()
  build-sys: avoid overlinking to libudev

Vladimir Koushnir (10):
  ibqueryerrors: Resource leak in path_record_query
  Remove redundant umad file descriptor from libibnetdisc
  query_smp.c: Avoid busy looping in process_one_recv
  dump_fts: Open global file descriptor after calling ibnd_discover_fabric
  ibqueryerrors: code improvement
  ibqueryerrors: Fix crash when no SM is running
  ibqueryerrors: Close global file descriptor before running ibnd_discover_f
  ibqueryerrors: improve code related to DR option
  vendstat: mad_rpc_close_port not called in corner cases
  saquery.c: Fix saquery -D option


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] perfquery: Fix timeout on nodes supporting RS_FEC capability

2015-09-19 Thread ira.weiny
On Tue, Sep 01, 2015 at 08:28:38AM -0400, Hal Rosenstock wrote:
> From 0e8341980ca7133fe0a472fddc1c622f766b8f8e Mon Sep 17 00:00:00 2001
> From: Dan Ben Yosef 
> Date: Mon, 31 Aug 2015 10:49:04 +0300
> 
> perfquery -T (print Extended Speed Counters) times out on nodes
> supporting RS_FEC capability.
> 
> Before sending ExtendedSpeedCounters PerfMgt GMP, perfquery sends
> PortInfoExtended SMP to get CapMask and FECModeActive fields.
> Upon reception of PortInfoExtended SMP, fec_mode_active field is
> erroneously decoded to 16 bit buffer instead of 32 bit buffer.
> As a result, memory corruption occurs and portid.dlid field is set to 0,
> so that ExtendedSpeedCounters GMP is sent to wrong destination.
> 
> Signed-off-by: Dan Ben Yosef 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
>  src/perfquery.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/perfquery.c b/src/perfquery.c
> index 46451ef..9e3a307 100644
> --- a/src/perfquery.c
> +++ b/src/perfquery.c
> @@ -469,7 +469,7 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
> int port,
> uint16_t cap_mask)
>  {
>   uint8_t data[IB_SMP_DATA_SIZE] = { 0 };
> - uint16_t fec_mode_active = 0;
> + uint32_t fec_mode_active = 0;
>   uint32_t pie_capmask = 0;
>   if (cap_mask & IS_PM_RSFEC_COUNTERS_SUP) {
>   if (!is_port_info_extended_supported(portid, port, srcport)) {
> @@ -486,7 +486,7 @@ static uint8_t is_rsfec_mode_active(ib_portid_t * portid, 
> int port,
>_mode_active);
>   if((pie_capmask &
>   CL_NTOH32(IB_PORT_EXT_CAP_IS_FEC_MODE_SUPPORTED)) &&
> -(CL_NTOH16(IB_PORT_EXT_RS_FEC_MODE_ACTIVE) == 
> fec_mode_active))
> +(CL_NTOH16(IB_PORT_EXT_RS_FEC_MODE_ACTIVE) == 
> (fec_mode_active & 0x)))
>   return 1;
>   }
>  
> -- 
> 1.7.8.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH infiniband-diags] saquery.c: Fix saquery -D option

2015-09-19 Thread ira.weiny
On Thu, Sep 10, 2015 at 09:09:57AM -0400, Hal Rosenstock wrote:
> From: Vladimir Koushnir 
> Date: Wed, 9 Sep 2015 14:29:38 +0300
> 
> -D and -list saquery options are operational only
> when -N option is explicitly invoked (default option)
> 
> This patch allows using -D or --list options when no other option is
> invoked.
> 
> Signed-off-by: Vladimir Koushnir 
> Signed-off-by: Hal Rosenstock 

Thanks applied,
Ira

> ---
>  src/saquery.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/src/saquery.c b/src/saquery.c
> index 4343e33..db863f2 100644
> --- a/src/saquery.c
> +++ b/src/saquery.c
> @@ -1591,6 +1591,7 @@ static int process_opt(void *context, int ch, char 
> *optarg)
>   break;
>   case 'D':
>   node_print_desc = ALL_DESC;
> + command = SAQUERY_CMD_NODE_RECORD;
>   break;
>   case 'c':
>   command = SAQUERY_CMD_CLASS_PORT_INFO;
> -- 
> 1.7.8.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/49] IB/core: Add header definitions

2015-08-31 Thread ira.weiny
Hal,

I'm working on a couple of patches to address these comments.  I will be
submitting them in the next day or so.

On Wed, Jun 17, 2015 at 10:12:41AM -0400, Hal Rosenstock wrote:
> On 6/17/2015 8:28 AM, Mike Marciniszyn wrote:
> > From: Ira Weiny 
> > 
> > Add common OPA header definitions for driver
> > build:
> > - opa_port_info.h
> > - opa_smi.h
> > - hfi1_user.sh
> > 
> > Additionally, ib_mad.h, has additional definitions
> > that are common to ib_drivers including:
> > - trap support
> > - cca support
> > 
> > The qib driver has the duplication removed in favor
> > those in ib_mad.h
> > 
> > Reviewed-by: Mike Marciniszyn 
> > Reviewed-by: John, Jubin 
> > Signed-off-by: Ira Weiny 
> > ---
> >  drivers/infiniband/hw/qib/qib_mad.h |  147 +---
> >  include/rdma/ib_mad.h   |  138 +++
> >  include/rdma/opa_port_info.h|  433 
> > +++
> 
> Should opa_port_info.h be in include/rdma or in drivers/infiniband/hw/hfi1 ?

This file and opa_smi.h were placed here following the pattern of the same
ib_*.h files.  Indeed because there is currently only 1 OPA driver it could be
moved to the hfi1 driver directory.  However, I prefer to keep them in rdma.
If Doug prefers I can send a patch to move them.



> > +
> > +/*
> > + * Generic trap/notice producers
> > + */
> > +#define IB_NOTICE_PROD_CA  cpu_to_be16(1)
> > +#define IB_NOTICE_PROD_SWITCH  cpu_to_be16(2)
> > +#define IB_NOTICE_PROD_ROUTER  cpu_to_be16(3)
> > +#define IB_NOTICE_PROD_CLASS_MGR   cpu_to_be16(4)
> > +
> > +/*
> > + * Generic trap/notice numbers
> 
> SM Class trap/notice numbers
> 
> As such, should they be in ib_smi.h rather than ib_mad.h ?

Fixed in my patch series.

> 
> > + */
> > +#define IB_NOTICE_TRAP_LLI_THRESH  cpu_to_be16(129)
> > +#define IB_NOTICE_TRAP_EBO_THRESH  cpu_to_be16(130)
> > +#define IB_NOTICE_TRAP_FLOW_UPDATE cpu_to_be16(131)
> > +#define IB_NOTICE_TRAP_CAP_MASK_CHGcpu_to_be16(144)
> > +#define IB_NOTICE_TRAP_SYS_GUID_CHGcpu_to_be16(145)
> > +#define IB_NOTICE_TRAP_BAD_MKEYcpu_to_be16(256)
> > +#define IB_NOTICE_TRAP_BAD_PKEYcpu_to_be16(257)
> > +#define IB_NOTICE_TRAP_BAD_QKEYcpu_to_be16(258)
> > +
> > +/*
> > + * Repress trap/notice flags
> > + */
> > +#define IB_NOTICE_REPRESS_LLI_THRESH   (1 << 0)
> > +#define IB_NOTICE_REPRESS_EBO_THRESH   (1 << 1)
> > +#define IB_NOTICE_REPRESS_FLOW_UPDATE  (1 << 2)
> > +#define IB_NOTICE_REPRESS_CAP_MASK_CHG (1 << 3)
> > +#define IB_NOTICE_REPRESS_SYS_GUID_CHG (1 << 4)
> > +#define IB_NOTICE_REPRESS_BAD_MKEY (1 << 5)
> > +#define IB_NOTICE_REPRESS_BAD_PKEY (1 << 6)
> > +#define IB_NOTICE_REPRESS_BAD_QKEY (1 << 7)
> 
> What does this correspond to ? Is this some standard thing or are these
> defines driver specific ?
>

Fixed in my patch series.

> 
> > +
> > +/*
> > + * Generic trap/notice other local changes flags (trap 144).
> 
> SM Class trap/notice other local changes flags (trap 144)
> 
> As such, should they be in ib_smi.h rather than ib_mad.h ?

Fixed in my patch series.

> 
> > + */
> > +#define IB_NOTICE_TRAP_LSE_CHG 0x04/* Link Speed Enable 
> > changed */
> > +#define IB_NOTICE_TRAP_LWE_CHG 0x02/* Link Width Enable 
> > changed */
> > +#define IB_NOTICE_TRAP_NODE_DESC_CHG   0x01
> > +
> > +/*
> > + * Generic trap/notice M_Key volation flags in dr_trunc_hop (trap 256).
> 
> SM Class trap/notice M_Key violation flags in dr_trunc_hop (trap 256)
> 
> As such, should they be in ib_smi.h rather than ib_mad.h ?

Fixed in my patch series.

> 
> > + */
> > +#define IB_NOTICE_TRAP_DR_NOTICE   0x80
> > +#define IB_NOTICE_TRAP_DR_TRUNC0x40
> > +
> >  enum {
> > IB_MGMT_MAD_HDR = 24,
> > IB_MGMT_MAD_DATA = 232,
> > @@ -240,6 +294,90 @@ struct ib_class_port_info {
> > __be32  trap_qkey;
> >  };
> >  
> > +struct ib_node_info {
> > +   u8 base_version;
> > +   u8 class_version;
> > +   u8 node_type;
> > +   u8 num_ports;
> > +   __be64 sys_guid;
> > +   __be64 node_guid;
> > +   __be64 port_guid;
> > +   __be16 partition_cap;
> > +   __be16 device_id;
> > +   __be32 revision;
> > +   u8 local_port_num;
> > +   u8 vendor_id[3];
> > +} __packed;
> 
> This is SM attribute. Should it go into ib_smi.h like ib_port_info ?

Fixed in my patch series.

> 
> > +
> > +struct ib_mad_notice_attr {
> > +   u8 generic_type;
> > +   u8 prod_type_msb;
> > +   __be16 prod_type_lsb;
> > +   __be16 trap_num;
> > +   __be16 issuer_lid;
> > +   __be16 toggle_count;
> > +
> > +   union {
> > +   struct {
> > +   u8  details[54];
> > +   } raw_data;
> > +
> > +   struct {
> > +   __be16  reserved;
> > +   __be16  lid;/* where violation happened */
> > +   u8  port_num;

Re: [patch] IB/core: missing curly braces in ib_find_gid()

2015-08-28 Thread ira.weiny
On Tue, Aug 18, 2015 at 12:22:10PM +0300, Dan Carpenter wrote:
 Smatch says that, based on the indenting, we should probably add curly
 braces here.
 
 Fixes: 230145ff8124 ('IB/core: Add RoCE GID table management')
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com


Reviewed-by: Ira Weiny ira.we...@intel.com

 
 diff --git a/drivers/infiniband/core/device.c 
 b/drivers/infiniband/core/device.c
 index 258b3f7..5d5bbae 100644
 --- a/drivers/infiniband/core/device.c
 +++ b/drivers/infiniband/core/device.c
 @@ -807,9 +807,10 @@ int ib_find_gid(struct ib_device *device, union ib_gid 
 *gid,
   for (port = rdma_start_port(device); port = rdma_end_port(device); 
 ++port) {
   if (rdma_cap_roce_gid_table(device, port)) {
   if (!ib_cache_gid_find_by_port(device, gid, port,
 -NULL, index))
 +NULL, index)) {
   *port_num = port;
   return 0;
 + }
   }
  
   for (i = 0; i  device-port_immutable[port].gid_tbl_len; ++i) {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] IB/core: off by one in error handling

2015-08-28 Thread ira.weiny
On Tue, Aug 18, 2015 at 12:23:17PM +0300, Dan Carpenter wrote:
 This is a zero offset array.  The current code could try to free random
 memory and crash.  Also it leaks the first element.
 
 Fixes: 230145ff8124 ('IB/core: Add RoCE GID table management')
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

I don't actually see this in Dougs to-be-rebased/for-4.3 tree.

Looks like Doug picked up a different version of the patch in the latest
rebase.

annotating cache.c I see a different change from Matan in commit

76680c1cfc5ab

+rollback_table_setup:
+   for (port = 0; port  ib_dev-phys_port_cnt; port++) {
+   cleanup_gid_table_port(ib_dev, port + rdma_start_port(ib_dev),
+  table[port]);
+   release_gid_table(table[port]);
+   }

Ira

 
 diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
 index a9d5c70..f5d14a7 100644
 --- a/drivers/infiniband/core/cache.c
 +++ b/drivers/infiniband/core/cache.c
 @@ -582,7 +582,7 @@ static int _gid_table_setup_one(struct ib_device *ib_dev)
   return 0;
  
  rollback_table_setup:
 - for (port = 1; port = ib_dev-phys_port_cnt; port++)
 + for (port = 0; port  ib_dev-phys_port_cnt; port++)
   free_gid_table(ib_dev, port, table[port]);
  
   kfree(table);
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] pull request v2: linux-firmware: Add Intel OPA hfi1 firmware

2015-08-24 Thread ira.weiny
Doug,

Please Pull:

git://github.com/weiny2/linux-firmware.git  master-hfi1-firmware-v2

This is the first release of the Intel OPA hfi1 firmware required by the hfi1
driver.

Changes from V2:
Simplified license text.

Thank you,
Ira Weiny

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] pull request: linux-firmware: Add Intel OPA hfi1 firmware

2015-08-14 Thread ira.weiny
On Tue, Aug 11, 2015 at 05:35:53PM -0600, Jason Gunthorpe wrote:
 On Tue, Aug 11, 2015 at 10:47:03PM +, Vogel, Steve wrote:
  The license terms allow anyone to distribute (but not sell) the firmware but
  only for use on Intel products. 
 
 Redistribution alone may be enough to be included in linux-firmware

I believe redistribution is sufficient as our legal team is well aware that
this was to be released to linux-firmware and the distros.

I submitted with the license which was approved by our legal team after having
been assured it allowed for redistribution by linux-firmware and the distros.

 
 However, most of the additional terms (and there are lots of them)
 this imposes beyond the usual likely make it impossible to include in a
 distro, so pragmatically, there is no reason to push for inclusion in
 linux-firmware.
 
 This is going to be a hard road for you guys. Falling in line with
 every other Intel firmware blob's (i915, ibt, iwlwifi, SST2) license
 would be much easier on you and the distros.

We are working with Legal to come up with a license which is more in line with
what already exists in linux-firmware.

Ira

 
 Frankly, I think the onus is on you to get statements from the
 licensing teams at Fedora, Debian, RH and SuSE on if they can include
 work under this license or not.
 
 I suspect Fedora and Debian will both say they can't, just based on
 their public policies and the additional restrictions in this
 license.. But hey, I'm not a licensing lawyer..
 
 Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] pull request: linux-firmware: Add Intel OPA hfi1 firmware

2015-08-13 Thread ira.weiny
On Tue, Aug 11, 2015 at 05:35:53PM -0600, Jason Gunthorpe wrote:
 On Tue, Aug 11, 2015 at 10:47:03PM +, Vogel, Steve wrote:
  The license terms allow anyone to distribute (but not sell) the firmware but
  only for use on Intel products. 
 
 Redistribution alone may be enough to be included in linux-firmware
 
 However, most of the additional terms (and there are lots of them)
 this imposes beyond the usual likely make it impossible to include in a
 distro, so pragmatically, there is no reason to push for inclusion in
 linux-firmware.
 
 This is going to be a hard road for you guys. Falling in line with
 every other Intel firmware blob's (i915, ibt, iwlwifi, SST2) license
 would be much easier on you and the distros.
 
 Frankly, I think the onus is on you to get statements from the
 licensing teams at Fedora, Debian, RH and SuSE on if they can include
 work under this license or not.
 
 I suspect Fedora and Debian will both say they can't, just based on
 their public policies and the additional restrictions in this
 license.. But hey, I'm not a licensing lawyer..
 

I just noticed that the email from Steve that Jason Replied to did not make it
to the lists.

Here is the text from Steve for reference.

quote
Here is an interpretation of the grant language:

2.11Grant.  Subject to Your compliance with the terms of this Agreement, and
  the limitations set forth in Section 2.2, Intel hereby grants You, during the
term of this Agreement, a non-transferable, non-exclusive, non-sublicenseable
(except as expressly set forth below), limited right and license:
(A)Onunder Intel’s copyrights, to:
(1)Onreproduce and execute the Software only for internal use with Intel
Products, including designing products for Intel Products,; this license does
not include the right to sublicense, and may be exercised only within Your
facilities by Your employees;

[This allows anyone obtaining the software to
make copies and use the software, but not to re-license it.]


(2)Ondistribute the unmodified Software only in Object Code, only for use with
Intel Products; this license includes the right to sublicense, but only the
rights to execute the Software and only under Intel’s End User Software License
Agreement attached as Attachment B, without the right to further sublicense;


[This allows anyone to re-distribute the software for use on Intel products and
requires the them to re-distribute with the license in Attachment B]


/quote

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] IB/sa: Restrict SA Netlink to admin users

2015-08-12 Thread ira.weiny
On Mon, Aug 10, 2015 at 10:46:55PM -0400, ira.we...@intel.com wrote:
 From: Ira Weiny ira.we...@intel.com
 
 The recently added SA Netlink service requires admin privileges to receive
 kernel requests.  This is only partially sufficient to protect the kernel from
 malicious users.  This patch fixes two issues.
 
   1) Path responses from user space could be spoofed if the sequence
  number was properly guessed.
   2) The set timeout request message could be issued by any user.
 
 Ignore these messages if not submitted by an admin user.
 
 Fixes: 6619209af36c (IB/sa: Route SA pathrecord query through netlink)
 Signed-off-by: Ira Weiny ira.we...@intel.com
 
 ---
 Changes from V1:
   Use netlink_net_capable rather than ns_capable

Doug,

As per the thread with the V1 patch we are looking to merge this into a v9 of
Kaikes series once we do some more testing with the netlink_bind and
namespaces.

So you can safely ignore both v1 and this patch.

Thanks,
Ira

 
 Doug let me know if you would prefer that I get Kaike to squash this into the
 original patch.
 
  drivers/infiniband/core/sa_query.c | 16 
  1 file changed, 16 insertions(+)
 
 diff --git a/drivers/infiniband/core/sa_query.c 
 b/drivers/infiniband/core/sa_query.c
 index 70ceec4df02a..6778644a6957 100644
 --- a/drivers/infiniband/core/sa_query.c
 +++ b/drivers/infiniband/core/sa_query.c
 @@ -699,6 +699,12 @@ static int ib_nl_handle_set_timeout(struct sk_buff *skb,
   struct nlattr *tb[LS_NLA_TYPE_MAX + 1];
   int ret;
  
 + if (!netlink_net_capable(skb, CAP_NET_ADMIN)) {
 + pr_warn_ratelimited(SA netlink: invalid perm for set timeout: 
 `%s'.\n,
 + current-comm);
 + return -EPERM;
 + }
 +
   ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh), nlmsg_len(nlh),
   NULL);
   attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
 @@ -706,6 +712,9 @@ static int ib_nl_handle_set_timeout(struct sk_buff *skb,
   goto settimeout_out;
  
   timeout = *(int *) nla_data(attr);
 +
 + pr_info(SA netlink: timeout: %d\n, timeout);
 +
   if (timeout  IB_SA_LOCAL_SVC_TIMEOUT_MIN)
   timeout = IB_SA_LOCAL_SVC_TIMEOUT_MIN;
   if (timeout  IB_SA_LOCAL_SVC_TIMEOUT_MAX)
 @@ -754,6 +763,12 @@ static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
   int found = 0;
   int ret;
  
 + if (!netlink_net_capable(skb, CAP_NET_ADMIN)) {
 + pr_warn_ratelimited(SA netlink: invalid perm for response: 
 `%s'.\n,
 + current-comm);
 + return -EPERM;
 + }
 +
   spin_lock_irqsave(ib_nl_request_lock, flags);
   list_for_each_entry(query, ib_nl_request_list, list) {
   /*
 @@ -770,6 +785,7 @@ static int ib_nl_handle_resolve_resp(struct sk_buff *skb,
  
   if (!found) {
   spin_unlock_irqrestore(ib_nl_request_lock, flags);
 + pr_err_ratelimited(SA netlink: got unmatched response\n);
   goto resp_out;
   }
  
 -- 
 1.8.2
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] pull request: linux-firmware: Add Intel OPA hfi1 firmware

2015-08-11 Thread ira.weiny
Doug,

Please Pull:

git://github.com/weiny2/linux-firmware.git  master-hfi1-firmware

This is the first release of the Intel OPA hfi1 firmware required by the hfi1
driver.

Thank you,
Ira Weiny

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/sa: Restrict SA Netlink to admin users

2015-08-11 Thread ira.weiny
On Mon, Aug 10, 2015 at 11:38:10PM -0600, Jason Gunthorpe wrote:
 On Mon, Aug 10, 2015 at 05:58:30PM -0400, ira.weiny wrote:
 
  Furthermore, the check in netlink_bind also uses the socket namespace to
  restrict the use of multicast.  This plus my checks should allow an admin to
  place the SA proxy (ibacm in our test cases) in an alternate network 
  namespace
  if they so desire.  But this is independent to the namespace which may be 
  used
  for data applications.
 
 I think Haggai is on to something, there is certainly a problem here,
 that netlink_bind will let a namespace subscribe is a certainly a
 problem for what Haggai is working on.

Ok, After thinking about this more I agree.  Haggai has a point about the arp
tables.  Like I said I'm not a namespace expert.

 
 For now, I think, only root (or CAP_ whatever) in the init namespace
 should have access to this feature. Not sure how to check that.

For these 2 checks it is easy to change to netlink_capable instead of
netlink_net_capable.

 
 Even allowing a namespace to subscribe is problematic because it will
 cause timeouts to hit.. Not sure what to do about that..


Ok, I look into how to deal with the netlink_bind.  I _think_ this may require
the RDMA netlink to provide a custom bind call.  :-(

 
 Also, why the incremental patch? The original isn't ready for mainline
 without the message validation stuff..

Mainly because Kaike was on vacation and I was not sure what Doug would prefer.
Kaike and I have discussed a couple of changes he had queued up so we will need
a v9 so we will merge this into his next v9 submission.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/sa: Restrict SA Netlink to admin users

2015-08-11 Thread ira.weiny
On Tue, Aug 11, 2015 at 05:27:17AM -0600, Wan, Kaike wrote:
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
  Sent: Tuesday, August 11, 2015 1:38 AM
  To: Weiny, Ira
  Cc: Haggai Eran; dledf...@redhat.com; linux-rdma@vger.kernel.org
  Subject: Re: [PATCH] IB/sa: Restrict SA Netlink to admin users
  
  On Mon, Aug 10, 2015 at 05:58:30PM -0400, ira.weiny wrote:
  
   Furthermore, the check in netlink_bind also uses the socket namespace
   to restrict the use of multicast.  This plus my checks should allow an
   admin to place the SA proxy (ibacm in our test cases) in an alternate
   network namespace if they so desire.  But this is independent to the
   namespace which may be used for data applications.
  
  I think Haggai is on to something, there is certainly a problem here, that
  netlink_bind will let a namespace subscribe is a certainly a problem for 
  what
  Haggai is working on.
  
  For now, I think, only root (or CAP_ whatever) in the init namespace should
  have access to this feature. Not sure how to check that.
 
 netlink_capable(skb, CAP_NET_ADMIN) will do the trick.
 
For these calls yes.  For the bind call I think we need to investigate more.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/sa: Restrict SA Netlink to admin users

2015-08-10 Thread ira.weiny
On Mon, Aug 10, 2015 at 09:11:08AM +0300, Haggai Eran wrote:
 On 07/08/2015 00:08, ira.we...@intel.com wrote:
  @@ -754,6 +764,12 @@ static int ib_nl_handle_resolve_resp(struct sk_buff 
  *skb,
  int found = 0;
  int ret;
   
  +   if (!ns_capable(sock_net(skb-sk)-user_ns, CAP_NET_ADMIN)) {
  +   pr_warn_ratelimited(SA netlink: invalid perm for response: 
  `%s'.\n,
  +   current-comm);
  +   return -EPERM;
  +   }
  +
  spin_lock_irqsave(ib_nl_request_lock, flags);
  list_for_each_entry(query, ib_nl_request_list, list) {
  /*
 
 Maybe I'm missing something, but I thought you would want to check the
 capability with init_user_ns as the target, since the SA queries will
 affect all namespaces, not just the one that sent the response.

I'm not an expert in namespaces but these checks are only concerned with the
network namespace of the process which is acting as an SA proxy (via netlink).

AFAIK a user can't elevate a network namespace that the SA proxy is running in.
While you are correct that this data does affect all namespaces who are using a
particular port, this does not matter.  The SA data for that port is applicable
no matter which network namespace an application running on that port is in.

Furthermore, the check in netlink_bind also uses the socket namespace to
restrict the use of multicast.  This plus my checks should allow an admin to
place the SA proxy (ibacm in our test cases) in an alternate network namespace
if they so desire.  But this is independent to the namespace which may be used
for data applications.


All that said, I did find the netlink_net_capable helper call which I should
have used instead.  I'll work up a v2 today.

Thanks,
Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 2/4] IB/core: Add rdma netlink helper functions

2015-08-04 Thread ira.weiny
On Mon, Aug 03, 2015 at 09:15:34PM -0600, Jason Gunthorpe wrote:
 On Thu, Jul 09, 2015 at 01:34:26PM -0400, kaike@intel.com wrote:
  From: Kaike Wan kaike@intel.com
  
  This patch adds a function to check if listeners for a netlink multicast
  group are present. It also adds a function to receive netlink response
  messages.
  
  Signed-off-by: Kaike Wan kaike@intel.com
  Signed-off-by: John Fleck john.fl...@intel.com
  Signed-off-by: Ira Weiny ira.we...@intel.com
   drivers/infiniband/core/netlink.c |   55 
  +
   include/rdma/rdma_netlink.h   |7 +
   2 files changed, 62 insertions(+), 0 deletions(-)
  
  diff --git a/drivers/infiniband/core/netlink.c 
  b/drivers/infiniband/core/netlink.c
  index 23dd5a5..d47df93 100644
  +++ b/drivers/infiniband/core/netlink.c
  @@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
   static struct sock *nls;
   static LIST_HEAD(client_list);
   
  +int ibnl_chk_listeners(unsigned int group)
  +{
  +   if (netlink_has_listeners(nls, group) == 0)
  +   return -1;
  +   return 0;
  +}
  +EXPORT_SYMBOL(ibnl_chk_listeners);
 
 I was thinking about this today, and, where is the security?
 
 What prevents a non-root user from making the above true and/or worse?

We are using Netlink multicast.  I believe that netlink_bind only allows root
to bind to multicast.

static int netlink_bind(struct socket *sock, struct sockaddr *addr,
int addr_len)
{

...
/* Only superuser is allowed to listen multicasts */
if (groups) {
if (!netlink_allowed(sock, NL_CFG_F_NONROOT_RECV))
return -EPERM;
err = netlink_realloc_groups(sk);
if (err)
return err;
}
...


That said I have not tested the ability to change the timeout settings if one
were to bind without multicast and send a message.

I'll see if I can get some time to test this as Kaike is out on vacation.

Ira

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >