[Felix Marti] In addition, is arming the CQ really in the performance
path? Don't apps poll the CQ as long as there are pending CQEs and
only arm the CQ for notification once there is nothing left to do? If
this is the case, it would mean that we only waste a few 'idle' cycles.
On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
OK, I'm back from vacation today.
Anyway I don't have a definitive statement on this right now. I guess
I agree that I don't like having an extra parameter to a function that
should be pretty fast (although req notify isn't quite as hot as
something like posting a send request or polling a cq),
[PATCH v4 01/13] Linux RDMA Core Changes
On Thu, 2007-01-04 at 07:07 +0200, Michael S. Tsirkin wrote:
If you think I should not add the udata parameter to the req_notify_cq()
provider verb, then I can rework the chelsio driver:
1) at cq creation time, pass the virtual address of the u32 used by the
library to track the current consumer index
@@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
 static inline int ib_req_notify_cq(struct ib_cq *cq,
				    enum ib_cq_notify cq_notify)
 {
-	return cq->device->req_notify_cq(cq, cq_notify);
+	return cq->device->req_notify_cq(cq, cq_notify,
It seems all Chelsio needs is to pass in a consumer index - so, how
about a new entry point? Something like
void set_cq_udata(struct ib_cq *cq, struct ib_udata *udata)?
Adding a new entry point would hurt chelsio's user mode performance if
it then requires 2 kernel transitions.
No, it won't need 2 transitions - just an extra function call,
so it won't hurt performance - it would improve performance.
ib_uverbs_req_notify_cq would call it like this:
ib_uverbs_req_notify_cq()
{
	ib_set_cq_udata(cq, udata);
	...
}
I've run this code with mthca and didn't notice any performance
degradation, but I wasn't specifically measuring cq_poll overhead in a
tight loop...
We were speaking about ib_req_notify_cq here, actually, not cq poll.
So what was tested?
--
MST
Sorry, I meant req_notify. I didn't
On Wed, 2007-01-03 at 21:33 +0200, Michael S. Tsirkin wrote:
Without extra param (1000 iterations in cycles):
ave 101.283 min 91 max 247
With extra param (1000 iterations in cycles):
ave 103.311 min 91 max 221
A 2% hit then. Not huge, but not 0 either.
Convert cycles to ns
So what does this tell you?
To me it looks like there's a measurable speed difference, and so we
should find a way (e.g. what I proposed) to enable chelsio userspace
without adding overhead to other low level drivers or indeed chelsio
kernel level code.
What do you think?
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index 283d50b..15cbd49 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -722,7 +722,8 @@ repoll:
	return err == 0 || err == -EAGAIN ? npolled : err;
Support provider-specific data in ib_uverbs_cmd_req_notify_cq().
The Chelsio iwarp provider library needs to pass information to the
kernel verb for re-arming the CQ.
Signed-off-by: Steve Wise [EMAIL PROTECTED]
---
 drivers/infiniband/core/uverbs_cmd.c | 9 +++--