Hal 1. It appears I need the following modules in this order:
Hal core/ib_client_query.ko core/ib_sa_client.ko
Hal ulp/ipoib/ib_ipoib.ko ulp/ipoib/ib_ip2pr.ko
Don't worry about which modules or what order. Just do modprobe
ib_ipoib and modprobe ib_mthca (in either order) and it
Hal tvflash -v -d fw-23108-rel-3_2_0/fw-23108-a1-rel.mlx GUID
tvflash uses binary firmware images, not mlx format. See the README.
- R.
___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To
I'm reading over the SMI implementation, and I have a few questions.
First of all, I don't see how LID-routed SMPs or PMA MADs are handled
correctly. As far as I can tell, what ends up happening for a LR-SMP
or PMA query is:
smi_recv_handler() gets MAD and just calls
smi_recv_smp(), which
Hal No, it's not broken (of course depending on one's definition)
Hal and yes, this can be bypassed and is on my TODO list. I have
Hal been focusing more on functionality right now.
Oh, OK, I see. smi_send_smp() is what actually generates the response
and sends it back. Sorry I
Hal Sounds like the driver is the only place where this can be done then.
Do you mean that the mthca driver should snoop on receive completions
on QP0 and look for traps with SLID==0? This seems pretty awkward
(since the MAD layer is going to lose those receive completions, how
do we handle
Hal It seems like the thread ended with an unanswered
Hal question. The answer appears to be that process_mad was
Hal used. Is that what we want to do for OpenIB ?
That seems simplest to me but I'm not opposed to adding another driver
entry point if that makes things simpler.
Hal
Hal Just wanted to double check on the MAD layer needs for
Hal porting the current SA client code (for Get PathRecord and
Hal Set/Delete/Get MCMemberRecord for IPoIB). Is it correct that
Hal this code does not rely on request/response matching (and
Hal timeouts) currently ? If
Hal I have the changes for this but it requires some
Hal conditionalization to the core/Makefile which you previously
Hal objected to. Should I generate a patch for this ?
Switching the Makefile from what it is now to something that uses
kbuild has to be an improvement. I'm not sure
David Ok. How does the port inform the SM that it has a
David preferred LID?
The port will already have a LID assigned when the SM discovers it.
My understanding is that the SM is encouraged to preserve a port's
LID if it doesn't conflict with any other LIDs, and this is what we're
Sean Shouldn't we be able to keep the ib_req_notify_cq at the end
Sean of this function? If additional completions are left after
Sean polling, a second event should be generated. Or at least
Sean that's what I remember from our discussions about this...
On Mellanox HCAs but not
Sean What is the client doing with the reference counting? When
Sean their send handler gets called, they free their send
Sean context. After they deregister, they free their mad_agent
Sean context.
OK, here's a realistic example. In IPoIB, I'll probably only create
one
Yaron A better solution that IBTA needs to look at is creating a
Yaron well known Loopback LID value that apps use when they want
Yaron to talk locally (like IP 127...)
If/when IBTA specifies this and available hardware implements this,
then this will be a great solution.
In the
Michael See section 17.3.1 version 1.1 page 919 of volume 1 which
Michael provides specific guidance on how loopback is implemented
Michael and what LID should be used. There is no reason for a
Michael loopback LID. This topic was debated and the spec
Michael reflects the
Fab I think as long as ib_cancel_mad can return before the
Fab corresponding send completes (i.e. return -EBUSY), you have
Fab this problem and client must provide their own
Fab synchronization, whether through reference counting or some
Fab other means.
Fab Returning
Sean Note that returning -EBUSY from ib_cancel_mad would only
Sean indicate that a callback *might* be invoked. It could have
Sean already been called, which leads me to think that no return
Sean value would be better than one that a client tries to use.
Agree... at least with
Hal Oops. Looks like we need a way to expose the PD and MR to be
Hal able to do this. How about adding this into the mad_agent
Hal structure returned ? If that makes sense, I will generate the
Hal patch for this.
Yep, looks that way. I didn't notice at the time, but reusing the
Hal Yup. He's already got a way to get the PD. Rather than the
Hal MR, I think just adding the lkey to the mad_agent structure
Hal will suffice. Do you agree ?
I guess so... it seems a little odd to make the consumer copy the
L_Key from one place to another without knowing anything
David I'm describing what is in the current IBA. The IBA
David describes the conditions where a P_Key value should be set
David into the P_Key table. There is no similar description for
David LIDs in the IBA.
Right, as I said before, that's what I thought (but I wasn't sure I
David I think it would be a mistake to use skb->dst as a flag for
David unicast or not. Even if it is correct in all cases you care
David about now (I don't know either way), it would be a hidden
David dependency with high potential to break something
David eventually.
That's
Sean Having an allocator routine might force users to perform
Sean data copies when sending data.
Sean Do all of the existing MAD implementations have routines to
Sean allocate MADs when sending data, and require those routines
Sean to be used?
Not Topspin's.
I think moving
Michael Actually saw some mail on lkml the other day about a 64
Michael bit system where memory is fragmented - PCI at 0 and
Michael actual memory near -1. Anyway, Tavor is likely not the only
Michael device with restrictions on maximum region size?
Right, that's why there's
+ bus_to_virt(cur_send_wr->sg_list->addr))->tid.id;
Didn't notice this before but any use of bus_to_virt() is broken. We
need to figure out a different way to do whatever you're trying to do here.
- R.
Hal Can you explain why using bus_to_virt() is broken ?
See Documentation/DMA-mapping.txt:
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated. Some ports already do not provide these
as it is impossible to correctly support them.
Hal Good point. We will need more than access to the TID for
Hal RMPP. We need a replacement for bus_to_virt. Is there an
Hal approved way to get from DMA address to VA ?
No, you just need to save off the VA if you need to use it later. So
maybe we need to add a pointer to the MAD
Sean Would we need multiple VAs if scatter-gather is used by the client?
Yep. Or we could just say that all the fields the access layer needs
to look at must be in the first s/g entry.
- R.
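Roland's suggestion (save the VA when the buffer is mapped, and require every field the access layer reads to sit in the first s/g entry) can be sketched in userspace C. This is a minimal sketch under stated assumptions: the sketch_* structures and helpers are hypothetical, the MAD header is simplified and kept in host byte order, and the DMA address is simply passed in rather than obtained from the kernel's DMA-mapping API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather entry: the DMA (bus) address is what the
 * HCA uses; the CPU virtual address is saved next to it at map time so
 * nothing ever has to translate back with bus_to_virt(). */
struct sketch_sge {
	uint64_t dma_addr;   /* address handed to the hardware */
	void    *vaddr;      /* CPU address saved when the buffer was mapped */
	uint32_t length;     /* bytes covered by this entry */
};

/* Simplified MAD header (assumption): only the fields up to the TID,
 * in host byte order. */
struct sketch_mad_hdr {
	uint8_t  base_version;
	uint8_t  mgmt_class;
	uint8_t  class_version;
	uint8_t  method;
	uint16_t status;
	uint16_t class_specific;
	uint64_t tid;
};

/* Record both addresses together when the buffer is mapped. */
static void sketch_sge_set(struct sketch_sge *sge, uint64_t dma_addr,
			   void *vaddr, uint32_t length)
{
	sge->dma_addr = dma_addr;
	sge->vaddr = vaddr;
	sge->length = length;
}

/* Return the TID using the saved VA of the first s/g entry, or 0 if
 * there is no entry or it is too short to hold the header. */
static uint64_t sketch_get_tid(const struct sketch_sge *sg_list, size_t num_sge)
{
	const struct sketch_mad_hdr *hdr;

	assert(sg_list != NULL || num_sge == 0);
	if (num_sge == 0 || sg_list[0].length < sizeof(*hdr))
		return 0;
	hdr = sg_list[0].vaddr;   /* saved VA, no bus_to_virt() */
	return hdr->tid;
}
```

With the "first s/g entry" rule, a client using scatter/gather only needs one saved VA per send, rather than one per entry.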
thanks, applied.
Great! I will start merging this onto my tree as soon as I get my
IPoIB driver into shape to commit. (I've currently torn it apart
getting rid of the pseudo-ethernet layer, and I have it mostly working
again).
- Roland
Hal Hi, Has mthca been tested with Arbel (PCI Express) ? Is this
Hal in compatibility mode or native mode or both ?
I've tested mthca with Arbel in compatibility mode. I haven't had
access to native mode firmware or up-to-date documentation, so I have
not even started on native mode
Robert Does the PCI-Express HCA require any software changes to
Robert the HCA driver ? i.e., will it run with an existing PCI-X
Robert tavor driver ? I have heard that it does not require any
Robert changes, but I just wanted to confirm that.
As Hal mentioned, there are two
I am using the system level queue. If we think that using our own MAD queue
is better, I will do that. I was thinking more along the lines of a single
workqueue for all MAD services, with one per processor, rather than a
workqueue per port, however.
I don't think the system keventd queue is
Any idea on where the cutoff for huge is or is this likely to be a
matter of experience ? Is it at least 2 ? What about 4 ?
Depends on the number of CPUs and the workload. The single workqueue
design starts being inefficient when you start getting idle time because every
workqueue thread is
The increase in cost for the spec is rather unfortunate but I think
it's orthogonal to any IP issues. Since the Linux kernel contains a
lot of code written to specs available only under NDA (and even
reverse-engineered code where specs are completely unavailable), I
don't think the expense should
Now that Linus has officially released 2.6.9, I am removing backwards
compatibility from mthca (basically the patch below). I added a 2.6.9
patch in the src/linux-kernel/patches directory. Since the tree does
not compile against anything older, I removed the older kernel
patches (I don't think
This series of patches adds the
struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
API to my branch. To recap, this creates an MR that can access any
DMA address for an HCA device (all 64 bits of memory in the Tavor case)
(Tom, if you want to give this a spin on
Index: infiniband/include/ib_verbs.h
===
--- infiniband/include/ib_verbs.h (revision 1024)
+++ infiniband/include/ib_verbs.h (working copy)
@@ -736,6 +736,8 @@
enum
Index: infiniband/core/mad_ib.c
===
--- infiniband/core/mad_ib.c (revision 915)
+++ infiniband/core/mad_ib.c (working copy)
@@ -59,7 +59,7 @@
mad, IB_MAD_PACKET_SIZE,
Index: infiniband/ulp/ipoib/ipoib_verbs.c
===
--- infiniband/ulp/ipoib/ipoib_verbs.c (revision 952)
+++ infiniband/ulp/ipoib/ipoib_verbs.c (working copy)
@@ -201,24 +201,10 @@
if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP))
A little while ago, we had a brief discussion about what MR consumers
should use for MADs they want to send. It seems the two possibilities
were for the MAD layer to expose its MR for consumer use, or for
consumers to create a new MR using the MAD layer's PD. Which option
did we decide was the
Andras VAPI and/or IPoIB.
Unfortunately there are no user space verbs right now (work should be
starting soon). When we do implement the verbs the API will most
likely be closer to the current kernel API than to VAPI.
Andras In short, the kernel crashed after trying to send the
This patch improves how IPoIB handles multicasts. It should fix the
crash that Andras saw; unfortunately I don't think it will help with
Tom's crash (although I don't understand that crash so it might fix
it). Unfortunately it still probably doesn't work with some SMs.
Also, with this patch,
Hal What new MAD module work are you referring to here ?
The stuff that you and Sean are working on. I'm waiting for Sean's
timeout code and replacement of the MAD thread with a workqueue to be
finished and merged before I pull the code into my tree (to avoid
merging hassles on my side).
-
Sean I think I'm missing something here. I thought that the
Sean snoop_mad entry point was the solution to this issue.
Except as currently defined, it doesn't provide a way for the
low-level driver to give back a response -- it just lets the low-level
driver steal MADs like locally
Michael I think the difficulty with the last one is that (at
Michael least for Tavor) process local mad can block, since there
Michael is a limited number of outstanding commands. Of course
Michael you could always make it non-blocking by dropping the MAD
Michael if the command
Hal What are the other special cases for registration ?
Not just registration... I just meant that having an extra snoop_mad
entry point and a special issm bit and hard-coding different treatment
of SMInfo in the MAD layer starts to smell to me like the MAD layer is
at the wrong level of
Hal OK. It's pretty straightforward to change the MAD layer to
Hal use PLM rather than snoop MAD (and remove snoop_mad (undo
Hal that patch)). Should I post the changes ?
It's my idea so I certainly like the approach :)
Sean, what do you think?
- R.
By the way, in case someone else wants to use the same approach,
here's how I make sure my changes build across multiple archs:
I'm using toolchains built with http://www.kegel.com/crosstool/ and
the attached script to make sure my tree builds on i386, x86_64,
ppc64, ia64, ppc, sparc64 and
Sean I think that it makes sense, but just to make sure that I'm
Sean clear on this. We want to pass every received MAD to the
Sean HCA driver before any processing has occurred on the MAD,
Sean correct?
That's my plan...
Sean If the MAD is not consumed by the driver, the
Grant Up to you (or whoever maintains the code). Some drivers
Grant that have their own subdir keep the prefixes. e1000 and
Grant sym2 drivers are the counter examples I had in mind.
Good point... well, I'm getting sick of typing ib_ :)
Also I'd argue that the e1000_ or sym_
Hal If I understand correctly, this obviates the need for what is
Hal now ib_agent. All that might remain is SMI handling for DR
Hal SMPs. Is that right ?
I think the receive path looks something like
if (DR SMP)
SMI checks (discard on failure)
rc =
OK, I'm going to go ahead and rename ib_mad.c -> mad.c, ib_agent.c ->
agent.c etc. (This also makes it possible to build a module named
ib_mad.o, which I think makes more sense than ib_al.o, from multiple
sources).
I can continue to merge by hand but it might make sense to make the
same change on
+#include <linux/random.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/kref.h>
+#include <linux/idr.h>
+
+#include <ib_pack.h>
+#include <ib_mad.h>
+#include <ib_sa.h>
+
+MODULE_AUTHOR("Roland Dreier");
+MODULE_DESCRIPTION("InfiniBand subnet administration query support");
This converts IPoIB to use the new SA API for PathRecord and
MCMemberRecord transactions.
Correcting the component mask used for multicast joins after the initial
broadcast group still needs to be done...
- R.
Index: ulp/ipoib/ipoib_main.c
Sean I didn't realize that you had taken a copy of the current
Sean mad code. Is there anything in the openib-candidate branch
Sean that isn't in your branch? Does it make sense to just
Sean update the code in the roland-merge branch?
I've got everything up to r1080 in my branch
Hal Sure. Are there specific ones you have in mind ? The ones
Hal that are KERN_DEBUG ? Any others ?
This alone will probably clean up my dmesg a lot:
Index: infiniband/core/mad.c
===
--- infiniband/core/mad.c
Hal I wrote a little too soon: On shutdown of the machine with
Hal outstanding joins, the message: ib0: waiting on -5 multicast
Hal groups is repeated and shutdown appears to hang.
OK, looks like a reference counting bug.
- R.
OK, I just fixed the bug below (left over from the port to the new SA API).
It should work better now, I hope.
- R.
Index: infiniband/ulp/ipoib/ipoib_multicast.c
===
--- infiniband/ulp/ipoib/ipoib_multicast.c (revision 1102)
+++
Shirley It's better to use semaphore instead of atomic_read to
Shirley check the reference count 0 in wait_event() in
Shirley ib_unregister_mad_agent(). Agree?
I don't see how one uses a semaphore to wait for a reference count to
become zero (semaphores sleep until their count is
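The wait-for-zero pattern under discussion can be sketched in userspace with a pthread mutex and condition variable. This is an analogue of wait_event() on a reference count, not the kernel or OpenIB API; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of "wait_event(wq, atomic_read(&refcount) == 0)".
 * A POSIX semaphore would not express this: sem_wait() waits for the
 * count to become positive, not for it to reach zero. */
struct sketch_agent {
	pthread_mutex_t lock;
	pthread_cond_t  zero;      /* signalled when refcount hits 0 */
	int             refcount;
};

static void sketch_agent_init(struct sketch_agent *a)
{
	pthread_mutex_init(&a->lock, NULL);
	pthread_cond_init(&a->zero, NULL);
	a->refcount = 1;           /* reference held by the registering client */
}

/* Take a reference, e.g. one per posted send. */
static void sketch_agent_get(struct sketch_agent *a)
{
	pthread_mutex_lock(&a->lock);
	a->refcount++;
	pthread_mutex_unlock(&a->lock);
}

/* Drop a reference and wake any waiter when the last one goes away. */
static void sketch_agent_put(struct sketch_agent *a)
{
	pthread_mutex_lock(&a->lock);
	assert(a->refcount > 0);
	if (--a->refcount == 0)
		pthread_cond_broadcast(&a->zero);
	pthread_mutex_unlock(&a->lock);
}

/* Drop the caller's own reference, then sleep until every outstanding
 * reference has been released; only then is it safe to free the agent. */
static void sketch_agent_unregister(struct sketch_agent *a)
{
	sketch_agent_put(a);
	pthread_mutex_lock(&a->lock);
	while (a->refcount != 0)
		pthread_cond_wait(&a->zero, &a->lock);
	pthread_mutex_unlock(&a->lock);
}
```

The while loop re-checks the count after every wakeup, which is the same reason wait_event() re-evaluates its condition.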
The inline patch was whitespace damaged and line wrapped. Is there
any way to make your attachments have mime type text/x-patch? That
way my client will display it inline.
- R.
Hal So should this patch be applied or is it superseded by your
Hal pending patch (and I should wait for that) ?
Sounds like the patch is not needed and actively breaks things, so my
guess would be that it's better not to apply.
- R.
Ken Also, another question I have is fairly naive -- at what
Ken point are the Lion Cub (PCI Express) cards supported in the
Ken OpenIB stack? I seem to remember the Tavor code supporting
Ken them inherently but in a non-efficient manner if native code
Ken wasn't used.
Lion
von Hi, I get this problem after installing the gen2 roland-merge
von stack (for linux kernel) and the gen1 trunk (for useraccess)
gen1 userspace won't work with gen2 kernel side, unfortunately.
- Roland
Hal When I did a modprobe -r ib_ipoib, I got the following oops
Hal when the SA's send_handler is called on its deregistering
Hal its MAD client with pending MADs.
Can you reproduce it with a kernel with CONFIG_KALLSYMS turned on so
that I can read the oops?
Thanks,
Roland
As far as I can tell this patch is broken: it removes the qp_cap
parameter to modify_qp but doesn't fix up the mthca functions. I
added the missing pieces by hand and applied.
- R.
Grant Roland, I am trying to build roland-merge #1119 on top of
Grant 2.6.10-rc1 for ia64. And yes, the usage noted below
Grant doesn't match the declaration:
Grant Should I not (yet) be enabling CONFIG_INFINIBAND_CM?
Grant Trivial patch appended to fix. Though I don't know
Hal Hi Roland, When shutting down mthca after shutting down
Hal IPoIB, the following message appears on the console:
Hal ib_mthca :03:00.0: dma_pool_destroy mthca_av, c03a6000 busy
Yes, this is because IPoIB currently leaks AVs (we need to hook into
the neighbour destructor to
Sean Is anyone willing to work on porting opensm to this? If
Sean not, I can start on this. Otherwise, I will continue
Sean working on adding MAD error/overrun handling.
It would be great to work on that but we need to resolve how to handle
the SM classes first.
One option would be
Sean I thought that we had decided to go this route, and replace
Sean snoop_mad with calls to process_mad. If we're in agreement
Sean on this, I can do it first.
That was my impression too, so I think that would be a good route to go.
- R.
Grant 1) drivers/infiniband/include/ still has a lot of files
Grant still prefixed with ts. Do they all need to be renamed?
Grant Or do some need to be reworked to match some new
Grant interfaces?
I think pretty much every ts_*.h file is obsolete. When we port the
CM to the new
Not sure what the goal is here, but I should point out that current
mthca code does not implement resizing either CQs or QPs.
However I'm not sure I understand why the MAD layer wants to resize
these objects -- given that the number of QPs is known in advance and
that the MAD layer can choose how
Johannes Does the device name need to have the HCA driver name in
Johannes it? Also, the u in umad is implied.
Good point, I'll change the docs to suggest no u.
Johannes Wouldn't it be more appropriate to do something like
Johannes this:
Johannes /dev/infiniband/hca0/mad1
Can you resend either with a different mailer or as an attachment?
The patch was pretty line-wrapped.
- R.
Michael If the max. number of QPs is very big, you may want the
Michael actual CQ size to grow gradually with demand.
Sure, but there are only 2 special QPs per port.
Ronald neat. How does this differ from tvflash that Roland wrote?
correction: I just cleaned up the code. Kamen and Johannes here at
Topspin did most of the real work in writing tvflash...
- R
OK, I merged the MAD code in my branch up to r1135 and applied this
patch (there was one missing chunk in ib_verbs.h to remove the
snoop_mad method from struct ib_device, which I added by hand).
Thanks,
Roland
Sean Is there any interest among people to reuse receive MADs?
Sean I.e. once allocated and mapped, the receive MAD and work
Sean request would be re-posted to the QP when freed.
I'm not sure this is that useful... MAD processing is not such a
super-hot path that we need to keep
Hal Is this a driver or firmware issue ?
Driver issue. I just haven't implemented CQ resize yet, and it's not
a high priority for me.
Hal Might this be useful for redirected QPs ?
I don't think so, since the redirected QP will not be attached to the
MAD layer's CQ.
Hal Should the
Hal 1. Are there changes planned for core/cache.c ?
I've cleaned it up a little but I'm really not sure exactly what
should be done with it.
Hal 2. Shouldn't src/userspace/tools/libsdp be removed for now ?
Yeah, I'll do that.
-R.
Hal Does the driver do this (QP is sized larger than what was
Hal requested) now ? Or is this a spec thing ?
Unless my memory is playing tricks on me, I don't think mthca will
create a QP larger than requested.
- R.
Sean Hal, can you check that your code stays within 80 characters
Sean per line?
The 80 character limit is really just a guideline. It's not worth
going through contortions to fix an 85-character line.
- Roland
Sean Okay. I was just going by the coding style documentation
Sean that mentioned that this was a hard limit. If it's not
Sean that big of a deal, then I'll only worry about excessively
Sean long lines.
Yeah, if you read through the kernel source, you can find tons and
tons of
Actually looking at this code one more time:
spin_lock_irqsave(&idr_lock, flags);
if (idr_find(&query_idr, query->id) != query) {
	spin_unlock_irqrestore(&idr_lock, flags);
	return;
}
spin_unlock_irqrestore(&idr_lock, flags);
I think this should be better:
Index: core/sa_query.c
===
--- core/sa_query.c (revision 1175)
+++ core/sa_query.c (working copy)
@@ -544,12 +544,13 @@
ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table),
OK, I committed with error messages like "Couldn't get ib_mad DMA MR".
- R.
Use RCU instead of seqlocks, and simplify the code.
Index: core/device.c
===
--- core/device.c (revision 1178)
+++ core/device.c (working copy)
@@ -190,8 +190,7 @@
int ib_register_device(struct ib_device *device)
{
-
By the way, we probably want this applied:
Index: core/mad.c
===
--- core/mad.c (revision 1184)
+++ core/mad.c (working copy)
@@ -385,7 +385,7 @@
mad_agent->device->node_type,
Hal Doesn't that just map starting at the GRH ? This is to handle
Hal PMA responses which might have GRHs.
Sure, it maps starting at the GRH and uses that as the start of the
gather segment used for the send (and tries to send more than 256
bytes). This is wrong even when sending a
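To make the size problem concrete: on a UD QP the first 40 bytes of every receive buffer are reserved for the GRH (whether or not one arrived), and a MAD is 256 bytes, so a gather segment that starts at the GRH and covers the whole buffer is 296 bytes long. The sketch below assumes exactly that layout; the sketch_* names are hypothetical, and the "wrong" variant is an illustration rather than the actual code under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes fixed by the InfiniBand spec: a Global Route Header is 40 bytes
 * and every MAD is exactly 256 bytes. */
enum {
	SKETCH_GRH_BYTES = 40,
	SKETCH_MAD_BYTES = 256,
};

/* Assumed receive-buffer layout: GRH area first, then the MAD. */
struct sketch_recv_buf {
	unsigned char grh[SKETCH_GRH_BYTES];
	unsigned char mad[SKETCH_MAD_BYTES];
};

_Static_assert(sizeof(struct sketch_recv_buf) == 296, "no padding expected");

/* Gather segment used when sending a MAD back out of such a buffer. */
struct sketch_gather {
	size_t offset;   /* byte offset from the start of the buffer */
	size_t length;   /* bytes handed to the HCA */
};

/* Correct: start at the MAD and send exactly one MAD's worth of data. */
static struct sketch_gather sketch_send_segment(void)
{
	struct sketch_gather g = {
		.offset = offsetof(struct sketch_recv_buf, mad),
		.length = SKETCH_MAD_BYTES,
	};
	return g;
}

/* Wrong: starting at the GRH and covering the whole buffer asks the HCA
 * to send more than a MAD holds. */
static struct sketch_gather sketch_send_segment_from_grh(void)
{
	struct sketch_gather g = {
		.offset = 0,
		.length = sizeof(struct sketch_recv_buf),
	};
	return g;
}
```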
OK, this works on my i386 system but I'm still getting
ib_mad: Invalid directed route
on ppc64. I'll try to debug what exactly is happening (ie put some
prints in to see why smi_handle_dr_smp_send() is rejecting it).
- R.
Roland OK, this works on my i386 system but I'm still getting
Roland ib_mad: Invalid directed route
Roland on ppc64. I'll try to debug what exactly is happening (ie
Roland put some prints in to see why smi_handle_dr_smp_send() is
Roland rejecting it).
By the way, the i386
Nitin certainly it does break my x86_64 setup too. Can we revert
Nitin back to working set of bits please ?
It's actually not an architecture issue -- it's an issue if your node
is more than one hop from the SM. You should be able to use the patch
I just posted to get things working
Roland OK, I think I understand the problem, but I'm not sure
Roland what the correct solution is. When a DR SMP arrives at a
Roland CA from the SM, hop_cnt == hop_ptr == number of hops in
Roland the directed route,
Hal What was the number ?
For one port it was 4 and for
Hal == Hal Rosenstock [EMAIL PROTECTED] writes:
Hal I can see now that this is wrong and have a fix for what
Hal stops IPoIB from working. The problem was that the response
Hal was received by the MAD layer but not dispatched due to the
Hal change(s) noted above.
Hal So I am
Sean What exactly does it mean then when process_mad returns
Sean success? Do any of the return bits from process_mad
Sean indicate that the MAD was for the HCA driver?
SUCCESS means that process_mad didn't encounter any errors. If REPLY
or CONSUMED is set then process_mad actually
By the way, if I am reading the code correctly, it looks like the MAD
layer only checks for IB_MAD_RESULT_REPLY and not
IB_MAD_RESULT_CONSUMED. If IB_MAD_RESULT_CONSUMED is set then the
packet is something like a trap repress handled by the SMA or a
locally generated trap that the driver
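A minimal sketch of a receive-side decision that honors both IB_MAD_RESULT_REPLY and IB_MAD_RESULT_CONSUMED. The flag values are modeled on the kernel's IB_MAD_RESULT_* bits but are defined locally as an assumption; the sketch_* names, the precedence when both bits are set, and the choice to let MADs that fail process_mad fall through to the normal agents are illustrative assumptions, not the MAD layer's actual code.

```c
#include <assert.h>

/* Result bits modeled on IB_MAD_RESULT_* (values assumed here). */
enum {
	SKETCH_MAD_RESULT_FAILURE  = 0,
	SKETCH_MAD_RESULT_SUCCESS  = 1 << 0,  /* no errors in process_mad */
	SKETCH_MAD_RESULT_REPLY    = 1 << 1,  /* driver built a response */
	SKETCH_MAD_RESULT_CONSUMED = 1 << 2,  /* driver handled it: stop */
};

/* What the hypothetical MAD layer does next with a received MAD. */
enum sketch_action {
	SKETCH_STOP,        /* driver consumed the MAD; nothing more to do */
	SKETCH_SEND_REPLY,  /* send the driver's response back */
	SKETCH_DISPATCH,    /* hand the MAD to the registered agents */
};

static enum sketch_action sketch_after_process_mad(int ret)
{
	/* Without SUCCESS the other bits are not meaningful. */
	if (!(ret & SKETCH_MAD_RESULT_SUCCESS))
		return SKETCH_DISPATCH;
	if (ret & SKETCH_MAD_RESULT_REPLY)
		return SKETCH_SEND_REPLY;
	/* Checking only REPLY would miss this case and dispatch a MAD the
	 * driver already handled, such as a trap repress. */
	if (ret & SKETCH_MAD_RESULT_CONSUMED)
		return SKETCH_STOP;
	return SKETCH_DISPATCH;
}
```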
Roland I think keeping the MAD code simpler is probably best right now.
Hal Hope that is for technical reasons and not for the recent missteps.
Yes, it's just that the MAD code is quite complicated already with
multiple tests for DR SMPs etc; mad.c alone is over 2000 lines now. I
don't
Roland I guess the problem with calling smi_handle_dr_smp_recv()
Roland twice on the same packet is that the function may alter
Roland the packet.
Hal No, the second call to smi_handle_dr_smp_recv() was on the
Hal outgoing response and not the incoming request. The thought
Matt As some of you may have noticed, we migrated over to the
Matt new OpenIB web pages yesterday. The FAQ and a few other
Matt items are still a work in progress. Let me know if there
Matt are any errors or if folks have other feedback/suggestions.
Looks great. One
In the upstream kernel, the use of SPIN_LOCK_UNLOCKED is being
phased out (look for changesets like "Lock initializer unifying").
This patch converts the MAD layer to use spin_lock_init() instead,
please apply.
- R.
Index: core/agent.c
Thanks, applied.
- R.
Hal Unfortunately I still see:
Hal ib0: ib_dealloc_pd failed
Hal when I removed ib_ipoib
I understand why that happens: I try to free the PD before waiting for
all the AHs to be reaped. This should be fixed soon.
- R.