On Fri, 11 Jun 2010, Steve Wise wrote:
Jason Gunthorpe wrote:
On Fri, Jun 11, 2010 at 08:18:43PM -0500, Steve Wise wrote:
Further, it's not even needed for IB, just iWARP. That's an unnecessary
admin pain, IMO.
IB is already completely separate from the host stack; that is why it
isn't affected by this problem. It has both a separate port numbering
space (in the RDMA CM) and a separate addressing space (GIDs).
By virtue of being a different L4 transport. iWARP uses TCP as the L4
and thus has these issues.
Also, from an application perspective, IB has IP addresses that are
shared between the TCP stack and the RDMA stack, so it still appears
integrated. At least with librdmacm applications...
The entire problem with iWARP is that it is trying not to be
separate, unlike IB. So, simple answer: use a separate IP, or use a
separate port space (i.e., don't use the TCP protocol number).
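The hazard behind this argument can be illustrated with ordinary sockets: the host TCP stack and an iWARP NIC both draw from the same 16-bit TCP port space, so without coordination two "owners" can claim one port. A purely illustrative Python sketch (not kernel code) of what happens when two independent users try to claim the same port through the host stack:

```python
import socket

def claim_port(port):
    """Bind a listening socket to `port`, as the host TCP stack does
    on behalf of an ordinary sockets application."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

# First claimant (think: a normal sockets application).
host_side = claim_port(0)                # let the kernel pick a free port
port = host_side.getsockname()[1]

# Second claimant (think: an iWARP connection that wants the same
# port number).
try:
    rdma_side = claim_port(port)
    collision_detected = False
    rdma_side.close()
except OSError:
    collision_detected = True            # EADDRINUSE: the kernel refuses

print(collision_detected)
host_side.close()
```

The kernel catches the collision here only because both binds went through it; an iWARP NIC terminating TCP in hardware bypasses that check, which is exactly the conflict being debated.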
That's right. iWARP isn't trying to be separate. It's trying to converge
traffic onto a single link. People are doing this all over the place with
Data Center Bridging (IEEE 802.1Qaz). It results in easier data center
deployments, fewer switches, fewer cables, lower cost, etc., etc.
The fundamental problem here is that certain people in the networking
community don't believe in RDMA or iWARP. That leads to exactly what's
going on here: no discussion of a better approach from the people opposed
to this one. The iWARP community is highlighting a limitation in how the
protocol is designed and is trying to fix it, and the response is "don't
do that." If people are opposed, we need technical guidance.
Other protocols also run over the network today, such as iSCSI and
FCoE, and they happily co-exist with the other L2-L4 protocols in the
stack. This iWARP patch allows iWARP to happily co-exist on a TCP
connection, and it does *not* negatively affect the networking stack at all.
Change the iWARP specification? I don't think that's a simple answer. :)
ROCEE won't have this problem either..
Obviously...it's running over Ethernet. This doesn't help the discussion
at hand one bit. It's comparing apples to oranges.
iWARP should and can easily co-exist with the host TCP stack by sharing
the port space. But, as Roland stated already, maybe the only way
forward is to get end-user pressure applied at the appropriate
places! :)
*shrug* This isn't going to happen until netdev decides to design in
stateful offload. I doubt that is going to happen any time
soon. I've already seen Linux max out 40GE on benchmarks, so it is
hard to see what the driver for it would be.
I beg to differ. I work in the Ethernet division at Intel, and have been
working with 10GbE devices for the past few years. I can easily scale
Nehalem-EX systems beyond 100 GbE with hand-tuning of NUMA alignment,
interrupt affinity, and flow/application affinity. Linux is more than
capable of scaling. The question isn't about netdev adding stateful
offloads; this patch isn't doing anything like that. The iWARP
hardware knows which ports are in use for iWARP connections, and delivers
those packets to the QPs that need them. Those then interact directly with
the RDMA stack. No hooks in netdev needed.
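A toy model of the steering described above (illustrative only, nothing like actual NIC firmware or driver code): the RDMA stack programs the NIC with the ports its connections own, and incoming segments for those ports are consumed by the corresponding QP while everything else goes up the normal receive path:

```python
# dst_port -> QP number, programmed by the RDMA stack as connections
# are established. Port 5001 and QP 42 are made-up example values.
iwarp_ports = {}

def register_qp(port, qpn):
    """Record that `port` belongs to an iWARP connection on QP `qpn`."""
    iwarp_ports[port] = qpn

def steer(dst_port):
    """Return where a TCP segment for `dst_port` is delivered."""
    if dst_port in iwarp_ports:
        return ("rdma", iwarp_ports[dst_port])   # consumed by the NIC's QP
    return ("host", None)                        # normal netdev receive path

register_qp(5001, qpn=42)
print(steer(5001))   # ('rdma', 42)
print(steer(80))     # ('host', None)
```

The point of the model is that the fork happens entirely in the adapter: the host stack never sees the iWARP segments, so no netdev hooks are involved.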
The base argument is that Dave Miller doesn't like RDMA or iWARP. I
completely respect Dave's opinions, but in this case, I don't think it's
his call, since this patch is completely isolated to the IB tree and the
RDMA stack. Not including it will directly affect anyone using iWARP with
Linux. Hopefully those people will chime in at some point, but I'm not
sure that will help, because the arguments against this so far have been
somewhat irrational and not technically oriented.
Cheers,
-PJ Waskiewicz