On 09/27/2015 10:28 PM, Christoph Lameter wrote:
> On Sun, 27 Sep 2015, Doug Ledford wrote:
>
>> Currently I'm testing your patch with a couple other patches. I dropped
>> the patch of mine that added a module option, and added two different
>> patches. However, I'm still waffling on this patch somewhat. In the
>> discussions that Jason and I had, I pretty much decided that I would
>> like to see all send-only multicast sends be sent immediately with no
>> backlog queue. That means that if we had to start a send-only join, or
>> if we started one and it hasn't completed yet, we would send the packet
>> immediately via the broadcast group versus queueing. Doing so might
>> trip this new code up.
>
> If we send immediately then we would need to check on each packet if the
> multicast creation has been completed?
We do that already anyway. Calling find_mcast and then checking
if (!mcast || !mcast->ah) is exactly that check.
> Also broadcast could cause an unnecessary reception event on the NICs of
> machines that have no interest in this traffic.
This is true. However, I'm trying to balance between several competing
issues. You also stated the revamped multicast code was adding latency
and dropped packets into the problem space. Sending via the broadcast
group would help with latency. However, I have an alternative idea for that...
> We would like to keep
> irrelevant traffic off the fabric as much as possible. And a reception
> event that requires traffic to be thrown out will cause jitter in the
> processing of inbound traffic that we also would like to avoid.
That may not be optimal for your app, but we also need to try and
maintain proper emulation of typical IP/Ethernet behavior since this is
IPoIB after all. That's why the app isn't required to join the group
before sending, and also why it should be able to expect that we will
fall back to sending via broadcast if needed.
However, the following algorithm might be suitable here:
On first packet:
    create mcast group
    queue packet to group
    schedule join

On subsequent packets:
    find mcast group
    check mcast state
        if already joined, send immediately
        if joining, queue packet to mcast queue
        if join is deferred, send via bcast

On join completion:
    successful join
        set mcast->ah
        send all queued packets via mcast
        if no queued packets, alloc neigh for default ipv4 ethertype
    on failed join
        mcast->ah remains NULL
        send all queued packets via bcast
        mcast->delay_until is set to future time (used to know join is deferred)
        schedule deferred join attempt
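
To make the three cases concrete, here is a rough userspace sketch of that state machine. It is not driver code: struct mcast_group, mcast_send(), join_complete(), MAX_QUEUE and the counters are all made-up names for illustration, time is a plain integer, and "sending" just bumps a counter. Only the ah and delay_until fields mirror names used above. The neigh allocation step is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE 8   /* illustrative backlog limit, not taken from the driver */

enum join_state { JOIN_PENDING, JOINED, JOIN_DEFERRED };
enum path { PATH_QUEUED, PATH_MCAST, PATH_BCAST, PATH_DROPPED };

/* Hypothetical stand-in for the per-group state. */
struct mcast_group {
    bool in_use;         /* false until the first packet creates the group */
    bool has_ah;         /* models mcast->ah != NULL */
    bool join_scheduled; /* a join (or deferred join) is outstanding */
    long delay_until;    /* models mcast->delay_until; 0 means not deferred */
    int queued;          /* packets waiting for the join to finish */
    int sent_mcast;      /* packets sent via the multicast group */
    int sent_bcast;      /* packets sent via the broadcast group */
};

/* Derive the state from ah and delay_until, as the outline suggests. */
static enum join_state group_state(const struct mcast_group *g, long now)
{
    if (g->has_ah)
        return JOINED;
    if (g->delay_until > now)
        return JOIN_DEFERRED;
    return JOIN_PENDING;
}

/* Per-packet send path. */
static enum path mcast_send(struct mcast_group *g, long now)
{
    assert(g != NULL);
    if (!g->in_use) {
        /* First packet: create the group, queue the packet, schedule a join. */
        *g = (struct mcast_group){ .in_use = true, .queued = 1,
                                   .join_scheduled = true };
        return PATH_QUEUED;
    }
    switch (group_state(g, now)) {
    case JOINED:
        g->sent_mcast++;
        return PATH_MCAST;
    case JOIN_DEFERRED:
        g->sent_bcast++;
        return PATH_BCAST;
    case JOIN_PENDING:
    default:
        if (g->queued >= MAX_QUEUE)
            return PATH_DROPPED;
        g->queued++;
        return PATH_QUEUED;
    }
}

/* Join completion: flush the backlog via mcast on success, bcast on failure. */
static void join_complete(struct mcast_group *g, bool ok, long now,
                          long retry_delay)
{
    assert(g != NULL && retry_delay > 0);
    if (ok) {
        g->has_ah = true;
        g->delay_until = 0;
        g->join_scheduled = false;
        g->sent_mcast += g->queued;
    } else {
        g->has_ah = false;
        g->sent_bcast += g->queued;
        g->delay_until = now + retry_delay; /* marks the join as deferred */
        g->join_scheduled = true;           /* deferred join attempt */
    }
    g->queued = 0;
}
```

In this sketch a packet only goes out via broadcast while delay_until is in the future. Once that time passes, the group counts as joining again and packets go back to the queue until the deferred join completes.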
--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD
