On 2/07/2012 4:47 PM, Sašo Kiselkov wrote:
>
> I'm not entirely clear what the b_rptr adjustment is for, so I only
> guessed how it's supposed to be done judging from the other code. What
> exactly is an mblk_t structure? (This is my first time hacking the
> kernel and the acronyms are somewhat cryptic to me...)

mblk_t is a message block. Message blocks were originally used in
Solaris to pass messages up and down STREAMS queues. That design
has since proved too slow for modern computing, and whilst STREAMS
queues are now close to extinction in Solaris, the container for
packets (the message block) remains. The equivalent is
"struct sk_buff" on Linux and "struct mbuf" on BSD kernels. The main
difference is that the mblk_t itself has no data in it: it is just
a collection of pointers to data, including a pointer to the data
block (dblk_t).
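
As a rough illustration of the above, and of what adjusting b_rptr
does, here is a minimal sketch. The sketch_* types and functions are
hypothetical, cut-down stand-ins of my own; the real mblk_t and dblk_t
in <sys/stream.h> carry more fields. Only the field names b_cont,
b_rptr, b_wptr, b_datap, db_base and db_lim are taken from the real
structures.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for dblk_t and mblk_t.
 * The real definitions live in <sys/stream.h> and have more fields.
 */
typedef struct sketch_dblk {
	unsigned char *db_base;	/* first byte of the buffer */
	unsigned char *db_lim;	/* one past the last byte of the buffer */
} sketch_dblk_t;

typedef struct sketch_mblk {
	struct sketch_mblk *b_cont;	/* next block of the same message */
	unsigned char *b_rptr;		/* first unread byte */
	unsigned char *b_wptr;		/* one past the last valid byte */
	sketch_dblk_t *b_datap;		/* data block describing the buffer */
} sketch_mblk_t;

/* Bytes of valid data in one block (the same idea as MBLKL()). */
static size_t
sketch_mblkl(const sketch_mblk_t *mp)
{
	assert(mp->b_rptr <= mp->b_wptr);
	return ((size_t)(mp->b_wptr - mp->b_rptr));
}

/*
 * Skip "len" bytes of header by advancing b_rptr. Nothing is copied;
 * only the view into the buffer changes. Returns -1 if the block does
 * not hold that many bytes.
 */
static int
sketch_skip(sketch_mblk_t *mp, size_t len)
{
	if (len > sketch_mblkl(mp))
		return (-1);
	mp->b_rptr += len;
	return (0);
}
```

So stepping past, say, a 14-byte Ethernet header is just
sketch_skip(mp, 14): the data stays where it is and only the read
pointer moves.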

>>> To better deal with fanout of multicast traffic which might originate in
>>> professional IRDs and various other hardware appliances (which often
>>> stream everything from a single source addr+port combo and only
>>> differentiate streams by destination multicast address), I also
>>> implemented a new type of mac_fanout_type, MAC_FANOUT_SRC_DST, which
>>> does an XOR of the source and destination addresses in hash computation.
>>> That way, the default behavior of src-addr + src-port fanout is
>>> unchanged and the new behavior is selected only if the user wants it.
>>
>> Whilst I applaud the idea of a different fanout being available,
>> I think that the interface for selecting which one to use and
>> how it is selected needs more careful consideration.
>>
>> It would seem to me that in certain scenarios, it might be
>> sufficient (or even better) to use MAC_FANOUT_DST.
>
> Of course a per-link or per-flow tuning would probably be best, but that
> was above my skills to implement at this time (and as far as I'm aware,
> flow classification takes place after IP fanout).

That's ok, as there's a somewhat larger amount of work required
here. For example, there's the question of why the IPv6 flow
identifier isn't used at all here.


>
> Well, as far as I'm concerned, I always wondered why it was
> "src_addr ^ src_port" to begin with, other than the author assuming that
> the bulk of all traffic being unicast (which naturally tends towards
> unique src-port tuples). If I had a say in that, I'd always advocate for
> "src_addr ^ src_port ^ dst_addr ^ dst_port" to give the maximum amount
> of entropy for the fanout and then have a per-link tunable (settable via
> dladm set-linkprop) to allow configuring other fanout strategies, like
> is possible for link-aggregation (L2, L3, L4, etc.).

At present the hash is over both the source address and the 32-bit
representation of the source and destination ports (this is what
"whereptr" is for). If you look at the definition of HASH_ADDR,
you'll see that all 32 bits of the ports are used.

I'd be somewhat hesitant to use "src ^ dst", as for hosts on the
same network the shared prefix cancels out and the hash will almost
reduce to just the ports. It might be better to do
"(src + dst) ^ ports".
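
To make that concrete, here is a small sketch comparing the two ways
of mixing the addresses. This is not the HASH_ADDR macro or any
existing MAC layer code: mix_xor(), mix_sum() and pack_ports() are
hypothetical names, addresses are taken as host-order 32-bit IPv4
values, and the port packing order is only an assumption for the
example.

```c
#include <stdint.h>

/* Hypothetical: pack both 16-bit ports into one 32-bit word. */
static uint32_t
pack_ports(uint16_t sport, uint16_t dport)
{
	return (((uint32_t)sport << 16) | (uint32_t)dport);
}

/* "src ^ dst": bits common to both addresses cancel to zero. */
static uint32_t
mix_xor(uint32_t src, uint32_t dst, uint32_t ports)
{
	return ((src ^ dst) ^ ports);
}

/* "(src + dst) ^ ports": the shared prefix still contributes. */
static uint32_t
mix_sum(uint32_t src, uint32_t dst, uint32_t ports)
{
	return ((uint32_t)(src + dst) ^ ports);
}
```

With 192.168.1.10 (0xC0A8010A) and 192.168.1.20 (0xC0A80114) and the
ports left at zero, mix_xor() gives 0x0000001E, so only the host bits
survive, whereas mix_sum() gives 0x8150021E and the network part still
feeds into the result.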

Darren


