[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

Darren Reed Wed, 12 Aug 2009 14:58:12 -0700

On 12/08/09 02:14 PM, Sebastien Roy wrote:
> On Wed, 2009-08-12 at 13:54 -0700, Darren Reed wrote:
>   
>> On 12/08/09 06:37 AM, Sebastien Roy wrote: 
>>     
>>> Darren,
>>>
>>> On Mon, 2009-07-27 at 19:01 -0700, Darren Reed wrote:
>>>       
>>>> http://cr.opensolaris.org/~darrenr/onnv-pcapture/
>>>>
>>>>         
>>> I reviewed the changes to ipnet and ip:
>>>
>>> General: There are quite a bit of changes related to the style of
>>> local variable declaration that was changed, thus adding to the volume
>>> of the review.  In general, it would be preferable to restrict the set
>>> of changes to the functionality being added, and to cstyle and lint
>>> issues found while making these changes.  Otherwise, sticking with the
>>> existing style of the file without making sweeping changes to style
>>> based on personal preference should be the modus operandi.  I won't
>>> comment on each instance, I've restricted my specific comments to
>>> things not related to local variable declarations. :-)
>>>       
>> As we discussed, i want there to be *1* style for the file.
>> I have no interest in which style that is. I just don't like
>> working on source code with mixed styles because it isn't
>> clear which one should be used going forward...
>>
>> Is it to be:
>> <type><tab><variable>
>> or
>> <type><n tabs><variables> 
>> .. where all the variables are aligned and an arbitrary number
>> of tabs are inserted.
>>
>> The difference is (what I've currently done):
>>
>> struct ip        ip;
>> int     foo;
>>
>> one tab between type and name, vs:
>>
>> struct ip        ip
>> int              foo;
>>
>> where short type names have extra tabs.
>>
>> At present, compare _init()'s variables (space between type and
>> variable name) with those in ipnet_if_init().
>> _init()'s use the latter of the above - multiple tabs to bring
>> variables out to the same position, which is different to what
>> I interpreted your earlier suggestion (single tab) as being.
>>     
>
> All variable names should be aligned at a tab boundary, and initialized
> in the declaration where possible.  That's the style used throughout the
> file.  There may be a couple of exceptions (i.e. mistakes), but that's
> not reason enough to change the entire file to use an entirely new
> style.  If there's a function that isn't using this style, then let's
> fix it rather than fix every other function.  e.g.
>
>       netstack_t      *ns = netstack_find_by_zoneid(ifp->if_zoneid);
>       ip_stack_t      *ipst = ns->netstack_ip;
>       ipnet_t         *ipnet;
>       char            name[32];
>       int             error = 0;
>


Not a problem.


>>> uts/common/inet/ipnet.h
>>>
>>> * 58: This may need to be atomic (atomic_inc_64())
>>>       
>> Discuss.
>> In a similar vein to many of the counters in IP not being
>> atomic increments, so to is this. There's a real cost for
>> using atomic_() ops and it seems that unless there is an
>> absolute need for it to be 100% reliable (reference count,
>> etc), the "danger" in just using "++" for general statistics
>> seems to be accepted. Compare with MIB_BUMP() and
>> other _BUMP() macros in $SRC/uts/common/inet/*.h.
>>     
>
> I would think that the cost of the atomic_inc_*() APIs are quite minimal
> (and they're relatively widely used in IP),

No, they're not as inexpensive as we might think, especially on multi
CPU systems. There was an internal talk on the cost of mutex's and
such earlier in the year: a much smaller number of them can be done
than we might expect (the other point made from that talk was to
avoid locking whenever possible because of their impact on the bus.)



>>> * 1168: What's the purpose of this addition?  This function only gets
>>>   called for ipnet_t streams that are on the /dev/lo0 device.
>>>       
>> Discuss.
>> ipnet_loaccept() is used twice: once in ipnet_open() and once in
>> ipnet_promisc_add().
>>     
>
> I don't follow.  ipnet_acceptfn should only be set to ipnet_loaccept()
> if /dev/lo0 is opened, and in no other case.  Therefore the check you
> added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) ==
> IPNET_MINOR_LO)" is redundant.  If ipnet_loaccept() is getting called
> when it shouldn't be, then something else is wrong.
>   

So lets examine ipnet_accept vs ipnet_loaccept for the purpose of
lo0 packets going through ipnet, into BPF, in a local zone with a
shared stack.

If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are
going to end up in BPF:
- the IP address of the packet isn't important, so what address type
  it is does not matter, thus calls to ipnet_get_addrtype() are not
  required;
- the lo0 interface has no special behaviour for promiscuous mode,
  so that flag and the SAP one are meaningless;
- thus there's lots of code in that function that gets executed for
  seemingly no reason.

The goal of using BPF in a local zone with a shared stack is to
see all of the traffic in that zone which is associated with the
lo0 "interface". The ipnet_loaccept function provides a filter
that does exactly that - except for the filter on the hook type.
I suppose i could add an ipnet_bpf_loaccept but it would be
a carbon copy of ipnet_loaccept with that check removed.

I think what I'm saying is that BPF on lo0 in a zone has the
same semantics as /dev/lo0 and not /dev/ipnet/lo0.


>>> * 1959: What is the reason for this callback having to duplicate every
>>>  packet passed to it?  Why wouldn't the code that dispatches packets
>>>  to callbacks always simply pass a duplicate?  An observability hook
>>>  should always have those semantics (it should never manipulate the
>>>  original or block while processing it), so hard-coding such
>>>  semantics for all NH_OBSERVE hooks would make sense to me.
>>>       
>> Discuss.
>> The duplication here has been moved from ip.c`ipobs_hook() to here.
>> Whereas ipobs_hook() was previously just walking a list of functions
>> to call and doing the duplication there, as required, now the list
>> walking is separated (pfhooks) from the packet duplication. Thus the
>> dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb()
>> to do the duplication for the real recipient (func).
>>     
>
> Ah, I see.
>
>   
>> But, if your
>> comments above are correct, then dupmsg() should be removed and only
>> copymsg() used, otherwise the observe is presented with the real
>> packet data (in dblk_t's off of its own mblk_t.) There is a performance
>> cost with this, though: it enforces a deep copy of all observed packets,
>> copying all of the dblk_t's, not just a shallow one (creating new mblk_t's.)
>> The current implementation only does a deep copy if the shallow one
>> fails.
>>     
>
> Okay.
>   

Just to confirm, it's ok to leave the code to do both dupmsg() and copymsg()
for performance reasons.


>>> * 2203-2232: datalink_id_t is not the same as IP interface index.
>>>   They don't represent the same objects, and they don't exist at the
>>>   same layer.  if_index is an IP interface index, and a datalink_id_t
>>>   is a linkid as handled in the GLDv3 framework.  This code has
>>>   nothing to do with GLDv3, and is strictly contained at the IP-layer,
>>>   so it shouldn't be handling datalink_id_t.
>>>       
>> Discuss.
>> The index used is the index from IP, but seeing as there is no
>> ipindex_t, I used datalink_id_t to put the index in. It would
>> seem that using datalink_id_t as a type here is cause for confusion
>> so it might be better to just use uint_t.
>>     
>
> Yep, I think so.
>   

Ok, I'll fix that.

Darren

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://mail.opensolaris.org/pipermail/crossbow-discuss/attachments/20090812/dacb2c87/attachment.html>

[crossbow-discuss] [networking-discuss] BPF/PF_PACKET Code review

Reply via email to