Re: [networking-discuss] BPF/PF_PACKET Code review

Sebastien Roy Wed, 12 Aug 2009 18:21:36 -0700

On Wed, 2009-08-12 at 14:58 -0700, Darren Reed wrote:
> > > > * 1168: What's the purpose of this addition?  This function only gets
> > > >   called for ipnet_t streams that are on the /dev/lo0 device.
> > > >       
> > > Discuss.
> > > ipnet_loaccept() is used twice: once in ipnet_open() and once in
> > > ipnet_promisc_add().
> > >     
> > 
> > I don't follow.  ipnet_acceptfn should only be set to ipnet_loaccept()
> > if /dev/lo0 is opened, and in no other case.  Therefore the check you
> > added in the function itself: "if (getminor(ipnet->ipnet_if->if_dev) ==
> > IPNET_MINOR_LO)" is redundant.  If ipnet_loaccept() is getting called
> > when it shouldn't be, then something else is wrong.
> >   
> 
> So lets examine ipnet_accept vs ipnet_loaccept for the purpose of
> lo0 packets going through ipnet, into BPF, in a local zone with a
> shared stack.
> 
> If we set ipnet_acceptfn to ipnet_accept for lo0 packets that are
> going to end up in BPF:
> - the IP address of the packet isn't important, so what address type
>   it is does not matter, thus calls to ipnet_get_addrtype() are not
>   required;


The address is important.  A consumer of an ipnet device receives
packets if they contain an IP address assigned to that interface (this
is described in the IP Observability PSARC case).  That's the
distinction between /dev/lo0 and /dev/ipnet/lo0 (and thus the difference
between ipnet_accept() and ipnet_loaccept()).

> - the lo0 interface has no special behaviour for promiscuous mode,
>   so that flag and the SAP one are meaningless;
> - thus there's lots of code in that function that gets executed for
>   seemingly no reason.

ipnet_accept() implements the semantics of _all_ ipnet observability
nodes including /dev/ipnet/lo0.  ipnet_loaccept() only implements the
semantics of /dev/lo0 (which are deliberately different as described in
the PSARC case).  If you feel that ipnet_accept() does not perform well
enough for the special case of /dev/ipnet/lo0 in a shared stack zone,
then the answer isn't to break its semantic by using ipnet_loaccept(),
but rather to address its performance issues (if indeed there are any).

> The goal of using BPF in a local zone with a shared stack is to
> see all of the traffic in that zone which is associated with the
> lo0 "interface". The ipnet_loaccept function provides a filter
> that does exactly that - except for the filter on the hook type.
> I suppose i could add an ipnet_bpf_loaccept but it would be
> a carbon copy of ipnet_loaccept with that check removed.
> 
> I think what I'm saying is that BPF on lo0 in a zone has the
> same semantics as /dev/lo0 and not /dev/ipnet/lo0.

I don't understand.  I thought that there were different "levels" of
observability for BPF, where if one enabled DLT_IPNET, one would get the
ipnet semantics (/dev/ipnet/* for any device).  I would then assume that
for "lo0", if one does not use DLT_IPNET, then one gets the /dev/lo0
semantics, but if one enables DLT_IPNET, one gets /dev/ipnet/lo0
semantics.

In any case, we're currently talking about the setting of ipnet_loaccept
as ipnet_acceptfn, which definitely needs to be worked out for BPF, but
the comment I made was slightly different.  The code change you made in
ipnet_loaccept() itself seems to be tangential to that, and I still
don't know why you made that change.  Returning B_FALSE if /dev/lo0
wasn't the device opened still seems redundant regardless.

> > > > * 1959: What is the reason for this callback having to duplicate every
> > > >  packet passed to it?  Why wouldn't the code that dispatches packets
> > > >  to callbacks always simply pass a duplicate?  An observability hook
> > > >  should always have those semantics (it should never manipulate the
> > > >  original or block while processing it), so hard-coding such
> > > >  semantics for all NH_OBSERVE hooks would make sense to me.
> > > >       
> > > Discuss.
> > > The duplication here has been moved from ip.c`ipobs_hook() to here.
> > > Whereas ipobs_hook() was previously just walking a list of functions
> > > to call and doing the duplication there, as required, now the list
> > > walking is separated (pfhooks) from the packet duplication. Thus the
> > > dispatching comes out of pfhooks and jumps through ipobs_bounce_func_cb()
> > > to do the duplication for the real recipient (func).
> > >     
> > 
> > Ah, I see.
> > 
> >   
> > > But, if your
> > > comments above are correct, then dupmsg() should be removed and only
> > > copymsg() used, otherwise the observe is presented with the real
> > > packet data (in dblk_t's off of its own mblk_t.) There is a performance
> > > cost with this, though: it enforces a deep copy of all observed packets,
> > > copying all of the dblk_t's, not just a shallow one (creating new 
> > > mblk_t's.)
> > > The current implementation only does a deep copy if the shallow one
> > > fails.
> > >     
> > 
> > Okay.
> >   
> 
> Just to confirm, it's ok to leave the code to do both dupmsg() and
> copymsg()
> for performance reasons.

Sure, that's fine.

-Seb


_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org

Re: [networking-discuss] BPF/PF_PACKET Code review

Reply via email to