Re: [networking-discuss] Socket filter design

Anders Persson Tue, 15 Dec 2009 22:58:47 -0800

----- Forwarded message from Anders Persson <anders.pers...@sun.com> -----


Date: Tue, 15 Dec 2009 11:51:43 -0800
From: Anders Persson <anders.pers...@sun.com>
To: Erik Nordmark <erik.nordm...@sun.com>
Cc: networking-discuss@opensolaris.org
Subject: Re: Re: [networking-discuss] Socket filter design

On Tue, Dec 15, 2009 at 02:41:29AM -0800, Erik Nordmark wrote:
> Anders Persson wrote:
>> Hi Erik,
>>
>> Thank you for your comments. My comments are inline.
>>
>> On Tue, Dec 08, 2009 at 07:03:54AM -0800, Erik Nordmark wrote:
>>> Section 2.1 says you manage the filters using SMF. How does that  
>>> interact with zones? For instance, if a shared-IP or exclusive-IP 
>>> zone disables one of the service instances, what happens?
>>> Or do the filter service instances only exist in the global zone  
>>> somehow? (Having them exist in non-global zones but have no effect on 
>>> the filters used would seem confusing.)
>>
>> The service would only exist in the global zone.
>
> I didn't think we had a mechanism to do that.
> How do you do it?

The mechanism I was looking at was the one that removes specific
packages before installation (pkgrm.lst).

>>> Section 7.3 seems dangerous. If kssl is implemented using filters and 
>>> some application issues some innocuous I_* ioctl to get something, 
>>> that would make kssl disappear from the socket? I think instead the 
>>> ioctl (or other reason for the fallback) must fail.
>>
>> If the specific I_* ioctl require the socket to fallback for the operation to
>> complete, then yes, the filter would disappear. I considered denying the
>> STREAMS operation, but I thought forcing the ioctl to fail could lead to some
>> interesting application failures. However, looking at how kssl used to work 
>> on
>> STREAMS sockets it looks like it would stick around even if sockmod is 
>> pop'ed,
>> which could result in the application seeing bogus TLI messages. But again,
>> that's probably a much less likely scenario than an I_SETSIG being issued on 
>> a
>> non-STREAMS socket, resulting in a fallback.
>>
>> So could you expand a bit on what you see as being dangerous?
>
> Today some r"ead-only" I_* ioctl that do not modify the state of a  
> stream (such as an I_GETSIG or an I_STR wrapper containing a  
> SIOCGLIFFLAGS) that cause a socket to fall back.
>
> What is the impact of issuing such ioctls work on a kssl socket today?  
> With kssl as a socket filter it would cause the socket to stop  
> functioning if the effect is to remove the filter.
>

Today kssl would continue to work, and as a filter it would be
disabled. But is application failure a suitable alternative? 

>>> Section 8.2.1 If there are multiple filters and one returns  
>>> SOF_RVAL_DEFER, do the filters above it get an attach callback?
>>
>> Yes, the attach callback for each filter will be triggered, and multiple
>> filters can return DEFER.
>
> Makes sense to add that to the document.
>

Will do.

>>> Section 8.2.3, 8.2.4 etc says the filter can modify the mblk. In 
>>> light of db_ref considerations it would be better to say that it can 
>>> return a different mblk (with in-sito modifications of the dblk 
>>> allowed if db_ref = 1). Instead of having a mblk_t ** it seems to be 
>>> less error prone to have an mblk_t pointer as the return value. That 
>>> makes it less likely to introduce errors where the mblk is freed or 
>>> changed, but the *mpp isn't updated. (I've fixed this in IP 
>>> recently.)
>>
>> OK, I'll make the clarification and change the return value of data_in to
>> mblk_t *. It might be possible to simplify data_in_proc as well, see the
>> comment regarding 'tailmp' below.
>>
>>> What is the definition of *sizep? It is what I get from msgdsize()?  
>>> msgsize()? (Can there be mblks other than M_DATA?)
>>
>> `sizep' is total amount of data (from msgdsize()) contained in the chain. And
>> there can be non-M_DATA mblks, for example, M_PROTO with T_UNITDATA_IND,
>> T_OPTDATA_IND, etc.
>
> Please add that to the document.
>

Will do.

>>> Section 8.2.4. Why do we need the complexity of tailmp?
>>> Why do we even need the complexity of a b_next chain? Doesn't TCP 
>>> just have b_cont chains?
>>
>> The `tailmp' is just an optimization; why chase the tail if the filter knows
>> where it is. But it is possible for the mblk chain to be large, so the search
>> could potential be expensive. It might be a premature optimization, so I'll 
>> run
>> some benchmarks to see if it can be simplified. 
>>
>> In case of UDP there might be a b_next chain of datagrams, and for TCP we 
>> might
>> end up with T_OPTDATA_IND mblks that break the b_cont chain. The framework
>> could potentially break up the b_next chain and call the callback multiple
>> times, but that would require the framework to also calculate the size of 
>> each
>> b_cont chain (the size information available is that of the whole b_next
>> chain). 
>
> UDP doesn't send up b_next chains today. What would make UDP do that in  
> the future?
>
> TCP doesn't use it today either, but I can see that if TCP still queues  
> data before a new connection has been accepted, then TCP might want to  
> send up a chain that contains some M_DATA, some T_EXDATA_IND, and  
> perhaps even T_OPTDATA_IND. However, I don't think it is worth  
> optimizing the performance around that since it is very rare to have  
> anything but M_DATA, and also it might make sense to have TCP send up  
> data towards sockfs as soon as su_newconn has returned the upper handle.  
> (Thus TCP would only need to queue data for TPI sockets.)
>

The missing piece of info here is that data_in_proc, which is called
from the context of the application doing the read, process the data
sitting in the socket queue. So if multiple datagrams are enqueued
between reads, then the callback will see a b_next chain of mblks.

Thanks,
Anders

----- End forwarded message -----

Anders
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org

Re: [networking-discuss] Socket filter design

Reply via email to