Anders Persson wrote:
Hi Erik,

Thank you for your comments. My comments are inline.

On Tue, Dec 08, 2009 at 07:03:54AM -0800, Erik Nordmark wrote:
Section 2.1 says you manage the filters using SMF. How does that interact with zones? For instance, if a shared-IP or exclusive-IP zone disables one of the service instances, what happens? Or do the filter service instances only exist in the global zone somehow? (Having them exist in non-global zones but have no effect on the filters used would seem confusing.)

The service would only exist in the global zone.

I didn't think we had a mechanism to do that.
How do you do it?


Section 7.3 seems dangerous. If kssl is implemented using filters and some application issues some innocuous I_* ioctl to get something, that would make kssl disappear from the socket? I think instead the ioctl (or other reason for the fallback) must fail.

If the specific I_* ioctl requires the socket to fall back for the operation to
complete, then yes, the filter would disappear. I considered denying the
STREAMS operation, but I thought forcing the ioctl to fail could lead to some
interesting application failures. However, looking at how kssl used to work on
STREAMS sockets, it looks like it would stick around even if sockmod is popped,
which could result in the application seeing bogus TLI messages. But again,
that's probably a much less likely scenario than an I_SETSIG being issued on a
non-STREAMS socket, resulting in a fallback.

So could you expand a bit on what you see as being dangerous?

Today some r"ead-only" I_* ioctl that do not modify the state of a stream (such as an I_GETSIG or an I_STR wrapper containing a SIOCGLIFFLAGS) that cause a socket to fall back.

What is the impact of issuing such ioctls on a kssl socket today? With kssl as a socket filter it would cause the socket to stop functioning if the effect is to remove the filter.

Section 8. Do we have interested parties outside of the consolidation that would like to use socket filters? What would it take to make this an evolving interface?

I'm not sure how much work would be involved to bump the level to Uncommitted,
but my guess would be "not much". That being said, I would like to file a
follow-up case to make such changes.

Section 8.1: What are the semantics of sof_unregister for a filter that is in use? Will it fail with EBUSY? Something else?

Yup, EBUSY will be returned as long as it's in use.
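
For illustration, the check would look roughly like this (the registry entry,
lock, and field names below are placeholders of mine, not the actual
implementation):

#include <sys/types.h>
#include <sys/errno.h>
#include <sys/mutex.h>
#include <sys/systm.h>

/* Hypothetical registry entry; field names are illustrative only. */
typedef struct sof_entry {
	struct sof_entry *sofe_next;
	char		sofe_name[32];
	uint_t		sofe_refcnt;	/* sockets with the filter attached */
} sof_entry_t;

static kmutex_t		sof_lock;
static sof_entry_t	*sof_entries;

int
sof_unregister(const char *name)
{
	sof_entry_t *ent;

	mutex_enter(&sof_lock);
	for (ent = sof_entries; ent != NULL; ent = ent->sofe_next) {
		if (strcmp(ent->sofe_name, name) == 0)
			break;
	}
	if (ent == NULL) {
		mutex_exit(&sof_lock);
		return (ENXIO);
	}
	if (ent->sofe_refcnt > 0) {
		/* Still attached to at least one socket. */
		mutex_exit(&sof_lock);
		return (EBUSY);
	}
	/* ... unlink `ent' from the list and free it ... */
	mutex_exit(&sof_lock);
	return (0);
}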

Section 8.2.1 If there are multiple filters and one returns SOF_RVAL_DEFER, do the filters above it get an attach callback?

Yes, the attach callback for each filter will be triggered, and multiple
filters can return DEFER.
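
As a sketch of what such an attach callback might look like (the callback name
and signature here are my guesses at the shape, not the final interface):

/*
 * Hypothetical passive-attach callback. Returning SOF_RVAL_DEFER keeps
 * the new connection off the accept queue until the filter signals that
 * it is ready; the attach callbacks of any filters above this one still
 * run, and more than one filter may defer.
 */
sof_rval_t
myfilter_attach_passive_cb(sof_handle_t hdl, void *parg, void **cookiep)
{
	*cookiep = NULL;	/* no per-socket state needed in this example */

	return (SOF_RVAL_DEFER);
}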

Makes sense to add that to the document.

Section 8.2.3, 8.2.4 etc says the filter can modify the mblk. In light of db_ref considerations it would be better to say that it can return a different mblk (with in-situ modification of the dblk allowed if db_ref == 1). Instead of having a mblk_t ** it seems to be less error prone to have an mblk_t pointer as the return value. That makes it less likely to introduce errors where the mblk is freed or changed, but the *mpp isn't updated. (I've fixed this in IP recently.)

OK, I'll make the clarification and change the return value of data_in to
mblk_t *. It might be possible to simplify data_in_proc as well; see the
comment regarding 'tailmp' below.
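
To make the db_ref rule concrete, data_in could end up looking roughly like
this (my reading of the proposed signature, not the final one):

/*
 * Sketch of data_in with an mblk_t * return value: modify the dblk in
 * place only when we hold the sole reference, otherwise take a private
 * copy and return that. The caller always uses the returned pointer,
 * so a freed or replaced mblk can't be left dangling in *mpp.
 */
mblk_t *
myfilter_data_in(sof_handle_t hdl, void *cookie, mblk_t *mp, int flags,
    size_t *sizep)
{
	if (DB_REF(mp) > 1) {
		mblk_t *newmp = copymsg(mp);

		if (newmp == NULL)
			return (mp);	/* allocation failed; pass through */
		freemsg(mp);
		mp = newmp;
	}
	/* ... transform the data between b_rptr and b_wptr in place ... */
	return (mp);
}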

What is the definition of *sizep? Is it what I get from msgdsize()? msgsize()? (Can there be mblks other than M_DATA?)

`sizep' is the total amount of data (from msgdsize()) contained in the chain. And
there can be non-M_DATA mblks, for example, M_PROTO with T_UNITDATA_IND,
T_OPTDATA_IND, etc.
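
Roughly, `sizep' would be computed like this (the helper name is mine):

/*
 * `sizep' as described above: the msgdsize() total across the whole
 * chain. msgdsize() counts only M_DATA bytes, so M_PROTO blocks such
 * as T_UNITDATA_IND or T_OPTDATA_IND contribute nothing to the count.
 */
static size_t
sof_chain_size(mblk_t *mp)
{
	size_t sz = 0;

	for (; mp != NULL; mp = mp->b_next)
		sz += msgdsize(mp);	/* sums the b_cont run under mp */
	return (sz);
}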

Please add that to the document.

Section 8.2.4. Why do we need the complexity of tailmp?
Why do we even need the complexity of a b_next chain? Doesn't TCP just have b_cont chains?

The `tailmp' is just an optimization: why chase the tail if the filter already
knows where it is? It is possible for the mblk chain to be large, so the search
could potentially be expensive. It might be a premature optimization, so I'll run
some benchmarks to see if it can be simplified.
In the case of UDP there might be a b_next chain of datagrams, and for TCP we might
end up with T_OPTDATA_IND mblks that break the b_cont chain. The framework
could potentially break up the b_next chain and call the callback multiple
times, but that would require the framework to also calculate the size of each
b_cont chain (the size information available is that of the whole b_next
chain).
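
For reference, the search `tailmp' avoids is just the usual tail walk
(illustrative helper, not code from the case):

/*
 * Without `tailmp', appending to the chain means walking every b_next
 * link first; O(n) in the number of messages, which is the cost the
 * optimization avoids when the filter already knows the tail.
 */
static mblk_t *
chain_tail(mblk_t *mp)
{
	while (mp->b_next != NULL)
		mp = mp->b_next;
	return (mp);
}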

UDP doesn't send up b_next chains today. What would make UDP do that in the future?

TCP doesn't use it today either, but I can see that if TCP still queues data before a new connection has been accepted, then TCP might want to send up a chain that contains some M_DATA, some T_EXDATA_IND, and perhaps even T_OPTDATA_IND. However, I don't think it is worth optimizing performance around that, since it is very rare to have anything but M_DATA, and it might also make sense to have TCP send up data towards sockfs as soon as su_newconn has returned the upper handle. (Thus TCP would only need to queue data for TPI sockets.)

Section 8.2.5. In light of the above comments I think an mblk_t * return value type makes more sense, with the error return value as a separate out argument. When can the filter free and/or consume an mblk? For instance, if the filter wants to queue the mblk, would it return a NULL mblk and set RVAL_RETURN? Must the filter free the message when returning an error?

Sure, having a mblk_t * return value would be OK.

The filter must return a NULL mblk if it sets rval to anything other than
RVAL_CONTINUE, so in the case of an error it should free the mblk. I'll update the
document to make sure that is clear.
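
As a sketch of that rule, using the mblk_t * return style we agreed on above
(the state structure, enqueue helper, and error value are placeholders):

/* Hypothetical per-socket filter state. */
typedef struct myfilter_state {
	boolean_t	mf_queueing;	/* filter is buffering output */
	int		mf_error;	/* deferred error, if any */
} myfilter_state_t;

/* Hypothetical helper that takes ownership of the mblk. */
extern void myfilter_enqueue(myfilter_state_t *, mblk_t *);

mblk_t *
myfilter_data_out(sof_handle_t hdl, void *cookie, mblk_t *mp, int flags,
    size_t *sizep, sof_rval_t *rvalp)
{
	myfilter_state_t *st = cookie;

	if (st->mf_error != 0) {
		freemsg(mp);			/* on error the filter frees */
		*rvalp = SOF_RVAL_EPIPE;	/* placeholder error value */
		return (NULL);
	}
	if (st->mf_queueing) {
		myfilter_enqueue(st, mp);	/* consumes mp */
		*rvalp = SOF_RVAL_RETURN;
		return (NULL);			/* NULL mandatory with RETURN */
	}
	*rvalp = SOF_RVAL_CONTINUE;		/* pass mp along unchanged */
	return (mp);
}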

OK

   Erik