Anders Persson wrote:
Hi Erik,

Thank you for your comments. My comments are inline.

On Tue, Dec 08, 2009 at 07:03:54AM -0800, Erik Nordmark wrote:
Section 2.1 says you manage the filters using SMF. How does that interact with zones? For instance, if a shared-IP or exclusive-IP zone disables one of the service instances, what happens? Or do the filter service instances only exist in the global zone somehow? (Having them exist in non-global zones but have no effect on the filters used would seem confusing.)

The service would only exist in the global zone.

I didn't think we had a mechanism to do that.
How do you do it?


Section 7.3 seems dangerous. If kssl is implemented using filters and some application issues some innocuous I_* ioctl to get something, that would make kssl disappear from the socket? I think instead the ioctl (or other reason for the fallback) must fail.

If the specific I_* ioctl requires the socket to fall back for the operation to
complete, then yes, the filter would disappear. I considered denying the
STREAMS operation, but I thought forcing the ioctl to fail could lead to some
interesting application failures. However, looking at how kssl used to work on
STREAMS sockets, it looks like it would stick around even if sockmod is popped,
which could result in the application seeing bogus TLI messages. But again,
that's probably a much less likely scenario than an I_SETSIG being issued on a
non-STREAMS socket, resulting in a fallback.

So could you expand a bit on what you see as being dangerous?

Today some r"ead-only" I_* ioctl that do not modify the state of a stream (such as an I_GETSIG or an I_STR wrapper containing a SIOCGLIFFLAGS) that cause a socket to fall back.

What is the impact of issuing such ioctls on a kssl socket today? With kssl as a socket filter it would cause the socket to stop functioning if the effect is to remove the filter.

Section 8. Do we have interested parties outside of the consolidation that would like to use socket filters? What would it take to make this an evolving interface?

I'm not sure how much work would be involved to bump the level to Uncommitted,
but my guess would be "not much". That being said, I would like to file a
follow-up case to make such changes.

Section 8.1: What are the semantics of sof_unregister for a filter that is in use? Will it fail with EBUSY? Something else?

Yup, EBUSY will be returned as long as it's in use.
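
For illustration, the check would look roughly like this (the registry entry,
lock, and field names below are placeholders of mine, not the actual
implementation):

#include <sys/types.h>
#include <sys/errno.h>
#include <sys/mutex.h>
#include <sys/systm.h>

/* Hypothetical registry entry; field names are illustrative only. */
typedef struct sof_entry {
	struct sof_entry *sofe_next;
	char		sofe_name[32];
	uint_t		sofe_refcnt;	/* sockets with the filter attached */
} sof_entry_t;

static kmutex_t		sof_lock;
static sof_entry_t	*sof_entries;

int
sof_unregister(const char *name)
{
	sof_entry_t *ent;

	mutex_enter(&sof_lock);
	for (ent = sof_entries; ent != NULL; ent = ent->sofe_next) {
		if (strcmp(ent->sofe_name, name) == 0)
			break;
	}
	if (ent == NULL) {
		mutex_exit(&sof_lock);
		return (ENXIO);
	}
	if (ent->sofe_refcnt > 0) {
		/* Still attached to at least one socket. */
		mutex_exit(&sof_lock);
		return (EBUSY);
	}
	/* ... unlink `ent' from the list and free it ... */
	mutex_exit(&sof_lock);
	return (0);
}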

Section 8.2.1 If there are multiple filters and one returns SOF_RVAL_DEFER, do the filters above it get an attach callback?

Yes, the attach callback for each filter will be triggered, and multiple
filters can return DEFER.
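
As a sketch of what such an attach callback might look like (the callback name
and signature here are my guesses at the shape, not the final interface):

/*
 * Hypothetical passive-attach callback. Returning SOF_RVAL_DEFER keeps
 * the new connection off the accept queue until the filter signals that
 * it is ready; the attach callbacks of any filters above this one still
 * run, and more than one filter may defer.
 */
sof_rval_t
myfilter_attach_passive_cb(sof_handle_t hdl, void *parg, void **cookiep)
{
	*cookiep = NULL;	/* no per-socket state needed in this example */

	return (SOF_RVAL_DEFER);
}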

Makes sense to add that to the document.

Section 8.2.3, 8.2.4 etc says the filter can modify the mblk. In light of db_ref considerations it would be better to say that it can return a different mblk (with in-situ modification of the dblk allowed if db_ref == 1). Instead of having a mblk_t ** it seems to be less error prone to have an mblk_t pointer as the return value. That makes it less likely to introduce errors where the mblk is freed or changed, but the *mpp isn't updated. (I've fixed this in IP recently.)

OK, I'll make the clarification and change the return value of data_in to
mblk_t *. It might be possible to simplify data_in_proc as well; see the
comment regarding 'tailmp' below.
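
To make the db_ref rule concrete, data_in could end up looking roughly like
this (my reading of the proposed signature, not the final one):

/*
 * Sketch of data_in with an mblk_t * return value: modify the dblk in
 * place only when we hold the sole reference, otherwise take a private
 * copy and return that. The caller always uses the returned pointer,
 * so a freed or replaced mblk can't be left dangling in *mpp.
 */
mblk_t *
myfilter_data_in(sof_handle_t hdl, void *cookie, mblk_t *mp, int flags,
    size_t *sizep)
{
	if (DB_REF(mp) > 1) {
		mblk_t *newmp = copymsg(mp);

		if (newmp == NULL)
			return (mp);	/* allocation failed; pass through */
		freemsg(mp);
		mp = newmp;
	}
	/* ... transform the data between b_rptr and b_wptr in place ... */
	return (mp);
}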

What is the definition of *sizep? Is it what I get from msgdsize()? msgsize()? (Can there be mblks other than M_DATA?)

`sizep' is the total amount of data (from msgdsize()) contained in the chain. And
there can be non-M_DATA mblks, for example, M_PROTO with T_UNITDATA_IND,
T_OPTDATA_IND, etc.
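
Roughly, `sizep' would be computed like this (the helper name is mine):

/*
 * `sizep' as described above: the msgdsize() total across the whole
 * chain. msgdsize() counts only M_DATA bytes, so M_PROTO blocks such
 * as T_UNITDATA_IND or T_OPTDATA_IND contribute nothing to the count.
 */
static size_t
sof_chain_size(mblk_t *mp)
{
	size_t sz = 0;

	for (; mp != NULL; mp = mp->b_next)
		sz += msgdsize(mp);	/* sums the b_cont run under mp */
	return (sz);
}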

Please add that to the document.

Section 8.2.4. Why do we need the complexity of tailmp?
Why do we even need the complexity of a b_next chain? Doesn't TCP just have b_cont chains?

The `tailmp' is just an optimization: why chase the tail if the filter already
knows where it is? It is possible for the mblk chain to be large, so the search
could potentially be expensive. It might be a premature optimization, so I'll run
some benchmarks to see if it can be simplified.
In the case of UDP there might be a b_next chain of datagrams, and for TCP we might
end up with T_OPTDATA_IND mblks that break the b_cont chain. The framework
could potentially break up the b_next chain and call the callback multiple
times, but that would require the framework to also calculate the size of each
b_cont chain (the size information available is that of the whole b_next
chain).
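
For reference, the search `tailmp' avoids is just the usual tail walk
(illustrative helper, not code from the case):

/*
 * Without `tailmp', appending to the chain means walking every b_next
 * link first; O(n) in the number of messages, which is the cost the
 * optimization avoids when the filter already knows the tail.
 */
static mblk_t *
chain_tail(mblk_t *mp)
{
	while (mp->b_next != NULL)
		mp = mp->b_next;
	return (mp);
}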

UDP doesn't send up b_next chains today. What would make UDP do that in the future?

TCP doesn't use it today either, but I can see that if TCP still queues data before a new connection has been accepted, then TCP might want to send up a chain that contains some M_DATA, some T_EXDATA_IND, and perhaps even T_OPTDATA_IND. However, I don't think it is worth optimizing performance around that, since it is very rare to have anything but M_DATA, and it might also make sense to have TCP send up data towards sockfs as soon as su_newconn has returned the upper handle. (Thus TCP would only need to queue data for TPI sockets.)

Section 8.2.5. In light of the above comments I think an mblk_t * return value type makes more sense, with the error return value as a separate out argument. When can the filter free and/or consume an mblk? For instance, if the filter wants to queue the mblk, would it return a NULL mblk and set RVAL_RETURN? Must the filter free the message when returning an error?

Sure, having a mblk_t * return value would be OK.

The filter must return a NULL mblk if it sets rval to anything other than
RVAL_CONTINUE, so in the case of an error it should free the mblk. I'll update the
document to make sure that is clear.
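
As a sketch of that rule, using the mblk_t * return style we agreed on above
(the state structure, enqueue helper, and error value are placeholders):

/* Hypothetical per-socket filter state. */
typedef struct myfilter_state {
	boolean_t	mf_queueing;	/* filter is buffering output */
	int		mf_error;	/* deferred error, if any */
} myfilter_state_t;

/* Hypothetical helper that takes ownership of the mblk. */
extern void myfilter_enqueue(myfilter_state_t *, mblk_t *);

mblk_t *
myfilter_data_out(sof_handle_t hdl, void *cookie, mblk_t *mp, int flags,
    size_t *sizep, sof_rval_t *rvalp)
{
	myfilter_state_t *st = cookie;

	if (st->mf_error != 0) {
		freemsg(mp);			/* on error the filter frees */
		*rvalp = SOF_RVAL_EPIPE;	/* placeholder error value */
		return (NULL);
	}
	if (st->mf_queueing) {
		myfilter_enqueue(st, mp);	/* consumes mp */
		*rvalp = SOF_RVAL_RETURN;
		return (NULL);			/* NULL mandatory with RETURN */
	}
	*rvalp = SOF_RVAL_CONTINUE;		/* pass mp along unchanged */
	return (mp);
}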

OK

   Erik