James Carlson wrote:
Rao Shoaib writes:
Folks, today is the last day to submit your design review comments. If you need more time, please let us know so we can account for that in our schedule.

Late, as always, but since you asked for feedback on the design ...

No problem, Jim, we appreciate your comments. Our answers and comments are inline.
General:

  I suggest using sys/list.h instead of rolling your own linked list
  structures.  Using the common list functions makes debug and usage a
  lot simpler.
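
  For reference, typical sys/list.h usage looks something like this
  (a minimal sketch; the element type and field names are made up for
  illustration):

        #include <sys/list.h>
        #include <sys/sysmacros.h>      /* offsetof */

        /* Illustrative element; list_node_t provides the linkage. */
        typedef struct so_ent {
                list_node_t     se_node;
                int             se_val;
        } so_ent_t;

        static list_t so_list;

        void
        so_list_init(void)
        {
                list_create(&so_list, sizeof (so_ent_t),
                    offsetof(so_ent_t, se_node));
        }

        void
        so_list_walk(void)
        {
                so_ent_t *ent;

                for (ent = list_head(&so_list); ent != NULL;
                    ent = list_next(&so_list, ent)) {
                        /* process ent */
                }
        }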

  Tiny nit: s/its'/its/g
Yes, this has already been changed in the code.
Section 2.2 says:

  "An up call/down call should never result in a up/down call in the
   same thread context."

  I'm a little unclear on the 'should' part of this statement.  Is it
  intended to mean 'shall' or are there exceptions to the statement?

  I don't think I see cases where this ought to occur as a matter of
  design, so am I missing something?
Actually, that statement is not correct; there are two cases where it is violated:
1) In the case of fallback.
2) If the protocol gets flow controlled on a send, the thread making the downcall makes an upcall to set the TxQ-full flag (sketched below).
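
A rough sketch of case 2, with hypothetical names throughout:

        #include <sys/types.h>
        #include <sys/stream.h>         /* mblk_t */

        /* All of these names are illustrative, not the real API. */
        extern boolean_t proto_tx_window_full(void *handle);
        extern void proto_enqueue_tx(void *handle, mblk_t *mp);
        extern void su_txq_full_upcall(void *upper, boolean_t full);

        int
        proto_send(void *handle, void *upper, mblk_t *mp)
        {
                if (proto_tx_window_full(handle)) {
                        /* the upcall runs in the downcall's thread */
                        su_txq_full_upcall(upper, B_TRUE);
                }
                proto_enqueue_tx(handle, mp);
                return (0);
        }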
Section 2.6:

  Nit: "Evolving" is no longer a stability level.
We will remove it.
Section 2.7:

  In addition to "filtering," I expect that there will be layered
  modules, so we should be planning for them.  "Sockets Direct" would
  be one example of a layered model.  NL7C and SSL are probably other
  examples of these.  (The actual implementation probably looks a lot
  like "filtering," so perhaps this is just a nit.)
What terminology should we use here instead?
Section 3.2.1:

  "Sockmod" is already a known (pseudo-)module.  I suggest using a
  different name.
We will change it.
Section 3.2.1:

  "If a module name is present, checks are made to ensure that the
   module exists and can be loaded."

  Given that the rest of the design supports dynamic loading, I
  suggest considering not doing this test here, but instead do it when
  a socket is opened, and the module is actually needed.  That way,
  designers could put new socket types in "/usr/kernel/sockmods" and
  avoid burdening the root file system with more modules.  (I suspect
  you already have to do that dynamic existence check *anyway*, as
  it's part of dynamic loading.)
Isn't /usr already mounted when soconfig runs?
Section 3.2.1:

  "A module entry takes precedence over the device entry"

  What does this mean?  Can you give an example where both would be
  present?  It sounds to me like this is covering for something like
  this:

        2 2 0 /dev/tcp
        2 2 0 - tcp

  but if the user has something that confused in his configuration
  file, then why not just print an error instead?  Couldn't he just as
  easily misconfigure it like this?

        2 2 0 - foo
        2 2 0 - tcp

  Why allow an entry to be specified with both a device and a module?
Based on Meem's comments we are fixing this; there can be only one entry, either a module or a device.
Section 3.4.1:

  Depending on how many of these there are, and how frequent
  create/destroy is done, a modhash might be useful.
We will file an RFE.
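
For what it's worth, a modhash-based lookup along the lines you suggest might look roughly like this (the sockparams naming is illustrative):

        #include <sys/modhash.h>

        struct sockparams;              /* illustrative entry type */

        static mod_hash_t *sp_hash;

        void
        sp_hash_init(void)
        {
                /* string-keyed hash, 64 chains, no value destructor */
                sp_hash = mod_hash_create_strhash("sockparams_hash",
                    64, mod_hash_null_valdtor);
        }

        struct sockparams *
        sp_lookup(char *name)
        {
                mod_hash_val_t val;

                if (mod_hash_find(sp_hash, (mod_hash_key_t)name,
                    &val) != 0)
                        return (NULL);
                return ((struct sockparams *)val);
        }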
Section 3.6:

  Once unload has actually started to run, I suspect that something
  will need to prevent new calls to the module from occurring until it
  can be loaded again.  Unload should start the _fini process, and at
  some point (particularly after mod_remove has been called), this
  becomes destructive.

  How does TCP's time-wait state work here?  It needs to hang around
  for a while after the socket is actually closed.  What stops the
  module from unloading in this case?  (Does the transport have to
  handle this?)
We had not thought about this. Currently tcp and the sockmod for tcp are separate modules, so we only unload the socket module. However, they could be the same module, so we need to handle this situation. We will handle it by having the protocol return EAGAIN or EBUSY: with EAGAIN the unload is retried at a later time, while EBUSY means the module is busy.
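
Roughly, with illustrative lock and refcount names:

        #include <sys/types.h>
        #include <sys/errno.h>
        #include <sys/mutex.h>
        #include <sys/modctl.h>

        extern struct modlinkage modlinkage;    /* the module's linkage */

        static kmutex_t smod_lock;              /* illustrative */
        static uint_t smod_refcnt;              /* e.g. TIME_WAIT holds */

        int
        _fini(void)
        {
                int error;

                mutex_enter(&smod_lock);
                if (smod_refcnt != 0) {
                        mutex_exit(&smod_lock);
                        return (EBUSY);         /* or EAGAIN to retry */
                }
                mutex_exit(&smod_lock);

                if ((error = mod_remove(&modlinkage)) != 0)
                        return (error);
                /* tear down remaining module state here */
                return (0);
        }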
Section 4.4.3:

  What context is required or provided when the downcalls are made?
  Is it always a process context, or could it be a kernel thread or
  even an interrupt?
It can never be interrupt context; it will be either process or kernel thread context.
  Why is the opaque handle defined using "void *"?  Why not use an
  incomplete structure?  Using a 'void *' is unclean -- it wipes out
  most of the static checking that the compiler and lint can perform
  for you.
Can you elaborate on what you mean? We all seem to interpret this differently.
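
Is this the sort of thing you mean (names illustrative)?

        /*
         * In the shared header the tag is declared but never defined
         * for consumers, so the contents stay hidden.
         */
        typedef struct sock_lower_handle sock_lower_handle_t;

        /* A downcall prototype using the opaque type (illustrative): */
        extern int sd_example(sock_lower_handle_t *handle, int arg);

        /*
         * Passing any other pointer type to sd_example() now draws a
         * compiler/lint complaint that a void * would silently accept.
         */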
  What is a "sock_upper_handle_t"?  It's not defined by this point in
  the text.  I assume it's the caller's private pointer used for
  callbacks, but, if so, why does it have a typedef here?
Will fix.
Section 4.4.3.2:

  What's the state after an error is returned?  Has the connection
  been taken off of the queue and closed?  Or is it left on the queue
  but in an error state?
It is taken off of the queue and is freed.
Sections 4.4.3.4 and 4.4.3.7:

  sd_unbind and sd_disconnect don't seem to map into distinct socket
  operations; when and why are they used?
They are going away.
Section 4.4.3.13:

  How do socket types that require synchronous behavior on
  read(2)/write(2) work (such as with PF_ROUTE)?
write(2) is blocked in the protocol.  Nothing special is done for read(2).
  I don't think that using an upcall to assert flow control (that is,
  to stop the sender) is the right way to handle the problem.  sd_send
  should have an error that it can return if the transport must assert
  flow control, and can then call upward when resumption is possible.

  I don't think an upcall to stop flow is a good idea because it
  potentially allows for priority-related problems where the sender
  pounds data down into a transport layer that never gets a chance to
  say "uncle."
This mechanism allows us to avoid copying data when the protocol is flow controlled; it also allows us to coordinate blocking and waking up a thread. We can address your comment by allowing the protocol to return EAGAIN and having the socket code either block or return the error to the user. BTW, this would be a change in behavior from the current Solaris implementation.
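
A sketch of the EAGAIN variant (helper names here are hypothetical):

        #include <sys/types.h>
        #include <sys/errno.h>
        #include <sys/stream.h>

        /* Hypothetical names throughout. */
        extern int sd_send_downcall(void *proto_handle, mblk_t *mp);
        extern void so_wait_for_tx_space(void *so);

        int
        so_sendmsg(void *so, void *proto_handle, mblk_t *mp,
            boolean_t dontblock)
        {
                int error;

                for (;;) {
                        error = sd_send_downcall(proto_handle, mp);
                        if (error != EAGAIN)
                                return (error);
                        if (dontblock)
                                return (EAGAIN);
                        /* sleep until the protocol's resume upcall */
                        so_wait_for_tx_space(so);
                }
        }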
Section 4.4.3.16:

  What's the difference between *rvalp and the actual return value
  from sd_ioctl?
The return value is zero or an errno.
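
For comparison, this mirrors the convention of a driver's ioctl(9E) entry point, where the function return is zero or an errno and *rval_p carries the success value back to the application:

        #include <sys/types.h>
        #include <sys/errno.h>
        #include <sys/cred.h>

        #define XX_GETCOUNT     1               /* hypothetical cmd */

        static int
        xx_ioctl(dev_t dev, int cmd, intptr_t arg, int mode,
            cred_t *cred_p, int *rval_p)
        {
                switch (cmd) {
                case XX_GETCOUNT:
                        *rval_p = 42;           /* seen by the caller */
                        return (0);             /* success */
                default:
                        return (ENOTTY);        /* errno on failure */
                }
        }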
Section 4.4.3.17:

  Does sd_close block?  If not, then how does the user request a
  blocking close or wait for close to complete?
It blocks.
Section 4.4.4:

  The text says "although not recommended" many times in reference to
  locks.  Is it possible to make a definite design-level statement
  about the holding of locks across the API calls?  I'd strongly
  prefer to have the text say "must not."

  It's always easier to make things more generous in the future than
  it is to try to tighten up the interface when we discover that we've
  made a mistake.

  Don't be generous in version 1.0.
In some cases we have to hold locks, or be in the squeue, to maintain packet ordering.
  Same issue with "void *" for handle here as with downcall handle; if
  there's an actual structure tag that could be used, then use it.
  Otherwise, the typedef looks gratuitous.
OK.
  In what context can these calls be made?  Can any of them be made in
  interrupt context?
Yes, they can be; for example, when queuing a packet.
Section 4.4.4.3:

  I don't understand the non-zero return value for su_disconnected.
  It says that it's "safe" to release the control block at that
  point.  _Must_ the transport behave as though sd_close had been
  called at this point, or is it still possible for the upper level to
  call sd_close on that same socket?

  The text is too weak; it should be certain about the behavior.

  Why is it necessary to provide a way for su_disconnected to signal
  that the "control block" should be released?

The main reason is SCTP, which needs this when dealing with associations; associations are removed as soon as they are disconnected. For other connection-oriented protocols it could be used to get rid of sockets that are disconnected before they are pulled off of the accept queue, but that would just be an optimization. And yes, if a non-zero value is returned, sd_close() will not happen, so the transport would have to behave as if sd_close() had been received. But we need to think about this a bit more; we might just say that zero will always be returned (for version 1).

Section 4.4.4.4:

  I'm not sure what this does.  Much more information is needed here
  about the effect of each one of the nine different combinations of
  'op' and 'action'.

Section 4.4.4.5:

  What flags are defined?  Would MSG_OOB be one of them?
flags is currently unused. We could use it for queuing OOB data as you have suggested.
  Suggest uint_t for use with flags.

  Return value looks like size_t.
Yes, will fix.
Section 4.4.4.7:

  This isn't symmetric with the read side, and it should be.  On the
  read side, there's just a downcall to say "more space is now
  available."  The same should be true for the send side.
On the read side we return the amount of space left in the receive buffer; we are hoping that the protocol will do something intelligent with that info, like adjusting its window size.

For send we either accept the whole packet or reject it, so it seems unnecessary to return the amount of space left.
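
Very roughly, the read-side convention looks like this (the upcall name and signature are only illustrative):

        #include <sys/stream.h>
        #include <sys/strsun.h>         /* msgdsize */

        /* Hypothetical names throughout. */
        extern ssize_t su_recv_upcall(void *upper, mblk_t *mp,
            size_t len, int flags);
        extern void proto_update_rwnd(void *handle, ssize_t space);

        static void
        proto_deliver(void *handle, void *upper, mblk_t *mp)
        {
                ssize_t space;

                /* the upcall returns the space left in the rcv buffer */
                space = su_recv_upcall(upper, mp, msgdsize(mp), 0);
                if (space >= 0)
                        proto_update_rwnd(handle, space);
        }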
  I don't agree with having transmit flow control asserted by an
  upcall, for the reasons described for 4.4.3.13.

Section 4.4.4.8:

  Nit: looks like size_t to me.
Yes, will fix.
Section 4.4.4.9:

  What's the reference point for 'offset'?
It's the offset in the current segment.

Section 4.4.4.10:

  Why is this a separate function pointer, instead of just being a
  flag on the regular function call?
Sure we can make that change.
Section 4.4.4.11:

  What does this do?  What are "transient errors," who receives them,
  and when?  (I assume this is for handling ICMP errors in UDP ... but
  more about the behavior of the interface is needed.)

Your guess is correct; it sets so_error in the socket, which is consumed by the next operation.
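
A minimal sketch, assuming so_error behaves like the classic sockfs field (the structure here is illustrative):

        #include <sys/mutex.h>

        struct sonode_sketch {
                kmutex_t        so_lock;
                int             so_error;
        };

        /* The next socket operation picks up and clears so_error. */
        static int
        so_take_error(struct sonode_sketch *so)
        {
                int error;

                mutex_enter(&so->so_lock);
                error = so->so_error;
                so->so_error = 0;
                mutex_exit(&so->so_lock);
                return (error);
        }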
Section 8.2.1:

  Why does ksock_cb_flags need to exist?
So that we can update individual callbacks.
Section 8.4:

  Is there anything in principle that would prevent this from being
  usable to implement AF_UNIX?
Only because the AF_UNIX socket implementation is TPI based; we do not support TPI-based kernel sockets.

Thanks a lot for your time

Rao.
