On 5/13/13 2:44 PM, Paul LeoNerd Evans wrote:
On Mon, 13 May 2013 11:23:45 -0700
Adrian Chadd <adr...@freebsd.org> wrote:

Just as a data point, I managed 50,000+ connections, at 5,000+ a
second, doing a gigabit+ of traffic, mid-2000s, with userland
management of all of the socket/disk FD stuff.

The biggest overhead at the time was actually the read/write
copyin/copyout, NOT the locking overhead of managing this stuff. Why?
Because I architected the HTTP side of things to specifically pin FDs
to threads, and not allow arbitrary threads to deal with arbitrary
FDs. This removed the need for almost all of the state locking that
people are concerned about here.

I think, then, that this comes from different experiences.

I'm guessing this application was:

   a) Written in C
   b) Entirely filled with identically-typed identical-purpose file
      descriptors
   c) Didn't really use any EV_ONESHOT events
   d) Didn't close sockets apart from when it received EOF
and perhaps most importantly
   e) Was entirely self-contained - did everything from one unified
      block of source code.

I.e. a very simple set of semantics. I'll explain the situation that I
had.

I ran into the problem that needs EV_DROPWATCH/EV_DROPPED because I was
trying to fix Perl's IO::KQueue.

IO::KQueue tries to wrap kqueue/kevent for Perl, allowing the userland
Perl code to store an arbitrary Perl data pointer in the udata field.
This data is reference-counted. Userland might let the kernel store the
only copy of that data, because it comes back in event notifications
anyway. Because of this, the reference count has to be artificially
incremented to account for the extra pointer in the kernel. Without
knowing when the kernel will decide to drop that pointer, I never know
when I should decrement the refcount myself.
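
As a rough illustration of the bookkeeping problem described above, here
is a toy model in Python. It makes no real kevent() calls. FakeKernel,
Binding, Handler and demo_leak are hypothetical names invented for this
sketch, and the pinned table stands in for the artificially incremented
refcount.

```python
# Toy model only: FakeKernel stands in for a kqueue; nothing here calls kevent().
import weakref


class Handler:
    """Stand-in for the arbitrary data a caller stores in udata."""


class FakeKernel:
    def __init__(self):
        self.knotes = {}            # (ident, filter) -> udata token

    def register(self, ident, filt, udata):
        self.knotes[(ident, filt)] = udata

    def close(self, ident):
        # Closing a descriptor silently drops every knote attached to it.
        for key in [k for k in self.knotes if k[0] == ident]:
            del self.knotes[key]


class Binding:
    """Hypothetical wrapper that pins udata while the kernel holds it."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.pinned = {}            # token -> object (the artificial extra reference)

    def add(self, ident, filt, obj):
        token = id(obj)
        self.pinned[token] = obj
        self.kernel.register(ident, filt, token)


def demo_leak():
    kernel = FakeKernel()
    binding = Binding(kernel)
    handler = Handler()
    alive = weakref.ref(handler)
    binding.add(5, "EVFILT_READ", handler)
    del handler                     # the caller keeps no copy of its own
    kernel.close(5)                 # some other module calls close() on fd 5
    return len(kernel.knotes), len(binding.pinned), alive() is not None
```

In this model demo_leak() returns (0, 1, True): the kernel side has
forgotten the watch, but the wrapper still pins the object because
nothing told it to let go.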

IO::KQueue has no knowledge of what userland is doing with it. It can't know
when userland might be EV_ONESHOT'ing. It doesn't really know what
events will be oneshot anyway (such as the process exit watches).
Finally, it has no idea what other modules are going to call close() on
the descriptor. This final problem was the real killer - while the first two
-could- be worked around with more complex code structures, not knowing
what other CPAN modules will ever call close() makes it impossible to
handle. Simply asking every CPAN module to "please just call fd_close()
instead of close()" doesn't work here.

By comparison, having the kernel tell userland when it calls knote_drop()
is much simpler. The kernel knows exactly when it is doing this, so
pushing an event up to userland to say so is straightforward. If any
more cases than the three known (EV_ONESHOT or other single-shot events;
EV_DELETE; close()) are added, userland, and in particular the
IO::KQueue module, will not need updating. It will continue to
decrement refcounts and free data perfectly happily when the kernel has
dropped the watch.
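
A minimal sketch of that idea, again as a toy Python model and not real
kqueue code. EV_DROPPED is the proposed flag, not an existing one. Its
bit value, the exact semantics, and the names FakeKernel, Binding and
demo_three_cases are assumptions made for illustration. The point shown
is that every drop path goes through one function, so the wrapper needs
only one release path.

```python
from collections import deque

# Illustrative flag bits for this model only; EV_DROPPED is the proposed,
# not an existing, flag.
EV_ONESHOT = 0x1
EV_DROPPED = 0x2


class FakeKernel:
    def __init__(self):
        self.knotes = {}        # (ident, filter) -> (flags, udata)
        self.pending = deque()  # events waiting to be read by userland

    def add(self, ident, filt, flags, udata):
        self.knotes[(ident, filt)] = (flags, udata)

    def _drop(self, key):
        # Single choke point, standing in for knote_drop().
        flags, udata = self.knotes.pop(key)
        self.pending.append((key[0], key[1], EV_DROPPED, udata))

    def fire(self, ident, filt):
        flags, udata = self.knotes[(ident, filt)]
        self.pending.append((ident, filt, flags, udata))
        if flags & EV_ONESHOT:
            self._drop((ident, filt))

    def delete(self, ident, filt):      # models EV_DELETE
        self._drop((ident, filt))

    def close(self, ident):             # models close() on the descriptor
        for key in [k for k in self.knotes if k[0] == ident]:
            self._drop(key)


class Binding:
    def __init__(self, kernel):
        self.kernel = kernel
        self.pinned = {}        # udata token -> object

    def add(self, ident, filt, obj, flags=0):
        token = id(obj)
        self.pinned[token] = obj
        self.kernel.add(ident, filt, flags, token)

    def poll(self):
        delivered = []
        while self.kernel.pending:
            ident, filt, flags, token = self.kernel.pending.popleft()
            if flags & EV_DROPPED:
                del self.pinned[token]  # the one place a reference is released
            else:
                delivered.append((ident, self.pinned[token]))
        return delivered


def demo_three_cases():
    kernel = FakeKernel()
    binding = Binding(kernel)
    binding.add(3, "read", ["oneshot"], EV_ONESHOT)
    binding.add(4, "read", ["deleted"])
    binding.add(5, "read", ["closed"])
    kernel.fire(3, "read")      # single-shot event fires, then is dropped
    kernel.delete(4, "read")    # explicit delete
    kernel.close(5)             # close behind the wrapper's back
    delivered = binding.poll()
    return delivered, len(binding.pinned)
```

Here demo_three_cases() delivers the one real event for fd 3 and ends
with nothing left pinned, without the wrapper knowing which of the three
cases caused each drop. The sketch assumes each object is registered on
at most one watch; a real binding would need a per-watch token.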

I've used this pattern before in C libraries + higher-level language
wrappers, and found it to be nicely simple to both implement and use.
Because it follows the -same- event notification path that userland is
already using, it manages to avoid quite a number of the
race-conditions that a secondary, separate data structure and locking
often runs into; e.g. if userland is trying to add a new thing into it
just at the time there's a notification "in-flight" from the kernel
about an old thing that it used to have.
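
One way to picture the in-flight case, as a hedged sketch in Python: the
fd number gets reused while the drop notice for the old watch is still
queued. The EV_DROPPED value, the integer tokens and the helper names
replay and demo_fd_reuse are all assumptions for illustration. The
comparison is between bookkeeping keyed by the udata token carried in
the event and a side table keyed only by fd number.

```python
from collections import deque

EV_DROPPED = 0x2  # proposed flag; the value here is illustrative only


def replay(events, by_token, by_fd):
    """Apply queued events to two kinds of userland bookkeeping."""
    for ident, flags, token in events:
        if flags & EV_DROPPED:
            by_token.pop(token, None)   # in-band: names the exact watch
            by_fd.pop(ident, None)      # side table: only knows the fd number
    return by_token, by_fd


def demo_fd_reuse():
    events = deque()
    by_token, by_fd = {}, {}

    # Old watch on fd 5, identified by token 1.
    by_token[1] = "old"
    by_fd[5] = "old"

    # fd 5 is closed; the drop notice is queued but not yet read.
    events.append((5, EV_DROPPED, 1))

    # A new socket reuses fd 5 and is registered with token 2
    # before userland gets round to reading the queue.
    by_token[2] = "new"
    by_fd[5] = "new"

    return replay(events, by_token, by_fd)
```

In this model the token-keyed table ends up as {2: "new"}, while the
fd-keyed table ends up empty, having thrown away the new entry along
with the old one.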

Principally, the fact that the kernel tells -userland- about the delete
means that it can atomically *guarantee* that this *will* be the last
event about this particular item. Userland must not delete its own data
structure about it until this notification happens. If userland follows
that rule, lots of semantics become a lot simpler.

I was responsible for the udata field. It was not in the original design
that was proposed, and I suggested it to Jonathan. I was thinking purely
of a simple way for an event to supply added information to its handler
that would obviate the need for the app to keep complicated tracking
structures. I was not thinking in terms of "badly behaved" (sic) third
party high level ops using it through a language binding.
I admit that I did not think about the close issue at that time.

Your suggested changes are not unreasonable; however, we could do with more
discussion. The point about tracking objects that may be arbitrarily
destroyed without the framework being notified is valid and aligns well
with general robustness principles.

I would suggest that one answer would be to create an extension to register a
kevent to catch these events (the knote_drop() calls).

The returned event could have all the appropriate information for the event
being dropped.



_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
