On Sun, Nov 16, 2008 at 09:24:19PM +0000, Eris Discordia wrote:
> > That isn't happening. All we have is one TCP connection and one small
> > program exporting file service.
>
> I see. But then, is it the "small program exporting file service" that
> does the multiplexing? I mean, if two machines import a gateway's /net
> and both run HTTP servers binding to and listening on *:80, what takes
> care of which packet belongs to which HTTP server?
I don't think you've quite got it yet... and I swore I wouldn't post in
this thread. Oh well, here goes.
First, let me draw a picture, just to introduce the characters:
+----------+              +---------+
| Internal |<--Ethernet-->| Gateway |<--Ethernet-->(Internet)
| Computer |      ^       +---------+
+----------+      |
                  |
+----------+      |
| Other    |<-----+
| Internal |
| Computer |
+----------+
OK. Here, we have two internal computers (IC and OIC) and a gateway G.
There are two Ethernet networks in play: one connecting IC, OIC, and G,
and the other connecting G to the internet at large, somehow (e.g. ADSL).
IC and OIC both initialize somehow, say using DHCP from G, and bring their
network stacks up using this information. Their kernels now provide the
services that will be mounted, by convention, at /net, again using this
private information. G initializes statically, bringing its IP stack up
using two interfaces, with two routes: one for internal traffic, and one
default route.
So far, it's all very similar between Plan 9 and Linux. Here's where
our story diverges.
In Linux, IC and OIC are both taught that they have default routes to the
Internet At Large by sending IP datagrams to G's internal ethernet
interface. How this works in more detail, they don't care. G will also
send them corresponding IP datagrams from its interface; that's all they
care about. G works furiously to decode the network traffic bound for the
internet and NA(P)T it out to its gateway, maintaining very detailed
tables about TCP connections, UDP datagrams, and so on. How could it be
any other way?
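If it helps to see how much state that implies, here's a toy sketch of the per-flow table a NAPT gateway has to maintain. It's Python, nothing to do with the actual Linux conntrack code, and the port pool and addresses are made up to match the example:

```python
# Toy NAPT table: every outbound flow gets an entry mapping the internal
# (address, port) pair to a freshly allocated port on the gateway's
# external address. Purely illustrative; real conntrack also tracks
# protocol, TCP state, timeouts, and more.

GATEWAY_EXT_IP = "4.2.2.2"   # G's external address in this example

class NaptTable:
    def __init__(self):
        self.next_port = 40000   # invented pool of external ports
        self.out = {}            # (int_ip, int_port) -> ext_port
        self.back = {}           # ext_port -> (int_ip, int_port)

    def translate_out(self, int_ip, int_port):
        """Rewrite an outbound packet's source; allocate state if new."""
        key = (int_ip, int_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (GATEWAY_EXT_IP, self.out[key])

    def translate_in(self, ext_port):
        """Rewrite an inbound packet's destination back to the internal host."""
        return self.back[ext_port]   # KeyError == unsolicited packet, dropped

nat = NaptTable()
src = nat.translate_out("192.168.1.2", 5555)   # IC opens a connection
assert nat.translate_in(src[1]) == ("192.168.1.2", 5555)
```

Every single connection through G lives in that table, and the table has to be consulted and updated on every packet.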
Let's redraw the picture, as things stand now, under Plan 9, just so
things are clear:
                                      (OIC similar to IC)
+----Internal Computer------------------+               |
| bind '#I' /net                        |               |
|  #I : Kernel IP stack (192.168.1.2)   |               |
|  #l : Kernel ethernet driver          |<---Ethernet---+
+---------------------------------------+               |
                                                        |
+----Gateway----------------------------+               |
| bind '#I' /net                        |               |
|  #I : Kernel IP stack (192.168.1.1)   |               |
|                       (4.2.2.2)       |               |
|  #l : Kernel ethernet driver (ether0) |<---Ethernet---+
|                              (ether1) |<---Ethernet----->(Internet)
+---------------------------------------+
In Plan 9, G behaves like any other machine, building a /net out of pieces
exported by its kernel, including the bits that know how to reach the
internet at large through the appropriate interface. Good so far?
Let's have G run an exportfs, exposing its /net on the internal IP
address. This /net knows how to talk to the internal addresses and the
external ones.
Meanwhile, IC can reach out and import G's /net, binding it at /net.alt,
let's say. Now, programs can talk to the Internet by opening files in
/net.alt. These open requests will be carried by IC's mount driver, and
then IC's network stack, to G, whereupon the exportfs (in G's userland)
will forward them to its idea of /net (by open()ing, read()ing,
write()ing, etc.), which is the one built on G's kernel, which knows how
to reach the Internet. Tada! Picture time:
                                                    (OIC)
+----Internal Computer-------------------------+        |
| abaco: open /net.alt/tcp/clone               |        |
|                                              |        |
| import tcp!192.168.1.1!9fs /net.alt (devmnt) |        |
| bind '#I' /net                               |        |
|  #I : Kernel IP stack (192.168.1.2)          |        |
|  #l : Kernel ethernet driver                 |<-------+
+----------------------------------------------+        |
                                                        |
+----Gateway----------------------------+               |
| exportfs -a -r /net                   |               |
|                                       |               |
| bind '#I' /net                        |               |
|  #I : Kernel IP stack (192.168.1.1)   |               |
|                       (4.2.2.2)       |               |
|  #l : Kernel ethernet driver (ether0) |<---Ethernet---+
|                              (ether1) |<---Ethernet----->(Internet)
+---------------------------------------+
This works perfectly for making connections: IC's IP stack is aware only
of devmnt requests, and G's IP stack is aware of some traffic to and from
a normal process called exportfs, and that that process happens to be
making network requests via #I bound at /net.
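To make the relaying concrete, here's a rough simulation of what exportfs is doing. The dict-backed "kernel stack" and the path handling are stand-ins of my own invention, not real Plan 9 code, but the shape of the forwarding is the same: exportfs just replays the client's file operations against its local /net.

```python
# Sketch: IC's open of /net.alt/tcp/clone becomes, via 9P and exportfs,
# an ordinary open of /net/tcp/clone in G's kernel stack.

class KernelNet:
    """Stand-in for #I bound at /net on the gateway."""
    def __init__(self):
        self.nconn = 0
        self.files = {"/net/tcp/clone": ""}

    def open(self, path):
        if path == "/net/tcp/clone":
            # opening clone allocates a fresh connection directory
            n = self.nconn
            self.nconn += 1
            self.files["/net/tcp/%d/ctl" % n] = str(n)
            return "/net/tcp/%d/ctl" % n
        return path

class Exportfs:
    """G's userland exportfs: forwards requests to its idea of /net."""
    def __init__(self, net):
        self.net = net

    def open(self, path):
        # the client asked for /net.alt/...; strip the bind prefix and
        # perform the same open locally, in G's stack
        return self.net.open(path.replace("/net.alt", "/net", 1))

g_net = KernelNet()
gateway = Exportfs(g_net)
# On IC, abaco opens /net.alt/tcp/clone; the mount driver carries the
# request to G, where exportfs opens /net/tcp/clone in G's own stack.
print(gateway.open("/net.alt/tcp/clone"))   # -> /net/tcp/0/ctl
```

Nothing in the gateway's stack knows or cares that the requests originated on another machine.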
The beauty of this design is just how well it works, everywhere, for
everything you'd ever want.
Now, suppose IC goes to listen on TCP:80 by opening /net.alt/tcp/clone.
The same flow of events happens, and to a certain extent, G's network stack
thinks that the exportfs program (running on G) is listening on TCP:80.
exportfs dutifully copies the /net data back to its client.
Naturally, if another program on G were already listening on TCP:80, or
the same program (possibly exportfs) attempted to listen twice (if, say,
OIC played the same game and also tried to listen on G's TCP:80), it
would be told that the port was busy. This error would be carried back
along the exportfs path just as any other.
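A toy model of that collision, again purely illustrative Python: every announce, whether from a local program or relayed by exportfs on behalf of IC or OIC, lands in the same kernel stack on G, so the second announce on a given port simply fails. (The ctl-message name "announce" follows ip(3); everything else here is invented.)

```python
# One stack, one listener per port: the error comes straight from G's
# kernel, and exportfs carries it back over 9P like any other error.

class TcpStack:
    def __init__(self):
        self.listeners = {}   # port -> owning process on G

    def announce(self, port, owner):
        if port in self.listeners:
            raise OSError("announce: port %d busy" % port)
        self.listeners[port] = owner

stack = TcpStack()
stack.announce(80, "exportfs serving IC")        # IC listens via /net.alt
try:
    stack.announce(80, "exportfs serving OIC")   # OIC tries the same
except OSError as e:
    print(e)                                     # carried back over 9P to OIC
```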
So as you see, there is no need to take care of "which packet belongs to
which server" since there can, of course, be only one server listening on
TCP:80. That server is running on G, and behaves like any other program.
That it just so happens to be an exportfs program, relaying data to and
from another computer, is immaterial.
This also works for FTP, and UDP, and ESP (which are notorious problems in
the NAT world), and for IL, and for IPv6, and for ethernet frames (!), and
... you get the idea. It does this with no special tools, no complex
code, and a very minimal per-connection overhead (just the IC and G
kernels and G's exportfs tracking a file descriptor).
There are no connection tracking tables anywhere in this design. There
are just normal IP requests over normal ethernet frames, and a few more
TCP (or IL) connections transporting 9P data.
> On a UNIX clone, or on Windows, because there is exactly one TCP/IP
> protocol stack in usual setups no two programs can bind to the same port
> at the same time. I thought Plan 9's approach eliminated that by keeping
> a distinct instance of the stack for each imported /net.
There can, in fact, be multiple IP stacks on a Plan 9 box, bound to
multiple Ethernet cards, as you say. (Indeed, one can import another
computer's ethernet cards and run snoopy, or build a network stack, using
those instead.) I don't think that's relevant to the example at hand,
though, as should be clear from the above.
--nwf;