On 2016-12-14 at 15:43 Dan Cross wrote:
> to the guest and host. I think we'd get what you want, but at layer 2
> instead of 2/3/4.

Except we need stuff at the higher layers too. Two packets show up at Akaros, one for port 22 and another for 23. We also need to handle the case where the guest sends out an unsolicited packet, say to a webserver on port 80. We need to be able to trace how those packets and the responses flow through various parts of the stack.

Also, how many processes are involved here, and how does it hook up to the guest? Somewhere, someone is reserving port 23 or whatever at layer 4 and receiving on it. It needs to know how to send it to the guest. It also needs to know how to catch any packet that comes from the guest and route it out the outbound IP stack in such a way that it receives the returning TCP connection.

If there is Plan 9 software that does that and can easily be ported to Akaros, then great. Otherwise, I don't want to get involved in writing NAT software that mucks with TCP or whatnot. Maybe it's actually easy and I'm missing something. Tracing the flow of packets and which code is responsible for what will clear that up.

Incidentally, the bypass solution is mostly layer 3, actually. The real work at layer 4 is done in the guest/app and in the remote machine.

> As for the stacks.... The Plan 9 networking stuff we've inherited already
> supports separate IP stacks (unless we stripped that out? I confess I
> haven't looked)

I've looked; it's there. Though there are two things in this area: multiple NICs per IP stack with different IPs (which we've done before, where the IP stack knows how to route them), and actual separate instances of #ip (differentiated by the spec). No one has tried that on Akaros.

> How does bypass work for talking between
> VMs on the same host? It *sounds* like we'd have to pop the frame out onto
> the ethernet unless we intercepted it in the virtio layer, in which case
> we've built half of a virtual layer 2 switch.

That's not the use case here, just like with qemu's usermode networking, but you could do it.
If the VMM was in 'real-mode' networking, then Guest A would need to connect to the ROUTER_IP port X (where X is forwarded to Guest B). The IP routing in the kernel will handle it. It'd be just like a guest connecting to a host service (which is a use case).

If we wanted to do something fancier, we could, if we ran a DHCP server or whatever in the VMM. I don't think we need that.

> Yes. I think what I'm suggesting is closer to the QEMU model on TUN/TAP. We
> can possibly do the TUN/TAP-style much more cleanly than it must be done
> under Linux.

Linux has a few options for guest networking, one of which is the user-mode networking. I don't see why we can't have multiple ones too.

> I'm not sure who has definitively said that you don't have a spare IP
> address for the guest; there are any number of IPs in un-routable
> private/sharable ranges that may be used for such things; if entirely
> internal to a single host then who is to say one cannot arbitrarily use
> one? Sure, there has got to be some local policy around this within an
> *organization*, but that's a separate issue.

We're conflating two different things. Yes, if a network is virtualized, then you can do whatever you want. That's what qemu does with 10.0.2.2. That's also what I'd do with the 'real-addr' mode. It's all virtual, so I can tell the guest it has any addr I want - including the host addr!

The comment about not having free IPs is for the *other* virtualization scenario: where guests have globally visible and routable IPs. This is where something like OpenVSwitch comes in. That is a topic for another time, and is not the use case I need to support right now.

Last week I mentioned three things we need in the area of virtual networking:
1) externally visible IPs for guests (maybe openvswitch)
2) any form of concurrent networking for the guest and host
3) a way to communicate between the host and guest.

In the Linux world, you can do qemu's user-mode networking or the tun/tap for 2 and 3.
Same with Akaros. My thought was that bypass is relatively simple, much like user-mode networking. A tun/tap style approach seems more complicated and might require software that we don't have yet.

I also have a legitimate need to have the guest think it has the real IP address of the node. That's what Fergus et al are doing with the virtio-block pass-through.

> Right now, doesn't the host-end of the virtio network work with a bridge
> device that talks to /net? Sounds like yes.

No. There is no bridge device either. Check out virtio-net's code for details. Basically, it just acts like snoopy and wiretaps the ethernet connection. That's the -n0 flag you pass to vmrunkernel: which NIC to tap. It gets all traffic, and it blasts traffic back out. There's no multiplexing of protocols or anything. That's why I have my "Nasty Devether Hack" patch on the tip of my vmm branch (which is not merged anywhere, but which is where I tell people to run if they want to test things).

> > You don't, just like qemu's user-mode networking. Try pinging a guest
> > on Qemu that isn't set up with tun/tap.
>
> This doesn't feel like a viable long-term solution. I think usermode
> networking is a bit of a hack because most systems don't support TAP/TUN.

It's pretty good - we've been using it for a while with Akaros development. One machine I have uses TUN/TAP with bridges, the other usermode networking. Both are fine. I'd be fine with either approach on Akaros, but I prefer one that is simple and can actually work.

> Consider: would the QEMU authors have done user-mode if TUN/TAP had been
> universally available?
> Sure, it's a speculative question but I think relevant.

Maybe. usermode qemu comes with all the 10.0.2.2 and 10.0.2.15 stuff built in. For tun/tap, you probably have to do more. At least I did. brctl, tunctl, ifconfig br0, run a dhcp/dns server, etc., but it's been years since I set it up.

Actually, you can see a lot of the stuff I do.
Check out scripts/kvm-up.sh and the instructions in GETTING_STARTED.

> I think we should ask Jim/Charles if there's something out there that's
> available.

That'd be great if it existed.

> This does keep coming up, but honestly I haven't seen anyone
> hurting for it.

I've been bringing this up for months, off-list in discussions, and as recently as last week. Fergus uses shitty, lossy serial connections in lieu of ssh on some machines. I have a devether hack that filters TCP port 23 and sends it to the guest. It's getting ridiculous.

> The thing is, this crosses many layers of the stack, but in separate
> places. The idea of a PNAT is that one does that (necessarily, as you point
> out) but it's done in one place. That is, Linux (or whatever) and Akaros
> don't both think they are IP address w.x.y.z; there's one intelligent agent
> that handles the relevant mappings.

It's actually not that bad. Layer 2 is hardly involved at all. It's emulated in the VMM. Layer 3 (IP) is done by Akaros. Layer 4 (TCP/UDP) by the VM itself. Note that the VM is always doing Layer 4 stuff, and that the bypass solution does almost nothing in Layer 4 - it's about bypassing layer 4.

Linux and Akaros are *not* fighting for the same IP address / protocol / port / anything. The bypass is a tool to implement a NAT, one that doesn't even get involved in Layer 4. Linux *thinks* it has a particular IP (10.0.2.15 in qemu style, or HOST_IP in real-addr style). But the only code that ever sees that IP address is the NAT module in virtio-net, which knows to rewrite it (if necessary).

> Except that Linux thinks it owns the entire stack; that's perhaps the best
> summation of my concern in that area.

It thinks it does, but it doesn't. It owns its own stack, and it only listens on the ports that the VMM set up (e.g. 23).

> The tie-in with usermode TCP is interesting. I wonder, however if we can do
> that now by having a user-mode program implement its own tcp/* filesystem
> and bind it over /net/tcp. That would presumably open /ether/$I/data and
> just read/write raw IP datagrams. I don't know what the kernel would do in
> that case without a `bypass` but it may be an interesting experiment.

You'd need an IP address. If you used the host IP address, then you'd conflict with the native stack. Both stacks would get resets when they hear responses to traffic from the *other* stack. That's what we have now, minus the fake 9p server at /net/tcp. That's why I have the "Nasty Devether Hack" patch.

The namespace stuff would just trick a process into using its TCP stack, which virtio-net does for the guest. (Both the namespace and virtio-net are layers of interposition).

Barret
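
To make the packet-flow question concrete, here is a rough Python sketch of the layer-3 rewrite step described above: forwarded ports and guest-initiated flows go to the guest, and everything else stays with the host stack. It is a toy model under stated assumptions, not virtio-net's code. The names `Packet`, `GuestNat`, `from_guest` and `from_network` are hypothetical, `HOST_IP` is an example address from a documentation range, and only 10.0.2.15 comes from the thread. It models headers as fields and ignores checksums, which a real rewrite would have to fix up.

```python
from dataclasses import dataclass, replace

GUEST_IP = "10.0.2.15"   # qemu-style guest address mentioned in the thread
HOST_IP = "192.0.2.10"   # hypothetical host address (documentation range)


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "tcp"


class GuestNat:
    """Toy model of the address rewrite; all names here are hypothetical."""

    def __init__(self, host_ip, guest_ip, forwarded):
        self.host_ip = host_ip
        self.guest_ip = guest_ip
        # (proto, port) pairs the VMM reserved on the host for the guest.
        self.forwarded = set(forwarded)
        # (proto, guest source port) pairs seen on guest-initiated traffic.
        self.outbound = set()

    def from_guest(self, pkt):
        """Guest -> network: remember the flow, present the host address."""
        self.outbound.add((pkt.proto, pkt.src_port))
        return replace(pkt, src_ip=self.host_ip)

    def from_network(self, pkt):
        """Network -> host: return the packet for the guest, or None if it
        belongs to the host's own stack."""
        if pkt.dst_ip != self.host_ip:
            return None
        key = (pkt.proto, pkt.dst_port)
        if key in self.forwarded or key in self.outbound:
            return replace(pkt, dst_ip=self.guest_ip)
        return None
```

With `GuestNat(HOST_IP, GUEST_IP, [("tcp", 23)])`, a packet to port 23 is handed to the guest while one to port 22 returns `None` and stays with the host. In real-addr style, where the guest is told it has the host address, passing the same value for `host_ip` and `guest_ip` makes both rewrites no-ops, leaving only the demultiplexing decision.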
