Hi Jonathan,

On Jan 14, 2009, at 9:54 AM, Jonathan Wheeler wrote:
> Hi Folks,
>
> I've been contacted offlist with a request for further updates and
> information, and tonight I discovered something really weird, well
> worth sharing. Apologies for another long post!
>
> First, an update on the changes to my test environment:
> I created a new vnic on e1000g1 called dnsvnic0, and created/cloned
> my sparse-template zone into a new sparse-root zone named dns, which
> uses dnsvnic0 with the IP address 192.168.1.62.
> The zone booted and I was straight into this problem again. I hadn't
> been able to get my sparse-template zone to fault again, but
> immediately after creating a new vnic/zone, I was back to having
> this elusive yet frustrating issue.
>
> Just as a refresher, my Solaris server here is a VM running under
> VMware ESXi 3.5u3 (with all current patches). An extra layer of
> virtualisation does add extra questions, so I tried a ping test that
> would be entirely internal to the ESX host: pinging the global zone
> from the non-global [dns] zone.
>
> Traffic test #1
> From within the dns zone:
>
> bash-3.2# ping 192.168.1.60
> no answer from 192.168.1.60

So what is 192.168.1.60? I guess it's the global zone, but is it on e1000g0 or e1000g1? If it's on e1000g0 while dnsvnic0 is created on e1000g1, there will be no virtual switching between these data-links.

> bash-3.2# arp -an
> Net to Media Table: IPv4
>   Device   IP Address            Mask             Flags    Phys Addr
>   ------   --------------------  ---------------  -------- -----------------
>   dnsvnic0 192.168.1.61          255.255.255.255  o        02:08:20:be:66:8e
>   dnsvnic0 192.168.1.60          255.255.255.255           00:0c:29:60:4e:c2
>   dnsvnic0 192.168.1.62          255.255.255.255  SPLA     02:08:20:ff:77:4f
>   dnsvnic0 192.168.1.133         255.255.255.255  o        00:15:f2:1d:48:c2
>   dnsvnic0 224.0.0.0             240.0.0.0        SM       01:00:5e:00:00:00
>
> ARP packets *are* returning. ICMP packets, however, are *not*.
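A quick way to answer that question from the global zone (a sketch, assuming the usual OpenSolaris dladm/ifconfig commands; output will depend on your setup, so none is shown):

```shell
# List each VNIC, the physical data-link it sits over, and its MAC address;
# this confirms dnsvnic0 really is over e1000g1.
dladm show-vnic

# List all physical data-links known to the system.
dladm show-link

# Show every plumbed interface and its address; look for which interface
# (e1000g0 or e1000g1) carries 192.168.1.60 in the global zone.
ifconfig -a
```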
> snoop from the global zone on the e1000g1 interface (which the vnic
> is running on):
>
> # snoop -d e1000g1 arp or icmp
> Using device e1000g1 (promiscuous mode)
> 192.168.1.62 -> (broadcast)  ARP C Who is 192.168.1.60, persephone ?
> persephone -> 192.168.1.62   ARP R 192.168.1.60, persephone is 0:c:29:60:4e:c2
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 0)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 1)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 2)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 3)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 4)
> (and so on...)
>
> # snoop -d e1000g0 arp or icmp (which only the global zone is using)
> Using device e1000g0 (promiscuous mode)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 0)
> persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 0)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 1)
> persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 1)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 2)
> persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 2)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 3)
> persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 3)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 4)
> persephone -> 192.168.1.62   ICMP Echo reply (ID: 23212 Sequence number: 4)
>
> So the global zone is replying to the non-global zone; 'dns' just
> isn't seeing the replies.
> This is sounding a lot like a weird vswitch bug.

Not necessarily. It depends on how you wired your NICs. If e1000g0 and e1000g1 are connected to the same switch, then the packet can go dnsvnic0 -> e1000g1 -> switch -> e1000g0 -> global zone.
You may not see the reply come back to dnsvnic0 via global zone -> e1000g0 -> switch -> e1000g1, due to the same problem you described initially, with unicast packets not making it to the VNIC in the VMware VM.

> Next I decided to try zone-to-zone traffic:
>
> Server        - vnic                     - IP
> Zone-template - zonevnic0 (via e1000g1)  - 192.168.1.61
> dns           - dnsvnic0 (via e1000g1)   - 192.168.1.62
>
> This worked... dns could ping Zone-template.

Because in this case you are going through the virtual switch.

> What really surprised me was that my snoop on e1000g1 was showing
> the traffic. It was my understanding that traffic between vnics
> attached to the same pnic never actually went across the wire, so
> why is snoop on a physical interface showing vnic <-> vnic traffic?

That's done by design, to allow the global zone/dom0 to see all traffic exchanged between the VMs/zones. It's similar to a monitoring port on a physical switch.

> A) Something in Crossbow isn't working properly.
> B) I'm misunderstanding how vnics talk to each other. I understand
> etherstubs, but it just makes sense that inter-zone traffic
> shouldn't be sent down a bottleneck like a pNIC when it's all
> *internal* anyway.
> C) The traffic isn't actually going out the physical interface
> across the wire, but it is going via the logical concept of the
> e1000g1 interface, which snoop is reporting on - which is rather
> confusing to an end user like me trying to diagnose this using
> snoop :(
>
> Can anyone clarify this one for me?
>
> The WTF moment of the night was this:
> vSwitch security in ESX is configured like this by default:
>   Promiscuous Mode:    Disabled
>   MAC Address Changes: Accept
>   Forged Transmits:    Accept
>
> These sound like reasonable defaults to me; toggling the Promiscuous
> flag would, to my understanding, pretty much turn the vSwitch into a
> "vHub"!
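On (C): you can usually point snoop at the VNIC itself rather than at the physical link, which separates "this zone's traffic" from "everything crossing the pNIC" (a sketch, assuming your Crossbow build lets snoop open a VNIC data-link by name):

```shell
# From the global zone: capture only what the dns zone's VNIC sends and
# receives, instead of all traffic observed on the underlying e1000g1.
snoop -d dnsvnic0 arp or icmp
```

If the vnic-to-vnic ping shows up here but genuinely never leaves the box, that supports interpretation (C): snoop on e1000g1 is reporting the logical link, not packets on the physical wire.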
> I left a [non-returning] ping running between dns and the global
> zone, and decided to try enabling Promiscuous mode anyway.
> No change.
>
> I started a snoop on e1000g1, and suddenly the sparse-template <->
> dns ping that I had started in another terminal moments ago began
> working. I stopped the snoop, and it stopped working again.
>
> !!!?
>
> Enabling the promiscuous flag on the e1000g1 driver is suddenly
> "fixing" my traffic problem.
>
> My best interpretation of this data is that one of three things isn't
> working, and I'm starting to get out of my depth here fast.
>
> A) Crossbow itself is doing something 'funny' with the way traffic
> is being passed on to the vSwitch, causing it to not send traffic
> for this MAC address down the correct virtual port on the switch.
> ARP spoofing is common enough, and both of those options are
> already enabled, so it seems something else is causing it to get
> confused. Sadly there isn't any interface to the vSwitch that I'm
> aware of to pull stats/logs from.
> Funny promiscuous ARPs? Sending traffic down both pNICs? Something
> else to confuse the vSwitch? I'm out of skills to troubleshoot this
> option any further.
>
> B) The vSwitch in ESXi has a bug. If so, why is it only affecting
> Crossbow? ESX is very widely used, so if there were a glaring bug in
> the vSwitch Ethernet implementation it would be very common and
> public knowledge. Crossbow is new enough; is it possible that I'm
> the first to have tried this configuration under ESX, and thus the
> first to notice this issue?
> There aren't any other options within ESX that I'm aware of that I
> can try to get some further data on the vSwitch itself, so I'm at a
> loss as to how to troubleshoot this one further.
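As a stopgap while you investigate, you can hold e1000g1 in promiscuous mode the same way your test did, by leaving a snoop running in the background and discarding its capture (a workaround sketch, not a fix):

```shell
# Keep the interface promiscuous for as long as this snoop runs.
# -o writes the raw capture to a file; pointing it at /dev/null discards
# the packets, and the redirects silence the packet counter on stderr.
snoop -d e1000g1 -o /dev/null >/dev/null 2>&1 &

# To revert, kill the background snoop.
```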
> I'm also just using the free ESXi, so I can't contact VMware for
> support on this, and at this point it would be a pretty vague bug
> report anyway :/
>
> C) The Intel PRO/1000 vNIC that ESX is exposing to the VM has a bug
> in it, or the Solaris e1000g driver has a bug when sending Crossbow
> traffic across it (or a combination of the two).
> The Intel PRO/1000 is a very common server NIC, and I'd be
> gobsmacked if there were a bug with a real (non-virtual) e1000g
> adapter that the Sun folks hadn't picked up in their prerelease
> testing.
>
> The only option for vNICs within ESX, for a 64-bit Solaris host, is
> the e1000 NIC. I'm trying to set up a 32-bit host to see what NIC
> that ends up with. If this provides a different result, that at
> least gives us some better information on where to start looking!
>
> Any further directions or feedback would be most welcome. If I'm
> heading in the wrong direction, please do tell me :)

I have a theory. When you create a VNIC, Crossbow will try to associate the VNIC's unicast MAC address with the NIC. Most NICs have hardware unicast filters which allow traffic for multiple unicast addresses to be received without turning the NIC into promiscuous mode. e1000g provides multiple such slots for unicast addresses.

What could be happening is that e1000g running in the VM happily allows Crossbow to program the unicast address for the VNIC, but the VMware back-end driver or virtual switch doesn't know about that address. So all broadcast and multicast packets are going in and out as expected, and all traffic from the VNIC is going out without a problem, but when unicast packets come back for the unicast address of the VNIC, they never make it to the VM.

If you simply enable promiscuous mode on the VMware virtual switch, it will accept these packets, but the back-end driver instance associated with e1000g might still filter them out by default and drop them.
In order to see the packets, you have to turn on promiscuous mode on e1000g1 itself, which probably causes the VMware back-end to send all packets up.

If this theory is correct, what would help is allowing the VMware back-end to send up all packets received from the VMware virtual switch without filtering. But I don't know if VMware provides that option.

Nicolas.

> Jonathan
> --
> This message posted from opensolaris.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux