Hi Jonathan,

On Jan 14, 2009, at 9:54 AM, Jonathan Wheeler wrote:

> Hi Folks,
>
> I've been contacted offlist with a request for further updates and
> information, and tonight I discovered something really weird that's
> well worth sharing. Apologies for another long post!
>
> First an update on the changes to my test environment:
> I created a new vnic on e1000g1 called dnsvnic0, and created/cloned  
> my sparse-template zone into a new sparseroot zone named dns, which  
> uses dnsvnic0, with the IP address 192.168.1.62.
> The zone booted and I was straight into this problem again. I hadn't  
> been able to get my sparse-template zone to fault again, but  
> immediately after creating a new vnic/zone, I was back to having  
> this elusive yet frustrating issue.
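
Just to make sure I'm picturing the test environment correctly: I'm
assuming you created the VNIC and handed it to the zone roughly along
these lines (exclusive-IP zone assumed, names taken from your
description):

# dladm create-vnic -l e1000g1 dnsvnic0
# zonecfg -z dns
zonecfg:dns> set ip-type=exclusive
zonecfg:dns> add net
zonecfg:dns:net> set physical=dnsvnic0
zonecfg:dns:net> end
zonecfg:dns> exit

If the setup is different (e.g. a shared-IP zone), that changes the
picture a bit, so please say so.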
>
> Just as a refresher, my Solaris server here is a VM running under
> VMware ESXi 3.5u3 (with all current patches). An extra layer of
> virtualisation does add extra questions, so I tried a ping test that
> would be entirely internal to the ESX host: pinging the global zone
> from the non-global [dns] zone.
>
> Traffic test #1
> From within the dns zone:
> bash-3.2# ping 192.168.1.60
> no answer from 192.168.1.60

So what is 192.168.1.60? I guess it's the global zone, but e1000g0 or  
e1000g1?

If it's e1000g0 but dnsvnic0 is created on e1000g1, there will be no
virtual switching between these data-links.
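
A quick way to double-check which data-link everything is sitting on
(this is just the stock dladm/ifconfig, output omitted):

# dladm show-link
# ifconfig -a

ifconfig -a in the global zone will show which interface 192.168.1.60
is plumbed on, and dladm show-link will confirm which physical link
dnsvnic0 is created over. If .60 is on e1000g0, that alone would
explain the missing replies.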

>
> bash-3.2# arp -an
> Net to Media Table: IPv4
> Device   IP Address               Mask      Flags      Phys Addr
> ------ -------------------- --------------- -------- ---------------
> dnsvnic0 192.168.1.61         255.255.255.255 o        02:08:20:be:66:8e
> dnsvnic0 192.168.1.60         255.255.255.255          00:0c:29:60:4e:c2
> dnsvnic0 192.168.1.62         255.255.255.255 SPLA     02:08:20:ff:77:4f
> dnsvnic0 192.168.1.133        255.255.255.255 o        00:15:f2:1d:48:c2
> dnsvnic0 224.0.0.0            240.0.0.0       SM       01:00:5e:00:00:00
> ARP packets *are* returning. ICMP packets, however, are *not*.
>
> snoop from the global zone on the e1000g1 interface (which the vnic  
> is running on):
> # snoop -d e1000g1 arp or icmp
> Using device e1000g1 (promiscuous mode)
> 192.168.1.62 -> (broadcast)  ARP C Who is 192.168.1.60, persephone ?
>  persephone -> 192.168.1.62 ARP R 192.168.1.60, persephone is 0:c:29:60:4e:c2
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 0)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 1)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 2)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 3)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23182 Sequence number: 4)
> (and so on...)
>
> # snoop -d e1000g0 arp or icmp (which only the global zone is using)
> Using device e1000g0 (promiscuous mode)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 0)
>  persephone -> 192.168.1.62 ICMP Echo reply (ID: 23212 Sequence number: 0)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 1)
>  persephone -> 192.168.1.62 ICMP Echo reply (ID: 23212 Sequence number: 1)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 2)
>  persephone -> 192.168.1.62 ICMP Echo reply (ID: 23212 Sequence number: 2)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 3)
>  persephone -> 192.168.1.62 ICMP Echo reply (ID: 23212 Sequence number: 3)
> 192.168.1.62 -> persephone   ICMP Echo request (ID: 23212 Sequence number: 4)
>  persephone -> 192.168.1.62 ICMP Echo reply (ID: 23212 Sequence number: 4)
>
> So the global zone is replying to the non-global zone 'dns'; it just
> isn't seeing the replies.
> This is sounding a lot like a weird vswitch bug.

Not necessarily. It depends on how you wired your NICs. If e1000g0 and
e1000g1 are connected to the same switch, then the packet can go from  
dnsvnic0->e1000g1->switch->e1000g0->global zone. You may not see the  
reply come back to dnsvnic0 via global_zone->e1000g0->switch->e1000g1  
due to the same problem you described initially with unicast packets  
not making it to the VNIC in the VMware VM.
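
One thing to be aware of when testing this: snoop itself puts the
interface in promiscuous mode, which can hide exactly the kind of
unicast filtering problem we're talking about. snoop has a -P option
to capture in non-promiscuous mode, e.g.

# snoop -P -d e1000g1 icmp

from the global zone, which only shows packets the (virtual) NIC
would have delivered anyway. Comparing that with a normal promiscuous
snoop of the same traffic should tell you whether the replies reach
e1000g1 at all, or only once the NIC goes promiscuous.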

>
>
> Next I decided to try zone-to-zone traffic:
> Server - vnic - IP
> Zone-template - zonevnic0 (via e1000g1) - 192.168.1.61
> DNS - dnsvnic0 (via e1000g1) - 192.168.1.62
>
> This worked... DNS could ping Zone-template.

Because in this case you are going through the virtual switch.

>
> What really surprised me was that my snoop on e1000g1 was showing
> the traffic. It was my understanding that traffic between vnics
> attached to the same pnic never actually went across the wire, so
> why is snoop on a physical interface showing vnic <> vnic traffic?

That's done by design, to allow the global zone/dom0 to see all
traffic exchanged between the VMs/Zones. It's similar to a monitoring
port on a physical switch.
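
If you want to see only the traffic a particular VNIC sends and
receives, you should also be able to snoop on the VNIC data-link
itself instead of the physical NIC, e.g.

# snoop -d dnsvnic0 arp or icmp

(VNICs are ordinary data-links as far as snoop is concerned.)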

>
>
> As I see it, one of the following must be going on:
> A) Something in crossbow isn't working properly.
> B) I'm misunderstanding how vnics talk to each other. I understand
> etherstubs, but it just makes sense that inter-zone traffic
> shouldn't be sent down a bottleneck like a pNIC when it's all
> *internal* anyway.
> C) The traffic isn't actually going out the physical interface
> across the wire, but it is going via the logical concept of the
> e1000g1 interface, which snoop is reporting on - which is rather
> confusing to an end user like me trying to diagnose this using
> snoop :(
>
> Can anyone clarify this one for me?
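
On your (B)/(C): traffic between two VNICs created over the same
physical NIC is switched internally by the MAC layer and doesn't
actually go out on the wire; snoop on e1000g1 shows it because of the
monitoring behaviour described above. If you want inter-zone traffic
to stay off the physical NIC entirely, you can create the VNICs over
an etherstub instead, roughly (vswitch0 is just a name I picked):

# dladm create-etherstub vswitch0
# dladm create-vnic -l vswitch0 vnic1
# dladm create-vnic -l vswitch0 vnic2

VNICs on the same etherstub can then only talk to each other.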
>
> The WTF moment of the night was this:
> vSwitch security in ESX is configured like this by default:
> Promiscuous Mode: Disabled
> MAC Address Changes: Accept
> Forged Transmits: Accept
>
> These sound like reasonable defaults to me; toggling the Promiscuous
> flag would, to my understanding, pretty much turn the vSwitch into a
> "vHub"!
>
> I left a [non-returning] ping running between dns and the global  
> zone, and decided to try enabling Promiscuous mode anyway.
> No change.
>
> I started a snoop on e1000g1, and suddenly the sparse-template <>
> dns ping that I had started in another terminal moments earlier
> began working. I stopped the snoop, and it stopped working again.
>
> !!!?
>
> Enabling the promiscuous flag on the e1000g1 driver is suddenly  
> "fixing" my traffic problem.
>
> My best interpretation of this data is that one of three things
> isn't working, and I'm starting to get out of my depth here fast.
>
> A) Crossbow itself is doing something 'funny' with the way traffic
> is being passed on to the vswitch, which is causing it to not send
> traffic for this mac address down the correct virtual port on the
> switch. ARP spoofing is common enough, and both of those options are
> already set to Accept, so it seems something else is confusing the
> vSwitch. Sadly there isn't any interface to the vSwitch that I'm
> aware of to pull some stats/logs from.
> Funny promiscuous ARPs? Sending traffic down both pnics? Something
> else to confuse the vswitch? I'm out of skills to troubleshoot this
> option any further.
>
> B) The vSwitch in ESXi has a bug. If so, why is it only affecting
> crossbow? ESX is very widely used, so if there were a glaring bug in
> the vSwitch ethernet implementation it would be very common and
> public knowledge. Crossbow is new enough; is it possible that I'm
> the first to have tried this configuration under ESX and thus the
> first to notice this issue?
> There aren't any other options within ESX that I'm aware of that I
> can try to get some further data on the vSwitch itself, so I'm at a
> loss as to how to troubleshoot this one further.
> I'm also just using the free ESXi, so I can't contact VMware for
> support on this, and at this point it would be a pretty vague bug
> report anyway :/
>
> C) The Intel PRO/1000 vNIC that ESX is exposing to the VM has a bug
> in it, or the Solaris e1000g driver has a bug when sending crossbow
> traffic across it (or a combination of the two).
> The Intel PRO/1000 is a very common server NIC, and I'd be
> gobsmacked if there were a bug with a real (non-virtual) e1000g
> adapter that the Sun folk hadn't picked up in their prerelease
> testing.
>
> The only option for vNICs within ESX, for a 64-bit Solaris guest,
> is the e1000 NIC. I'm trying to set up a 32-bit guest to see what
> NIC that ends up with. If this provides a different result, that at
> least gives us some better information on where to start looking!
>
> Any further directions or feedback would be most welcome. If I'm  
> heading in the wrong direction, please do tell me :)

I have a theory.

When you create a VNIC, Crossbow will try to associate the VNIC's
unicast MAC address with the underlying NIC. Most NICs have hardware
unicast filters which allow traffic for multiple unicast addresses to
be received without putting the NIC in promiscuous mode. e1000g
provides multiple such slots for unicast addresses.
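
As a sanity check, dladm will show you the MAC address Crossbow
picked for each VNIC (i.e. the address it tries to program into one
of those unicast slots):

# dladm show-vnic

For dnsvnic0 that should be the 2:8:20:ff:77:4f you're seeing against
192.168.1.62 in the zone's ARP table.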

What could be happening is that e1000g running in the VM happily
allows Crossbow to program the unicast address for the VNIC, but the
VMware back-end driver or virtual switch doesn't know about that
address. So all broadcast and multicast packets are going in and out
as expected, and all traffic from the VNIC is going out without a
problem, but when unicast packets come back for the unicast address
of the VNIC, they never make it to the VM.

If you simply enable promiscuous mode on the VMware virtual switch,
then it will accept these packets, but the back-end driver instance
associated with e1000g might still filter them out by default and
drop them. In order to see the packets you have to turn on
promiscuous mode on e1000g1 itself, which probably causes the VMware
back-end to send all packets up.

If this theory is correct, what would help is allowing the VMware
back-end to send up all packets received from the VMware virtual
switch without filtering. But I don't know if VMware provides that
option.
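
In the meantime, a crude way to test the theory (and work around it)
from inside the VM would be to hold e1000g1 in promiscuous mode
yourself, for example by leaving a snoop running in the background
and throwing the capture away:

# snoop -q -d e1000g1 -o /dev/null &

If the dns zone's pings keep working for exactly as long as that
snoop is running, that's a strong hint that it's the unicast
filtering in the VMware back-end that's biting you.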

Nicolas.

>
>
> Jonathan
> -- 
> This message posted from opensolaris.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux

