On 06/29/2012 02:25 PM, Sašo Kiselkov wrote:
> On 06/29/2012 12:24 AM, Dan McDonald wrote:
>> I thought 918 wouldn't be the issue here as there is no VLAN involved.
>>
>> ANYWAY, I do have a just revised today 918 and its webrev (because one of
>> our Nexenta customers was bitten by 918) here:
>>
>> http://kebe.com/~danmcd/webrevs/918/
>>
>> You should be able to drop these diffs into your illumos-gate clone, and
>> "build small" just the mac module itself. I blogged about this:
>>
>>
>> http://kebesays.blogspot.com/2011/03/for-illumos-newbies-on-developing-small.html
>>
>> I'm not sure if 918 will cure what ails you, but you should try my fix and
>> see if it does.
>
> Thanks for the many pointers. I tried and built a new mac module.
> Installed it and here are my observations:
>
> - without the patch for 918 (just plain illumos-gate from mercurial),
> as expected, I'm seeing the issue as before
>
> - with the patch, something indeed happened, but not very good. Now I'm
> not getting any multicast packets to userspace, but I can see them
> arriving on the interface via snoop.
>
> I tried disabling the dladm set-linkprop cpus=0-31 tuning and that
> restored functionality, but since by default fanout is only to a single
> CPU, I'm still hitting the old single-core fanout bottleneck. If I set
> cpus to anything that causes the fanout cpu list to include more than
> one CPU, I can't get any multicast packet to propagate to userspace.
>
> Example:
>
> ** Multicast OK **
> dladm set-linkprop -p cpus=0-2 igb3
> # echo '::mac_srs -rcv' | mdb -k
>                               CPU_COUNT       FANOUT_CPU_COUNT
> ADDR             LINK_NAME    (CPU_LIST)      (CPU_LIST)
> ffffff2203800340 igb3         3               1
>                               (00,01,02)      (01)
> ffffff22037ff680 igb3         3               1
>                               (00,01,02)      (01)
>
>
> ** Multicast BROKEN **
> dladm set-linkprop -p cpus=0-2 igb3
> # echo '::mac_srs -rcv' | mdb -k
>                               CPU_COUNT       FANOUT_CPU_COUNT
> ADDR             LINK_NAME    (CPU_LIST)      (CPU_LIST)
> ffffff2203800340 igb3         4               2
>                               (00,01,02,03)   (01,02)
> ffffff22037ff680 igb3         4               2
>                               (00,01,02,03)   (01,02)
>
> Any ideas what might be causing this? I've commented out all tunings in
> /etc/system to make sure I'm not hitting some weird behavior...

Tracing with DTrace, I find that the difference is this:

* With the cpus linkprop unset, or set to anything that causes fanout to
  only a single CPU, I see the following call flow:

  CPU FUNCTION
    0  -> mac_rx_srs_drain
    0    -> mac_rx_srs_proto_fanout
    0      -> mac_rx_soft_ring_process
    0        -> mac_rx_deliver
    0          -> mac_vlan_header_info
    0            -> mac_header_info
    0            <- mac_header_info
    0          <- mac_vlan_header_info
    0        <- mac_rx_deliver
    0      <- mac_rx_soft_ring_process
    0    <- mac_rx_srs_proto_fanout
    0  <- mac_rx_srs_drain

  Here mac_rx_srs_long_fanout is never called; processing goes directly
  to delivering the packets.

* With cpus set to anything that causes fanout to multiple CPUs,
  processing switches from mac_rx_srs_proto_fanout to mac_rx_srs_fanout,
  which then calls mac_rx_srs_long_fanout. The call trace now looks like
  this:

    0  -> mac_rx_srs_drain
    0    -> mac_rx_srs_fanout
    0      -> mac_rx_srs_long_fanout
    0      <- mac_rx_srs_long_fanout
    0      -> mac_rx_soft_ring_process
    0      <- mac_rx_soft_ring_process
    0    <- mac_rx_srs_fanout
    0  <- mac_rx_srs_drain

So mac_rx_deliver is not being called in the multi-CPU case, and
apparently the packets are never delivered to the IP layer. Any ideas on
why this might be? (I've been dtracing only over the mac module
functions; I'll try expanding my search to see what mac_rx_srs_fanout is
doing.)

Cheers,
--
Saso
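
The comparison of the two call flows above can also be done mechanically, which helps once the traces get longer. What follows is only a minimal sketch, assuming the traces are saved as text in DTrace's flowindent format ("CPU -> function" / "CPU <- function"); the helper names entered_functions and only_in_first are hypothetical and not part of any illumos tool.

```python
import re

# Matches one flowindent line such as "  0    -> mac_rx_deliver".
# Header lines like "CPU FUNCTION" do not match and are skipped.
FLOW_LINE = re.compile(r"^\s*\d+\s+(->|<-)\s+(\S+)\s*$")


def entered_functions(trace_text):
    """Hypothetical helper: functions entered ("->"), in first-seen order."""
    seen = []
    for line in trace_text.splitlines():
        match = FLOW_LINE.match(line)
        if match and match.group(1) == "->" and match.group(2) not in seen:
            seen.append(match.group(2))
    return seen


def only_in_first(first, second):
    """Hypothetical helper: functions entered in `first` but never in `second`."""
    second_set = set(entered_functions(second))
    return [name for name in entered_functions(first) if name not in second_set]


# The two call flows quoted in this message.
SINGLE_CPU = """
CPU FUNCTION
  0  -> mac_rx_srs_drain
  0    -> mac_rx_srs_proto_fanout
  0      -> mac_rx_soft_ring_process
  0        -> mac_rx_deliver
  0          -> mac_vlan_header_info
  0            -> mac_header_info
  0            <- mac_header_info
  0          <- mac_vlan_header_info
  0        <- mac_rx_deliver
  0      <- mac_rx_soft_ring_process
  0    <- mac_rx_srs_proto_fanout
  0  <- mac_rx_srs_drain
"""

MULTI_CPU = """
  0  -> mac_rx_srs_drain
  0    -> mac_rx_srs_fanout
  0      -> mac_rx_srs_long_fanout
  0      <- mac_rx_srs_long_fanout
  0      -> mac_rx_soft_ring_process
  0      <- mac_rx_soft_ring_process
  0    <- mac_rx_srs_fanout
  0  <- mac_rx_srs_drain
"""

if __name__ == "__main__":
    print("only in single-CPU path:", only_in_first(SINGLE_CPU, MULTI_CPU))
    print("only in multi-CPU path: ", only_in_first(MULTI_CPU, SINGLE_CPU))
```

Run on the two traces above, this lists mac_rx_deliver (and the header-info calls beneath it) as present only in the single-CPU path, which is the same observation made by eye.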
