I managed to get to the bottom of why Suricata wasn't working with hugepages in PF_RING libzero or ZC: I was running Suricata as a non-root user, which meant it couldn't mmap the hugepages (presumably it drops privileges too soon). Running as root solves the problem (though that isn't ideal!).
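In case it's useful to anyone else hitting this, the workaround I'm going to try first (untested, so very much a sketch) is to keep Suricata non-root but make the hugetlbfs mount itself accessible to the run-as user. It assumes the user is called "suricata" and that the cluster master really does back its buffers with files under the directory given to -u:

# reserve the pages as before
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# remount hugetlbfs so the unprivileged user can mmap files in it
# (uid/gid/mode are standard hugetlbfs mount options)
umount /mnt/huge 2>/dev/null
mount -t hugetlbfs -o uid=$(id -u suricata),gid=$(id -g suricata),mode=0770 none /mnt/huge

# start the master as root exactly as before
pfdnacluster_master -i dna0 -c 1 -n 15,1 -r 15 -m 4 -u /mnt/huge -d

Whether that's enough will depend on the ownership and mode of whatever the master actually creates under /mnt/huge, so treat it as a starting point rather than a fix.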
I'll have a look at the Suricata code to see whether it's something very easy to fix; otherwise I'll raise an issue with the developers. ARGUS is also running as a non-root user, but I guess it didn't suffer from this because it uses libpcap and/or drops privileges later.

Best Wishes,
Chris

On 29/10/14 20:35, Chris Wakelin wrote:
> Hi Alfredo,
>
> Did you manage to test Suricata with libzero+hugepages or ZC?
>
> I've just had another go after a clean reboot (now on fully-patched
> Ubuntu 12.04.5 64-bit, kernel 3.2.0-70, PF_RING 6.0.2), followed by
> reserving 1024 2048-KB pages :-
>
> insmod ixgbe.ko RSS=1,1 mtu=1522 adapters_to_enable=xx:xx:xx:xx:xx:xx num_rx_slots=32768 num_tx_slots=0 numa_cpu_affinity=1,1
> ifconfig dna0 up
>
> echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> cat /proc/meminfo | grep Huge
> mount -t hugetlbfs none /mnt/huge
>
> pfdnacluster_master -i dna0 -c 1 -n 15,1 -r 15 -m 4 -u /mnt/huge -d
>
> I connected ARGUS to dnacl:1@15 and it worked fine.
>
> However, any attempt to start Suricata fails with things like:
>
>> [14115] 29/10/2014 -- 19:59:25 - (runmode-pfring.c:287) <Info> (ParsePfringConfig) -- DNA interface detected, not setting cluster-id for PF_RING (iface dnacl:1@0)
>> [14115] 29/10/2014 -- 19:59:25 - (runmode-pfring.c:335) <Info> (ParsePfringConfig) -- DNA interface detected, not setting cluster type for PF_RING (iface dnacl:1@0)
>> [14115] 29/10/2014 -- 19:59:25 - (util-runmodes.c:559) <Info> (RunModeSetLiveCaptureWorkersForDevice) -- Going to use 1 thread(s)
>> [14116] 29/10/2014 -- 19:59:25 - (util-affinity.c:320) <Info> (AffinityGetNextCPU) -- Setting affinity on CPU 0
>> [14116] 29/10/2014 -- 19:59:25 - (tm-threads.c:1439) <Info> (TmThreadSetupOptions) -- Setting prio -2 for "RxPFdnacl:1@01" Module to cpu/core 0, thread id 14116
>> [14116] 29/10/2014 -- 19:59:25 - (tm-threads.c:1350) <Error> (TmThreadSetPrio) -- [ERRCODE: SC_ERR_THREAD_NICE_PRIO(47)] - Error setting nice value for thread RxPFdnacl:1@01: Operation not permitted
>> [14116] 29/10/2014 -- 19:59:25 - (tmqh-packetpool.c:291) <Info> (PacketPoolInit) -- preallocated 512 packets. Total memory 1790976
>> [14116] 29/10/2014 -- 19:59:25 - (source-pfring.c:446) <Error> (ReceivePfringThreadInit) -- [ERRCODE: SC_ERR_PF_RING_OPEN(34)] - Failed to open dnacl:1@0: pfring_open error. Check if dnacl:1@0 exists and pf_ring module is loaded.
>> [14115] 29/10/2014 -- 19:59:25 - (runmode-pfring.c:287) <Info> (ParsePfringConfig) -- DNA interface detected, not setting cluster-id for PF_RING (iface dnacl:1@1)
>> [14115] 29/10/2014 -- 19:59:25 - (runmode-pfring.c:335) <Info> (ParsePfringConfig) -- DNA interface detected, not setting cluster type for PF_RING (iface dnacl:1@1)
>> [14115] 29/10/2014 -- 19:59:25 - (util-runmodes.c:559) <Info> (RunModeSetLiveCaptureWorkersForDevice) -- Going to use 1 thread(s)
>> [14117] 29/10/2014 -- 19:59:25 - (util-affinity.c:320) <Info> (AffinityGetNextCPU) -- Setting affinity on CPU 1
>> [14117] 29/10/2014 -- 19:59:25 - (tm-threads.c:1439) <Info> (TmThreadSetupOptions) -- Setting prio -2 for "RxPFdnacl:1@11" Module to cpu/core 1, thread id 14117
>> [14117] 29/10/2014 -- 19:59:25 - (tm-threads.c:1350) <Error> (TmThreadSetPrio) -- [ERRCODE: SC_ERR_THREAD_NICE_PRIO(47)] - Error setting nice value for thread RxPFdnacl:1@11: Operation not permitted
>> [14117] 29/10/2014 -- 19:59:25 - (tmqh-packetpool.c:291) <Info> (PacketPoolInit) -- preallocated 512 packets. Total memory 1790976
>> [14117] 29/10/2014 -- 19:59:25 - (source-pfring.c:446) <Error> (ReceivePfringThreadInit) -- [ERRCODE: SC_ERR_PF_RING_OPEN(34)] - Failed to open dnacl:1@1: pfring_open error. Check if dnacl:1@1 exists and pf_ring module is loaded.
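With hindsight, a quick check at this point is whether this is simply a permissions problem for the non-root Suricata user rather than anything PF_RING-specific. A sketch, assuming the run-as user is "suricata" and that the master's mappings live under the /mnt/huge directory passed to -u:

# what did pfdnacluster_master create, and who owns it?
ls -l /mnt/huge
grep Huge /proc/meminfo

# can the unprivileged user even see/open it?
sudo -u suricata ls -l /mnt/huge

Root-owned, root-only files there would be consistent with pfring_open failing only for the non-root Suricata workers.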
>
> My Suricata config looks like (I know the cluster settings are ignored):-
>
> pfring:
>   - interface: dnacl:1@0
>     threads: 1
>     cluster-id: 99
>     cluster-type: cluster_flow
>   - interface: dnacl:1@1
>     threads: 1
>     cluster-id: 99
>     cluster-type: cluster_flow
>
>   ...
>
>   - interface: dnacl:1@14
>     threads: 1
>     cluster-id: 99
>     cluster-type: cluster_flow
>
> If I start pfdnacluster_master without "-u /mnt/huge", then Suricata works
> fine (well, it drops some packets; when it's doing that, the CPU cores are
> usually nowhere near maxed out, which is why I want to get this working :-) )
>
> Everything I could think of trying with pfcount or pfdump works fine with
> the huge pages, and as far as I can see pfring_open() is called in much the
> same way as in Suricata.
>
> e.g.:
> pfcount -i dnacl:1@14 -m -l 1522 -g 14
>
> The relevant bit of Suricata (git master of two days ago), src/source-pfring.c:
>
>> opflag = PF_RING_REENTRANT | PF_RING_PROMISC;
>>
>> /* if suri uses VLAN and if we have a recent kernel, we need
>>  * to use parsed_pkt to get VLAN info */
>> if ((! ptv->vlan_disabled) && SCKernelVersionIsAtLeast(3, 0)) {
>>     opflag |= PF_RING_LONG_HEADER;
>> }
>>
>> if (ptv->checksum_mode == CHECKSUM_VALIDATION_RXONLY) {
>>     if (strncmp(ptv->interface, "dna", 3) == 0) {
>>         SCLogWarning(SC_ERR_INVALID_VALUE,
>>                 "Can't use rxonly checksum-checks on DNA interface,"
>>                 " resetting to auto");
>>         ptv->checksum_mode = CHECKSUM_VALIDATION_AUTO;
>>     } else {
>>         opflag |= PF_RING_LONG_HEADER;
>>     }
>> }
>>
>> ptv->pd = pfring_open(ptv->interface, (uint32_t)default_packet_size, opflag);
>>
>> if (ptv->pd == NULL) {
>>     SCLogError(SC_ERR_PF_RING_OPEN,"Failed to open %s: pfring_open error."
>>             " Check if %s exists and pf_ring module is loaded.",
>>             ptv->interface,
>>             ptv->interface);
>>     pfconf->DerefFunc(pfconf);
>>     return TM_ECODE_FAILED;
>> } else {
>
> I have checksums disabled and VLANs enabled at the moment (though I had the
> same problem with VLANs disabled). The default packet size is 1522 (we have
> VLANs).
>
> P.S. I tried running pfdnacluster_master with just "-n 7,1" and Suricata
> using just the cores on that NUMA node, and it seems I do need more cores
> than that!
>
> P.P.S. Another question I forgot to ask - do you recommend disabling
> hyperthreading (I have)?
>
> Best Wishes,
> Chris
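As an aside on the P.S./P.P.S. above: the way I've been checking which cores share the NIC's NUMA node (and whether any hyperthread siblings are still enabled) is just the standard sysfs/numactl/lscpu route, nothing PF_RING-specific. The interface name below is whatever the DNA driver exposes here (dna0), so adjust to taste:

# NUMA node of the NIC's PCI device (-1 means unknown / single node)
cat /sys/class/net/dna0/device/numa_node

# CPUs local to that device
cat /sys/class/net/dna0/device/local_cpulist

# overall node/core layout, and hyperthread siblings if any
numactl --hardware
lscpu --extended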
>
> On 22/10/14 23:48, Alfredo Cardigliano wrote:
>> Hi Chris,
>> please read below
>>
>>> On 22 Oct 2014, at 21:43, Chris Wakelin <c.d.wake...@reading.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> Our Suricata instance running on PF_RING with libzero has been dropping
>>> packets recently (at ~2Gb/s load), but the CPU cores are not maxed out
>>> in general. So I've been looking again at more recent PF_RING options :-)
>>>
>>> The setup is a Dell R620 with 64GB RAM (OK, I should add more), two CPUs
>>> with 8 cores each (hyperthreading turned off), and an ixgbe Intel 10Gb
>>> dual-port card of which I'm using just one port. I'm using PF_RING 6.0.2
>>> at the moment.
>>>
>>> I must admit I'm a bit confused!
>>>
>>> I load the DNA ixgbe with
>>>
>>> insmod ixgbe.ko RSS=1,1 mtu=1522 adapters_to_enable=xx:xx:xx:xx:xx:xx
>>> (the port I'm using)
>>>
>>> then
>>>
>>> pfdnacluster_master -i dna0 -c 1 -n 15,1 -r 15 -d
>>>
>>> Suricata then runs (in "workers" runmode) using dnacl:1@0 ... 1@14 and
>>> we run ARGUS (using libpcap) on dnacl:1@15
>>>
>>> So questions :-
>>>
>>> 1) How does CPU affinity work in libzero (or ZC)? There are no IRQs to fix ...
>>> Does it bind dnacl:1@0 to core 0, dnacl:1@1 to core 1, etc.?
>>
>> IRQs are not used; you can set core affinity for ring memory allocation
>> using numa_cpu_affinity:
>>
>> insmod ixgbe.ko RSS=1,1 mtu=1522 num_rx_slots=32768 adapters_to_enable=xx:xx:xx:xx:xx:xx numa_cpu_affinity=0,0
>>
>>> What should the RX thread (pfdnacluster_master -r) be bound to?
>>
>> You should bind the master to one of the cores of the CPU where the NIC is
>> connected (same core as numa_cpu_affinity).
>>
>>> 2) After reading
>>> http://www.ntop.org/pf_ring/not-all-servers-are-alike-with-pf_ring-zcdna-part-3/
>>> I'm wondering whether I would be better off running just 8 queues (or 7 and
>>> 1 for ARGUS) and forcing them somehow onto the NUMA node the ixgbe card is
>>> attached to?
>>
>> This is recommended if 8 cores are enough for packet processing; otherwise
>> it might be worth crossing the QPI bus. You should run some tests.
>>
>>> (If yes, how do I bind libzero to cores 0,2,4,6,8,10,12,14 or whatever
>>> numactl says is on the same node as the NIC?)
>>
>> -r for the master; check Suricata and ARGUS for affinity options.
>>
>>> 3) Hugepages work, in that I can allocate 1024 2048K ones as suggested in
>>> README.hugepages and then run pfdnacluster_master with the "-u /mnt/huge"
>>> option, and then pfcount, tcpdump etc. work. However, Suricata always
>>> crashes out.
>>
>> I will run some tests asap.
>>
>>> Similarly, if I start pfdnacluster_master without huge pages, then
>>> Suricata, then stop and restart pfdnacluster_master with huge pages while
>>> Suricata is still running, the latter fails (but is fine restarting
>>> without huge pages).
>>
>> Expected: you should not change the configuration while running.
>>
>>> If I start the ZC version of ixgbe (which needs huge pages of course) and use
>>>
>>> zbalance_ipc -i zc:eth4 -c 1 -n 15,1 -m 1
>>>
>>> (with Suricata talking to zc:1@0 .. zc:1@14), then Suricata also fails in
>>> a similar way (errors like "[ERRCODE: SC_ERR_PF_RING_OPEN(34)] - Failed
>>> to open zc:1@0: pfring_open error. Check if zc:1@0 exists"), though
>>> pfcount and tcpdump are fine.
>>
>> I will also test this configuration.
>>
>>> Is it worth going for 1GB pages (which are available), and how many would
>>> I need?
>>
>> 1GB pages should be supported, but this hasn't been tested.
>>
>>> 4) Is it worth increasing the number of slots in each queue
>>> (pfdnacluster_master -q) or num_rx_slots (when loading ixgbe)?
>>
>> This can help with handling spikes.
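For what it's worth, on question 4 the knob I plan to try first is the per-queue slot count on the master (-q), since num_rx_slots is already at 32768 in my insmod line above. The 16384 below is only an example value I intend to experiment with, not a recommendation (check the tool's usage output for the default and limits):

# same master invocation as above, with a larger per-queue slot count
pfdnacluster_master -i dna0 -c 1 -n 15,1 -r 15 -m 4 -u /mnt/huge -q 16384 -d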
>>
>>> (We've replaced our border switches with ones our Network Manager is
>>> confident won't crash if somehow PF_RING *sends* packets to the mirrored
>>> port - that crashed one of the old switches - so I'm allowed to reload
>>> PF_RING + NIC drivers without going through Change Management and
>>> "at-risk" periods now :-) )
>>
>> :-)
>>
>>> Best Wishes,
>>> Chris
>>
>> BR
>> Alfredo
>

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wake...@reading.ac.uk
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK           Fax: +44 (0)118 975 3094
_______________________________________________
Ntop-misc mailing list
Ntop-misc@listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc