Logs attached.

Jim

On 10/14/2016 03:44 PM, Alfredo Cardigliano wrote:
> Hmm, hard to say. Could you also provide the dmesg output?
> 
> Alfredo
> 
>> On 14 Oct 2016, at 18:07, Jim Hranicky <j...@ufl.edu> wrote:
>>
>> And one more, sorry. I tried to stop zbalance_ipc to move to
>> 32 queues and am getting this error:
>>
>>  Message from syslogd@host at Oct 14 12:05:23 ...
>>   kernel:BUG: soft lockup - CPU#17 stuck for 22s! [migration/17:237]
>>
>>  Message from syslogd@host at Oct 14 12:05:23 ...
>>   kernel:BUG: soft lockup - CPU#34 stuck for 22s! [zbalance_ipc:6496]
>>
>>  Message from syslogd@host at Oct 14 12:05:26 ...
>>   kernel:BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:157]
>>
>>  Message from syslogd@host at Oct 14 12:05:27 ...
>>   kernel:BUG: soft lockup - CPU#13 stuck for 23s! [migration/13:217]
>>
>> kill -9 has no effect. Is this a result of using too many queues?
>>
>> Jim
>>
>> On 10/14/2016 03:53 AM, Alfredo Cardigliano wrote:
>>> Hi Jim
>>> please note that when distributing traffic to multiple applications
>>> (using a comma-separated list in -n), the fan-out API is used, which
>>> supports up to 32 egress queues in total. In your case you are using
>>> 73 queues, so I guess only the first 32 instances are receiving
>>> traffic (and maybe duplicated traffic due to a wrong egress mask).
>>> I will add a check for this in zbalance_ipc to avoid this kind of
>>> misconfiguration.
>>>
>>> Alfredo
>>>
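The 32-queue fan-out limit mentioned in the reply above can be checked before starting zbalance_ipc. Here is a minimal shell sketch, assuming the -n argument is a comma-separated list of per-application queue counts (as in "-n 72,1"). The names total_queues, check_queues and MAX_FANOUT_QUEUES are made up for illustration; this is not the check that went into zbalance_ipc itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check: sum the queue counts passed to -n and
# compare the total with the 32-queue fan-out limit quoted in the thread.
MAX_FANOUT_QUEUES=32

total_queues() {
    # $1 is the -n argument, e.g. "72,1"
    total=0
    old_ifs=$IFS
    IFS=,
    for n in $1; do          # unquoted on purpose: split on commas
        total=$((total + n))
    done
    IFS=$old_ifs
    echo "$total"
}

check_queues() {
    total=$(total_queues "$1")
    if [ "$total" -gt "$MAX_FANOUT_QUEUES" ]; then
        echo "-n $1: $total queues exceeds the fan-out limit of $MAX_FANOUT_QUEUES"
        return 1
    fi
    echo "-n $1: $total queues, within the limit"
}
```

Calling check_queues with "72,1" reports 73 queues and returns a non-zero status, which matches the count given in the reply.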
>>>> On 13 Oct 2016, at 22:35, Jim Hranicky <j...@ufl.edu> wrote:
>>>>
>>>> I'm testing out a new server (36 cores, 72 with HT) using
>>>> zbalance_ipc, and it seems occasionally some packets are
>>>> getting sent to multiple processes. 
>>>>
>>>> I'm currently running zbalance_ipc like so: 
>>>>
>>>> /usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 72,1 -c 99 -g 0 -S 1
>>>>
>>>> with 72 snorts like so: 
>>>>
>>>> /usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
>>>> --daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
>>>> -c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 -R $((i + 1))
>>>>
>>>> I've got a custom HTTP rule to catch GETs with a particular 
>>>> user-agent. I run 100 GETs, and each GET request has the run
>>>> number and timestamp in the url. (GET /1/<ts>, GET /2/<ts>, etc) 
>>>> and this is what I end up getting when I check the GETs : 
>>>>
>>>>     1 GET /11
>>>>     1 GET /2
>>>>     1 GET /30
>>>>     1 GET /34
>>>>     1 GET /37
>>>>     1 GET /5
>>>>     1 GET /59
>>>>     1 GET /62
>>>>     1 GET /70
>>>>     1 GET /8
>>>>     1 GET /83
>>>>     1 GET /84
>>>>     1 GET /9
>>>>     1 GET /90
>>>>     1 GET /94
>>>>     1 GET /95
>>>>    16 GET /97
>>>>    20 GET /12
>>>>    20 GET /38
>>>>
>>>> Obviously I'm still running into packet loss, but several of the
>>>> GETs are getting sent to multiple processes: 
>>>>
>>>>   ens5f0.33 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.53 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.42 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.44 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.46 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.35 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.67 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.34 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.36 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.62 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.70 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.65 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.57 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.63 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.68 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.38 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.49 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.61 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.32 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>   ens5f0.72 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>
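A listing like the one above can be reduced to a per-request count of how many distinct instances saw it. This is a rough sketch, assuming input lines of the form "<instance> GET <url> HTTP/1.1" as shown in the listing; the function name find_duplicates is hypothetical.

```shell
#!/bin/sh
# Read "<instance> GET <url> HTTP/1.1" lines on stdin and print
# "<count> <url>" for every URL seen by more than one distinct instance.
find_duplicates() {
    awk '$2 == "GET" {
             key = $1 SUBSEP $3
             # count each (instance, url) pair only once
             if (!(key in seen)) { seen[key] = 1; count[$3]++ }
         }
         END {
             for (url in count)
                 if (count[url] > 1) print count[url], url
         }'
}
```

Requests delivered to a single instance produce no output, so an empty result means no duplication was observed in the input.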
>>>> Is this an issue with the zbalance_ipc hash? I tried using
>>>>
>>>> -m 1
>>>>
>>>> but it seemed like I ended up dropping even more packets. 
>>>>
>>>> Any advice/pointers appreciated. 
>>>>
>>>> --
>>>> Jim Hranicky
>>>> Data Security Specialist
>>>> UF Information Technology
>>>> 105 NW 16TH ST Room #104 GAINESVILLE FL 32603-1826
>>>> 352-273-1341
>>>> _______________________________________________
>>>> Ntop-misc mailing list
>>>> Ntop-misc@listgateway.unipi.it
>>>> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
Oct 14 12:01:23 ewansens1 kernel: BUG: soft lockup - CPU#34 stuck for 22s! 
[zbalance_ipc:6496]
Oct 14 12:01:23 ewansens1 kernel: Modules linked in: ixgbe(OE) pf_ring(OE) dca 
nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill ipt_MASQUERADE 
nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack 
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security 
ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security 
iptable_raw iptable_filter vfat fat ext4 mbcache jbd2 intel_powerclamp coretemp 
intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw 
gf128mul glue_helper ablk_helper sb_edac cryptd iTCO_wdt iTCO_vendor_support 
ipmi_devintf cdc_ether pcspkr mxm_wmi i2c_i801 usbnet lpc_ich mii ipmi_ssif
Oct 14 12:01:23 ewansens1 kernel: mfd_core sg edac_core shpchp mei_me ipmi_si 
acpi_pad wmi mei ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c 
sd_mod crc_t10dif crct10dif_generic sr_mod cdrom mgag200 syscopyarea 
sysfillrect sysimgblt i2c_algo_bit drm_kms_helper crct10dif_pclmul 
crct10dif_common crc32c_intel ttm ahci drm libahci tg3 libata i2c_core ptp 
pps_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod vxlan 
ip6_udp_tunnel udp_tunnel [last unloaded: pf_ring]
Oct 14 12:01:23 ewansens1 kernel: CPU: 34 PID: 6496 Comm: zbalance_ipc Tainted: 
G           OE  ------------   3.10.0-327.36.1.el7.x86_64 #1
Oct 14 12:01:23 ewansens1 kernel: Hardware name: LENOVO System x3650 M5: 
-[5462AC1]-/01GR451, BIOS -[TCE122WUS-2.01]- 04/27/2016
Oct 14 12:01:23 ewansens1 kernel: task: ffff881018b95c00 ti: ffff881010470000 
task.ti: ffff881010470000
Oct 14 12:01:23 ewansens1 kernel: RIP: 0010:[<ffffffff81301a2c>]  
[<ffffffff81301a2c>] __write_lock_failed+0xc/0x20
Oct 14 12:01:23 ewansens1 kernel: RSP: 0018:ffff881010473da0  EFLAGS: 00000216
Oct 14 12:01:23 ewansens1 kernel: RAX: 0000000000000001 RBX: 00007f4f081c6000 
RCX: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: RDX: 0000000000000000 RSI: 00007ffe7625f308 
RDI: ffff88202476c8c4
Oct 14 12:01:23 ewansens1 kernel: RBP: ffff881010473da0 R08: 0000000000000018 
R09: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: R10: 00000000000008b4 R11: 0000000000000000 
R12: ffffea0000000000
Oct 14 12:01:23 ewansens1 kernel: R13: ffff881018b95c00 R14: ffff881010473d70 
R15: 00007f4f081c5fff
Oct 14 12:01:23 ewansens1 kernel: FS:  00007f4f097d4740(0000) 
GS:ffff88203f000000(0000) knlGS:0000000000000000
Oct 14 12:01:23 ewansens1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Oct 14 12:01:23 ewansens1 kernel: CR2: 00000000082d8c40 CR3: 000000201457b000 
CR4: 00000000001407e0
Oct 14 12:01:23 ewansens1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 
DR2: 0000000000000000
Oct 14 12:01:23 ewansens1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 
DR7: 0000000000000400
Oct 14 12:01:23 ewansens1 kernel: Stack:
Oct 14 12:01:23 ewansens1 kernel: ffff881010473db0 ffffffff8163d9f7 
ffff881010473dd8 ffffffffa078c21e
Oct 14 12:01:23 ewansens1 kernel: 00007ffe7625f2f0 ffff882024360f00 
ffff881010473e60 ffff881010473f30
Oct 14 12:01:23 ewansens1 kernel: ffffffffa079afe1 ffffffff81632be5 
ffff881010473e10 0000000181193525
Oct 14 12:01:23 ewansens1 kernel: Call Trace:
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff8163d9f7>] _raw_write_lock+0x17/0x20
Oct 14 12:01:23 ewansens1 kernel: [<ffffffffa078c21e>] 
pfring_release_zc_dev+0x3e/0x1d0 [pf_ring]
Oct 14 12:01:23 ewansens1 kernel: [<ffffffffa079afe1>] 
ring_setsockopt+0x1861/0x2870 [pf_ring]
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81632be5>] ? __slab_free+0x10e/0x277
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff8119b7a2>] ? unmap_region+0xe2/0x130
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81288a75>] ? sock_has_perm+0x75/0x90
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff811c0d02>] ? 
kmem_cache_free+0xd2/0x1e0
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81289d70>] ? 
selinux_socket_setsockopt+0x40/0x50
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81512370>] SyS_setsockopt+0x80/0xf0
Oct 14 12:01:23 ewansens1 kernel: [<ffffffff81646a09>] 
system_call_fastpath+0x16/0x1b
Oct 14 12:01:23 ewansens1 kernel: Code: 89 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 
66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 
90 83 3f 01 <75> f9 f0 ff 0f 75 f1 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 
