This seems to be driver-related. You were using ixgbe in this test, right?
Did you do something like putting the interface down or unloading the
driver, perhaps? I'm trying to figure out what caused this...
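
If it helps narrow it down, something like this on the sensor would show
whether ixgbe was reloaded or reset around that time (a sketch, with the
interface name taken from your earlier mails):

# confirm the interface is bound to ixgbe (and the driver version)
ethtool -i ens5f0

# look for ixgbe unload/reload or reset messages in the kernel log
dmesg | grep -i ixgbe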

Thank you
Alfredo

> On 14 Oct 2016, at 22:25, Jim Hranicky <j...@ufl.edu> wrote:
> 
> Logs attached.
> 
> Jim
> 
> On 10/14/2016 03:44 PM, Alfredo Cardigliano wrote:
>> Uhm, hard to say, could you also provide dmesg?
>> 
>> Alfredo
>> 
>>> On 14 Oct 2016, at 18:07, Jim Hranicky <j...@ufl.edu> wrote:
>>> 
>>> And one more, sorry. I tried to stop zbalance_ipc to move to
>>> 32 queues and am getting this error:
>>> 
>>> Message from syslogd@host at Oct 14 12:05:23 ...
>>>  kernel:BUG: soft lockup - CPU#17 stuck for 22s! [migration/17:237]
>>> 
>>> Message from syslogd@host at Oct 14 12:05:23 ...
>>>  kernel:BUG: soft lockup - CPU#34 stuck for 22s! [zbalance_ipc:6496]
>>> 
>>> Message from syslogd@host at Oct 14 12:05:26 ...
>>>  kernel:BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:157]
>>> 
>>> Message from syslogd@host at Oct 14 12:05:27 ...
>>>  kernel:BUG: soft lockup - CPU#13 stuck for 23s! [migration/13:217]
>>> 
>>> kill -9 has no effect. Is this a result of using too many queues?
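>>> 
>>> (For what it's worth, zbalance_ipc looks stuck in the kernel; I'm
>>> checking its state with something like this, with the PID taken from
>>> the lockup message above:
>>> 
>>> # process state and the kernel function the task is blocked in
>>> ps -o pid,stat,wchan -p 6496
>>> 
>>> # kernel stack of the stuck task (needs root)
>>> cat /proc/6496/stack
>>> )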
>>> 
>>> Jim
>>> 
>>> On 10/14/2016 03:53 AM, Alfredo Cardigliano wrote:
>>>> Hi Jim
>>>> please note that when distributing traffic to multiple applications
>>>> (using a comma-separated list in -n), the fan-out API is used, which
>>>> supports up to 32 egress queues in total. In your case you are using
>>>> 73 queues, so I guess only the first 32 instances are receiving
>>>> traffic (and possibly duplicated traffic, due to a wrong egress
>>>> mask). I will add a check for this in zbalance_ipc to avoid this
>>>> kind of misconfiguration.
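>>>> 
>>>> For example, staying within the limit with your setup would mean
>>>> something like this (a sketch based on your command, with the first
>>>> application reduced to 31 queues so that 31 + 1 = 32 in total):
>>>> 
>>>> /usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 31,1 -c 99 -g 0 -S 1
>>>> 
>>>> with the snort instances reduced to 31 accordingly (zc:99@0 .. zc:99@30).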
>>>> 
>>>> Alfredo
>>>> 
>>>>> On 13 Oct 2016, at 22:35, Jim Hranicky <j...@ufl.edu> wrote:
>>>>> 
>>>>> I'm testing out a new server (36 cores, 72 with HT) using
>>>>> zbalance_ipc, and it seems occasionally some packets are
>>>>> getting sent to multiple processes. 
>>>>> 
>>>>> I'm currently running zbalance_ipc like so: 
>>>>> 
>>>>> /usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 72,1 -c 99 -g 0 -S 1
>>>>> 
>>>>> with 72 snorts like so: 
>>>>> 
>>>>> /usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
>>>>> --daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
>>>>> -c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 -R $((i + 1))
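>>>>> 
>>>>> (For completeness, the $i above comes from a launch loop along these
>>>>> lines; a rough sketch, assuming bash and queue IDs 0-71:
>>>>> 
>>>>> # start one snort per egress queue, pinned to the matching CPU
>>>>> for i in $(seq 0 71); do
>>>>>   /usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
>>>>>     --daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
>>>>>     -c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 \
>>>>>     -R $((i + 1))
>>>>> done
>>>>> )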
>>>>> 
>>>>> I've got a custom HTTP rule to catch GETs with a particular
>>>>> user-agent. I run 100 GETs, and each GET request has the run
>>>>> number and a timestamp in the URL (GET /1/<ts>, GET /2/<ts>, etc.;
>>>>> the test loop is sketched after the results below). This is what
>>>>> I end up getting when I check the GETs:
>>>>> 
>>>>>    1 GET /11
>>>>>    1 GET /2
>>>>>    1 GET /30
>>>>>    1 GET /34
>>>>>    1 GET /37
>>>>>    1 GET /5
>>>>>    1 GET /59
>>>>>    1 GET /62
>>>>>    1 GET /70
>>>>>    1 GET /8
>>>>>    1 GET /83
>>>>>    1 GET /84
>>>>>    1 GET /9
>>>>>    1 GET /90
>>>>>    1 GET /94
>>>>>    1 GET /95
>>>>>   16 GET /97
>>>>>   20 GET /12
>>>>>   20 GET /38
>>>>> 
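>>>>> (The loop generating and tallying the GETs, roughly; a sketch,
>>>>> where the target host and user-agent are placeholders and the tally
>>>>> assumes text alerts under /var/log/snort69:
>>>>> 
>>>>> # fire 100 GETs, each tagged with its run number and a timestamp
>>>>> for i in $(seq 1 100); do
>>>>>   curl -s -A 'test-agent' \
>>>>>     "http://testhost/$i/$(date +%Y-%m-%d.%H:%M:%S)" > /dev/null
>>>>> done
>>>>> 
>>>>> # tally which run numbers showed up in the alerts
>>>>> grep -oh 'GET /[0-9]*' /var/log/snort69/* | sort | uniq -c | sort -n
>>>>> )
>>>>> 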
>>>>> Obviously I'm still running into packet loss, but several of the
>>>>> GETs are getting sent to multiple processes: 
>>>>> 
>>>>>  ens5f0.33 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.53 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.42 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.44 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.46 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.35 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.67 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.34 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.36 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.62 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.70 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.65 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.57 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.63 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.68 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.38 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.49 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.61 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.32 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>>  ens5f0.72 GET /12/2016-10-13.14:04:49 HTTP/1.1
>>>>> 
>>>>> Is this an issue with the zbalance_ipc hash? I tried using
>>>>> 
>>>>> -m 1
>>>>> 
>>>>> but it seemed like I ended up dropping even more packets. 
>>>>> 
>>>>> Any advice/pointers appreciated. 
>>>>> 
>>>>> --
>>>>> Jim Hranicky
>>>>> Data Security Specialist
>>>>> UF Information Technology
>>>>> 105 NW 16TH ST Room #104 GAINESVILLE FL 32603-1826
>>>>> 352-273-1341
> <zbal-logs.txt>

_______________________________________________
Ntop-misc mailing list
Ntop-misc@listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
