Alfredo, I'm still not sure how using this method actually achieves zero-copy. Suppose I were to use pfring_recv directly in each thread, without TX, as your multithread application does. In that case a double pointer is passed to the function, and on return it points to the packet being received. Since this is per-thread with a single RX queue, there must already have been a single copy from the NIC into the pf_ring memory, correct? Now, if I use the pfring_alloc_pkt_buff/pfring_recv_pkt_buff combo, it returns a single pointer to a buffer in kernel space. My application either has to process the data in place or make a copy. My slave/parsing work is done in the same thread, so having more than one buffer to swap doesn't buy me anything, correct? If that's the case, then the best I can do is simply declare a double pointer in the processing code of each thread, pass it to pfring_recv so the packet is not copied, and repeat after each packet is processed. If the processing is not going fast enough, I presume packets are queued in the buffers allocated in dna_cluster_low_level_settings() until it can catch up.
Please let me know if I'm missing something. Thanks.

On Sat, May 18, 2013 at 12:04 PM, Cliff Burdick <[email protected]> wrote:
> Thanks Alfredo, answers are below. I had to truncate a bit because it limited
> the message size.
>
> >Hi Cliff
> >please see inline
> >
> ...
> >You can avoid this by allocating additional buffers per-thread and using a
> >buffer swap when receiving a packet. This way you can keep aside up to K
> >packets, where K is the number of additional buffers allocated in the
> >per-thread pool.
> >In order to allocate these additional buffers please have a look at
> >dna_cluster_low_level_settings().
> >To get a buffer from the per-thread pool:
> >pkt_handle = pfring_alloc_pkt_buff(ring[thread_id])
> >To swap a received packet with another buffer:
> >ret = pfring_recv_pkt_buff(ring[thread_id], pkt_handle, &hdr, wait_for_packet)
> >
> This is very useful. I didn't know it was per-thread, so this should make a
> fairly significant impact.
>
> >> For some reason, I am dropping what appears to be an increasing number of
> >> packets, depending on which thread it is. Usually the lower-numbered
> >> threads drop about 10%, while the higher-number ones drop around 90%.
> >
> >Please pay attention also to logical/physical cores when playing with core
> >affinity. Can I see the output of
> >cat /proc/cpuinfo | grep "processor\|model name\|physical id"
> >and the affinity you are using?
>
> Cores 0-5 are physical, while 6-11 are hyperthreaded:
>
> processor : 0
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 0
> processor : 1
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 1
> processor : 2
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 2
> processor : 3
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 8
> processor : 4
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 9
> processor : 5
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 10
> processor : 6
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 0
> processor : 7
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 1
> processor : 8
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 2
> processor : 9
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 8
> processor : 10
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 9
> processor : 11
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 10
>
> My threads are as follows:
>
> Core 0: DNA Cluster Master
> Cores 1-9: Receiver Threads
>
> I can move a thread off core 6, which has a shared cache with core 0, if you
> think that would help. Here is the output of pfdnacluster_multithread doing
> something similar:
>
> root@bond0:/root> /tmp/pfdnacluster_multithread -i dna0 -c 1 -n 8
> Capturing from dna0
> Using PF_RING v.5.5.2
> Hashing packets per-IP Address
> The DNA cluster [id: 1][num consumer threads: 8] is running...
> Opening cluster dnacluster:1@0
> Consumer thread #0 is running...
> Opening cluster dnacluster:1@1
> Set thread 0 on core 2/12
> Consumer thread #1 is running...
> Opening cluster dnacluster:1@2
> Set thread 1 on core 3/12
> Consumer thread #2 is running...
> Opening cluster dnacluster:1@3
> Set thread 2 on core 4/12
> Consumer thread #3 is running...
> Opening cluster dnacluster:1@4
> Set thread 3 on core 5/12
> Consumer thread #4 is running...
> Opening cluster dnacluster:1@5
> Set thread 4 on core 6/12
> Consumer thread #5 is running...
> Opening cluster dnacluster:1@6
> Set thread 5 on core 7/12
> Consumer thread #6 is running...
> Opening cluster dnacluster:1@7
> Set thread 6 on core 8/12
> Consumer thread #7 is running...
> Set thread 7 on core 9/12
> =========================
> Thread 0
> Absolute Stats: [52876 pkts rcvd][34172500 bytes rcvd]
> [52876 total pkts][0 pkts dropped (0.0 %)]
> [52'874.94 pkt/sec][273.37 Mbit/sec]
> =========================
> Thread 1
> Absolute Stats: [104995 pkts rcvd][36437358 bytes rcvd]
> [104995 total pkts][0 pkts dropped (0.0 %)]
> [104'992.90 pkt/sec][291.49 Mbit/sec]
> =========================
> Thread 2
> Absolute Stats: [50422 pkts rcvd][27233447 bytes rcvd]
> [50422 total pkts][0 pkts dropped (0.0 %)]
> [50'420.99 pkt/sec][217.86 Mbit/sec]
> =========================
> Thread 3
> Absolute Stats: [55373 pkts rcvd][23669520 bytes rcvd]
> [55373 total pkts][0 pkts dropped (0.0 %)]
> [55'371.89 pkt/sec][189.35 Mbit/sec]
> =========================
> Thread 4
> Absolute Stats: [54588 pkts rcvd][32153687 bytes rcvd]
> [54588 total pkts][0 pkts dropped (0.0 %)]
> [54'586.90 pkt/sec][257.22 Mbit/sec]
> =========================
> Thread 5
> Absolute Stats: [2503 pkts rcvd][13166211 bytes rcvd]
> [2503 total pkts][0 pkts dropped (0.0 %)]
> [2'502.94 pkt/sec][105.33 Mbit/sec]
> =========================
> Thread 6
> Absolute Stats: [54631 pkts rcvd][34089918 bytes rcvd]
> [54631 total pkts][0 pkts dropped (0.0 %)]
> [54'629.90 pkt/sec][272.71 Mbit/sec]
> =========================
> Thread 7
> Absolute Stats: [54764 pkts rcvd][29179172 bytes rcvd]
> [54764 total pkts][0 pkts dropped (0.0 %)]
> [54'762.90 pkt/sec][233.43 Mbit/sec]
> =========================
>
> Thanks.
>
> On Fri, May 17, 2013 at 8:30 PM, Cliff Burdick <[email protected]> wrote:
>> I have an application configured with the DNA cluster running on core 0,
>> with 8 threads running on cores 1-8 on a Xeon processor. I'm using a custom
>> hash function which just picks off the last octet of the source IP, and
>> sends it to threads 1-8. I'm loading the DNA driver using the following:
>>
>> insmod ixgbe.ko MQ=0,0 mtu=9000
>>
>> When I run pfdnacluster_multithread I can start 8 threads without any
>> dropping of packets. My understanding is that to use zero-copy mode, I can
>> only have a single thread operating on the packets at a time since the
>> buffer is automatically freed when another pfring_recv call is made.
>> Because of this, each of my slave threads make a copy of the data before
>> immediately returning back to call pfring_recv again. For some reason, I am
>> dropping what appears to be an increasing number of packets, depending on
>> which thread it is. Usually the lower-numbered threads drop about 10%,
>> while the higher-number ones drop around 90%. I'm receiving about 230Kpps
>> (1.3Gbps) evenly distributed between the threads, and my understanding was
>> that DNA mode would handle this. My code for the receiver is identical to
>> the multithread example (8192 buffers for rx/tx, receive only, wait_mode=0).
>>
>> My slave thread makes the call using the following:
>> pfring_recv_parsed(m_ring, &packet, 0, &header, 1, 0, 1, 0);
>>
>> Also, what is the preferred way of dropping packets inside of the hash
>> function when I don't want it routed to any of my threads, return
>> DNA_CLUSTER_FAIL, or send it to a queue that is not being processed?
>>
>> Any help is appreciated. Thanks.
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
