Alfredo, I'm still not sure how using this method actually achieves zero-copy. Suppose I were to use pfring_recv directly in each thread, without TX, as your multithread application does. In that case a double pointer is passed to the function, and on return it points to the packet being received. Since this is per-thread with a single RX queue, there must already have been a single copy from the NIC into the pf_ring memory, correct? Now, if I use the pfring_alloc_pkt_buff/pfring_recv_pkt_buff combo, it returns a single pointer to a buffer in kernel space. My application either has to process the data in place or make a copy. My slave/parsing work is done in the same thread, so having more than one buffer to swap doesn't buy me anything, correct? If that's the case, then the best I can do is simply declare a double pointer in the processing code of each thread, pass it to pfring_recv so the packet is not copied, and repeat after each packet is processed. If the processing is not going fast enough, I presume packets are queued in the buffers allocated in dna_cluster_low_level_settings() until it can catch up.
Please let me know if I'm missing something. Thanks.

On Sat, May 18, 2013 at 12:04 PM, Cliff Burdick <[email protected]> wrote:
> Thanks Alfredo, answers are below. I had to truncate a bit because it limited
> the message size.
>
> >Hi Cliff
> >please see inline
> >
> ...
> >You can avoid this by allocating additional buffers per-thread and using a
> >buffer swap when receiving a packet. This way you can keep aside up to K
> >packets, where K is the number of additional buffers allocated in the
> >per-thread pool.
> >In order to allocate these additional buffers please have a look at
> >dna_cluster_low_level_settings().
> >To get a buffer from the per-thread pool:
> >pkt_handle = pfring_alloc_pkt_buff(ring[thread_id])
> >To swap a received packet with another buffer:
> >ret = pfring_recv_pkt_buff(ring[thread_id], pkt_handle, &hdr, wait_for_packet)
> >
> This is very useful. I didn't know it was per-thread, so this should make a
> fairly significant impact.
>
> >> For some reason, I am dropping what appears to be an increasing number of
> >> packets, depending on which thread it is. Usually the lower-numbered
> >> threads drop about 10%, while the higher-number ones drop around 90%.
> >
> >Please pay attention also to logical/physical cores when playing with core
> >affinity. Can I see the output of
> >cat /proc/cpuinfo | grep "processor\|model name\|physical id"
> >and the affinity you are using?
>
> Cores 0-5 are physical, while 6-11 are hyperthreaded:
>
> processor : 0
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 0
> processor : 1
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 1
> processor : 2
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 2
> processor : 3
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 8
> processor : 4
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 9
> processor : 5
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 10
> processor : 6
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 0
> processor : 7
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 1
> processor : 8
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 2
> processor : 9
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 8
> processor : 10
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 9
> processor : 11
> model name : Intel(R) Xeon(R) CPU L5638 @ 2.00GHz
> physical id : 0
> core id : 10
>
> My threads are as follows:
>
> Core 0: DNA Cluster Master
> Cores 1-9: Receiver Threads
>
> I can move a thread off core 6, which has a shared cache with core 0, if you
> think that would help. Here is the output of pfdnacluster_multithread doing
> something similar:
>
> root@bond0:/root> /tmp/pfdnacluster_multithread -i dna0 -c 1 -n 8
> Capturing from dna0
> Using PF_RING v.5.5.2
> Hashing packets per-IP Address
> The DNA cluster [id: 1][num consumer threads: 8] is running...
> Opening cluster dnacluster:1@0
> Consumer thread #0 is running...
> Opening cluster dnacluster:1@1
> Set thread 0 on core 2/12
> Consumer thread #1 is running...
> Opening cluster dnacluster:1@2
> Set thread 1 on core 3/12
> Consumer thread #2 is running...
> Opening cluster dnacluster:1@3
> Set thread 2 on core 4/12
> Consumer thread #3 is running...
> Opening cluster dnacluster:1@4
> Set thread 3 on core 5/12
> Consumer thread #4 is running...
> Opening cluster dnacluster:1@5
> Set thread 4 on core 6/12
> Consumer thread #5 is running...
> Opening cluster dnacluster:1@6
> Set thread 5 on core 7/12
> Consumer thread #6 is running...
> Opening cluster dnacluster:1@7
> Set thread 6 on core 8/12
> Consumer thread #7 is running...
> Set thread 7 on core 9/12
> =========================
> Thread 0
> Absolute Stats: [52876 pkts rcvd][34172500 bytes rcvd]
> [52876 total pkts][0 pkts dropped (0.0 %)]
> [52'874.94 pkt/sec][273.37 Mbit/sec]
> =========================
> Thread 1
> Absolute Stats: [104995 pkts rcvd][36437358 bytes rcvd]
> [104995 total pkts][0 pkts dropped (0.0 %)]
> [104'992.90 pkt/sec][291.49 Mbit/sec]
> =========================
> Thread 2
> Absolute Stats: [50422 pkts rcvd][27233447 bytes rcvd]
> [50422 total pkts][0 pkts dropped (0.0 %)]
> [50'420.99 pkt/sec][217.86 Mbit/sec]
> =========================
> Thread 3
> Absolute Stats: [55373 pkts rcvd][23669520 bytes rcvd]
> [55373 total pkts][0 pkts dropped (0.0 %)]
> [55'371.89 pkt/sec][189.35 Mbit/sec]
> =========================
> Thread 4
> Absolute Stats: [54588 pkts rcvd][32153687 bytes rcvd]
> [54588 total pkts][0 pkts dropped (0.0 %)]
> [54'586.90 pkt/sec][257.22 Mbit/sec]
> =========================
> Thread 5
> Absolute Stats: [2503 pkts rcvd][13166211 bytes rcvd]
> [2503 total pkts][0 pkts dropped (0.0 %)]
> [2'502.94 pkt/sec][105.33 Mbit/sec]
> =========================
> Thread 6
> Absolute Stats: [54631 pkts rcvd][34089918 bytes rcvd]
> [54631 total pkts][0 pkts dropped (0.0 %)]
> [54'629.90 pkt/sec][272.71 Mbit/sec]
> =========================
> Thread 7
> Absolute Stats: [54764 pkts rcvd][29179172 bytes rcvd]
> [54764 total pkts][0 pkts dropped (0.0 %)]
> [54'762.90 pkt/sec][233.43 Mbit/sec]
> =========================
>
> Thanks.
>
> On Fri, May 17, 2013 at 8:30 PM, Cliff Burdick <[email protected]> wrote:
>> I have an application configured with the DNA cluster running on core 0,
>> with 8 threads running on cores 1-8 on a Xeon processor. I'm using a custom
>> hash function which just picks off the last octet of the source IP, and
>> sends it to threads 1-8. I'm loading the DNA driver using the following:
>>
>> insmod ixgbe.ko MQ=0,0 mtu=9000
>>
>> When I run pfdnacluster_multithread I can start 8 threads without any
>> dropping of packets. My understanding is that to use zero-copy mode, I can
>> only have a single thread operating on the packets at a time since the
>> buffer is automatically freed when another pfring_recv call is made.
>> Because of this, each of my slave threads make a copy of the data before
>> immediately returning back to call pfring_recv again. For some reason, I am
>> dropping what appears to be an increasing number of packets, depending on
>> which thread it is. Usually the lower-numbered threads drop about 10%,
>> while the higher-number ones drop around 90%. I'm receiving about 230Kpps
>> (1.3Gbps) evenly distributed between the threads, and my understanding was
>> that DNA mode would handle this. My code for the receiver is identical to
>> the multithread example (8192 buffers for rx/tx, receive only, wait_mode=0).
>>
>> My slave thread makes the call using the following:
>> pfring_recv_parsed(m_ring, &packet, 0, &header, 1, 0, 1, 0);
>>
>> Also, what is the preferred way of dropping packets inside of the hash
>> function when I don't want it routed to any of my threads, return
>> DNA_CLUSTER_FAIL, or send it to a queue that is not being processed?
>>
>> Any help is appreciated. Thanks.
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
