Hi Cliff
please see inline

On May 20, 2013, at 5:45 PM, Cliff Burdick <[email protected]> wrote:

> Alfredo, I'm still not sure how using this method is actually doing 
> zero-copy. Suppose I were to use pfring_recv directly in each thread without 
> TX as your multithread application does. In this case a double pointer is 
> passed to the function where it will now point to the packet being received. 
> Since this is per-thread with a single RX queue, there must have already been 
> a single copy from the NIC to the pf_ring memory, correct?

No, this is zero-copy: the card wrote the packet directly into this buffer, and 
the receiving thread is the only owner of the buffer.
But if you use pfring_recv(), you can't access this buffer after the next call 
to pfring_recv() on the same per-thread ring.
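A minimal sketch of that ownership rule (assuming a per-thread pfring* has already been opened; process() is a placeholder consumer, and error handling is omitted):

```c
/* Sketch only: requires the PF_RING library and a DNA-capable NIC. */
#include <pfring.h>

static void process(const u_char *pkt, u_int len) {
  /* placeholder: parse/inspect the packet here */
  (void)pkt; (void)len;
}

void consume_loop(pfring *ring) {
  u_char *pkt;               /* will point into the card's DMA buffer */
  struct pfring_pkthdr hdr;

  while (1) {
    /* buffer_len = 0 -> zero-copy: pkt points at the ring buffer */
    if (pfring_recv(ring, &pkt, 0, &hdr, 1 /* wait */) > 0) {
      process(pkt, hdr.len); /* pkt is valid ONLY until the next
                                pfring_recv() on this same ring */
    }
  }
}
```

The key point is that no memcpy happens on receive; the cost you pay is the lifetime restriction on `pkt`.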

> Now if I use the pfring_alloc_pkt_buff/pfring_recv_pkt_buff combo, it returns 
> a single pointer to the buffer in kernel-space. My application will either 
> have to process the data here, or make a copy. My slave/parsing is done in 
> the same thread, so it doesn't buy me anything to have more than one buffer 
> to swap, correct? If this is the case, then the best I could do is simply 
> create a double pointer in my processing code in each thread, then pass this 
> to pfring_recv where the packet is not copied. After the packet is processed, 
> it would repeat this again.

Yes, if you don't need to keep packets aside or forward them, using 
pfring_recv() or pfring_alloc_pkt_buff()/pfring_recv_pkt_buff() is exactly the 
same.
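For completeness, the buffer-swap variant only pays off when you do need to keep packets aside. A hedged sketch of the pattern (keep_aside() is a placeholder; the extra buffers are assumed to have been allocated per thread via dna_cluster_low_level_settings(), and the release call follows my reading of the API):

```c
/* Sketch only: requires the PF_RING library and a DNA cluster with
 * additional per-thread buffers allocated. */
#include <pfring.h>

static void keep_aside(const u_char *pkt, u_int len) {
  /* placeholder: queue the packet for deferred processing */
  (void)pkt; (void)len;
}

void consume_with_swap(pfring *ring) {
  struct pfring_pkthdr hdr;
  pfring_pkt_buff *h = pfring_alloc_pkt_buff(ring); /* from the pool */

  while (h != NULL) {
    if (pfring_recv_pkt_buff(ring, h, &hdr, 1 /* wait */) > 0) {
      /* after the call, h wraps the buffer holding the new packet */
      u_char *pkt = pfring_get_pkt_buff_data(ring, h);
      keep_aside(pkt, hdr.len);
      /* grab a fresh handle for the next receive; the kept buffer
       * goes back to the pool later via pfring_release_pkt_buff() */
      h = pfring_alloc_pkt_buff(ring);
    }
  }
}
```

You can keep up to K packets aside this way, where K is the number of additional buffers in the per-thread pool.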

> If the processing is not going fast enough, then I presume that the packets 
> are being queued up in the buffers allocated in 
> dna_cluster_low_level_settings until it can catch up.
> 
> Please let me know if I'm missing something. Thanks.

I hope things are a bit clearer now.

Best Regards
Alfredo

> 
> 
> On Sat, May 18, 2013 at 12:04 PM, Cliff Burdick <[email protected]> wrote:
> Thanks Alfredo, answers are below. I had to truncate a bit because it limited 
> the message size.
> 
> 
> >Hi Cliff
> >please see inline
> >
> ...
> >You can avoid this by allocating additional buffers per-thread and using a 
> >buffer swap when receiving a packet. This way you can keep aside up to K 
> >packets, where K is the number of additional buffers allocated in the 
> >per-thread pool.
> >In order to allocate these additional buffers please have a look at 
> >dna_cluster_low_level_settings().
> >To get a buffer from the per-thread pool:
> >pkt_handle = pfring_alloc_pkt_buff(ring[thread_id])
> >To swap a received packet with another buffer: 
> >ret = pfring_recv_pkt_buff(ring[thread_id], pkt_handle, &hdr, 
> >wait_for_packet)
> >
> This is very useful. I didn't know it was per-thread, so this should make a 
> fairly significant impact. 
> 
> >> For some reason, I am dropping what appears to be an increasing number of 
> >> packets, depending on which thread it is. Usually the lower-numbered 
> >> threads drop about 10%, while the higher-number ones drop around 90%.
> >
> >Please pay attention also to logical/physical cores when playing with core 
> >affinity. Can I see the output of 
> >cat /proc/cpuinfo | grep "processor\|model name\|physical id"
> >and the affinity you are using?
> Cores 0-5 are physical, while 6-11 are hyperthreaded:
> processor       : 0
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 0
> processor       : 1
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 1
> processor       : 2
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 2
> processor       : 3
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 8
> processor       : 4
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 9
> processor       : 5
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 10
> processor       : 6
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 0
> processor       : 7
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 1
> processor       : 8
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 2
> processor       : 9
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 8
> processor       : 10
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 9
> processor       : 11
> model name      : Intel(R) Xeon(R) CPU           L5638  @ 2.00GHz
> physical id     : 0
> core id         : 10
> My threads are as follows:
> 
> Core 0: DNA Cluster Master
> Core 1-9: Receiver Threads
> 
> I can move a thread off core 6, which has a shared cache with core 0 if you 
> think that would help. Here is the output of pfdnacluster_multithread by doing 
> something similar:
> 
> root@bond0:/root> /tmp/pfdnacluster_multithread -i dna0 -c 1 -n 8
> Capturing from dna0
> Using PF_RING v.5.5.2
> Hashing packets per-IP Address
> The DNA cluster [id: 1][num consumer threads: 8] is running...
> Opening cluster dnacluster:1@0
> Consumer thread #0 is running...
> Opening cluster dnacluster:1@1
> Set thread 0 on core 2/12
> Consumer thread #1 is running...
> Opening cluster dnacluster:1@2
> Set thread 1 on core 3/12
> Consumer thread #2 is running...
> Opening cluster dnacluster:1@3
> Set thread 2 on core 4/12
> Consumer thread #3 is running...
> Opening cluster dnacluster:1@4
> Set thread 3 on core 5/12
> Consumer thread #4 is running...
> Opening cluster dnacluster:1@5
> Set thread 4 on core 6/12
> Consumer thread #5 is running...
> Opening cluster dnacluster:1@6
> Set thread 5 on core 7/12
> Consumer thread #6 is running...
> Opening cluster dnacluster:1@7
> Set thread 6 on core 8/12
> Consumer thread #7 is running...
> Set thread 7 on core 9/12
> =========================
> Thread 0
> Absolute Stats: [52876 pkts rcvd][34172500 bytes rcvd]
>                 [52876 total pkts][0 pkts dropped (0.0 %)]
>                 [52'874.94 pkt/sec][273.37 Mbit/sec]
> =========================
> Thread 1
> Absolute Stats: [104995 pkts rcvd][36437358 bytes rcvd]
>                 [104995 total pkts][0 pkts dropped (0.0 %)]
>                 [104'992.90 pkt/sec][291.49 Mbit/sec]
> =========================
> Thread 2
> Absolute Stats: [50422 pkts rcvd][27233447 bytes rcvd]
>                 [50422 total pkts][0 pkts dropped (0.0 %)]
>                 [50'420.99 pkt/sec][217.86 Mbit/sec]
> =========================
> Thread 3
> Absolute Stats: [55373 pkts rcvd][23669520 bytes rcvd]
>                 [55373 total pkts][0 pkts dropped (0.0 %)]
>                 [55'371.89 pkt/sec][189.35 Mbit/sec]
> =========================
> Thread 4
> Absolute Stats: [54588 pkts rcvd][32153687 bytes rcvd]
>                 [54588 total pkts][0 pkts dropped (0.0 %)]
>                 [54'586.90 pkt/sec][257.22 Mbit/sec]
> =========================
> Thread 5
> Absolute Stats: [2503 pkts rcvd][13166211 bytes rcvd]
>                 [2503 total pkts][0 pkts dropped (0.0 %)]
>                 [2'502.94 pkt/sec][105.33 Mbit/sec]
> =========================
> Thread 6
> Absolute Stats: [54631 pkts rcvd][34089918 bytes rcvd]
>                 [54631 total pkts][0 pkts dropped (0.0 %)]
>                 [54'629.90 pkt/sec][272.71 Mbit/sec]
> =========================
> Thread 7
> Absolute Stats: [54764 pkts rcvd][29179172 bytes rcvd]
>                 [54764 total pkts][0 pkts dropped (0.0 %)]
>                 [54'762.90 pkt/sec][233.43 Mbit/sec]
> =========================
> 
> Thanks.
> 
> 
> 
> On Fri, May 17, 2013 at 8:30 PM, Cliff Burdick <[email protected]> wrote:
> I have an application configured with the DNA cluster running on core 0, with 
> 8 threads running on cores 1-8 on a Xeon processor. I'm using a custom hash 
> function which just picks off the last octet of the source IP, and sends it 
> to threads 1-8. I'm loading the DNA driver using the following:
> 
> insmod ixgbe.ko MQ=0,0  mtu=9000
> 
> When I run pfdnacluster_multithread I can start 8 threads without any 
> dropping of packets. My understanding is that to use zero-copy mode, I can 
> only have a single thread operating on the packets at a time since the buffer 
> is automatically freed when another pfring_recv call is made. Because of 
> this, each of my slave threads make a copy of the data before immediately 
> returning back to call pfring_recv again. For some reason, I am dropping what 
> appears to be an increasing number of packets, depending on which thread it 
> is. Usually the lower-numbered threads drop about 10%, while the 
> higher-number ones drop around 90%. I'm receiving about 230Kpps (1.3Gbps) 
> evenly distributed between the threads, and my understanding was that DNA 
> mode would handle this. My code for the receiver is identical to the 
> multithread example (8192 buffers for rx/tx, receive only, wait_mode =0). 
> 
> My slave thread makes the call using the following:
> pfring_recv_parsed(m_ring, &packet, 0, &header, 1, 0, 1, 0);
> 
> Also, what is the preferred way of dropping packets inside of the hash 
> function when I don't want it routed to any of my threads, return 
> DNA_CLUSTER_FAIL, or send it to a queue that is not being processed?
> 
> Any help is appreciated. Thanks.
> 
> 
> _______________________________________________
> Ntop-misc mailing list
> [email protected]
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
