On 30.01.2014 at 14:17, Michał Geszkiewicz <mic...@wp.pl> wrote:

> So now is good time for me to follow MHaberler searching of best udp 
> stack or layer or library to decrease this terrible overhead while 
> reading data from 7i80. It took about 400 to 700us on each 1ms thread entry.

well, these figures suggest we'll need some bigger-bore artillery.

Here are some ideas (those taken from Nicholas are marked (N), mine (M)).

get ballpark figures for what's possible from the osadl.org QA-Farm UDP 
roundtrip tests (N,M):
https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html
 (also has some hints for kernel options and methods).
the EtherCAT figures suggest there's quite some room for improvement, modulo 
hardware: https://www.osadl.org/?id=923

ways to skin the cat, in suggested order (M):

strategy 1: cut out part of the kernel stack by using raw sockets (man 7 raw) 
(N)

  see http://yusufonlinux.blogspot.co.at/2010/11/raw-socket-in-linux.html for 
an intro; not sure if this leaves UDP intact.
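
On the "UDP intact" question: per man 7 raw, the IPv4 layer still generates 
the IP header on send unless IP_HDRINCL is set, so with a socket opened as 
(AF_INET, SOCK_RAW, IPPROTO_UDP) the UDP header becomes the application's job. 
A minimal sketch of that, assuming placeholder ports and address (nothing here 
is taken from the 7i80 documentation), and assuming the peer accepts a zero 
UDP checksum, which RFC 768 allows on IPv4:

```python
import socket
import struct


def udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend an 8-byte UDP header (RFC 768) to the payload.

    The checksum field is left at 0, which on IPv4 means "not computed".
    """
    length = 8 + len(payload)  # header plus data, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def send_raw_udp(dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> int:
    """Send one hand-built UDP datagram through a raw socket.

    Needs root or CAP_NET_RAW; raises PermissionError otherwise.
    The kernel adds the IP header because IP_HDRINCL is not set.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_UDP) as s:
        # The port in the address tuple is ignored for raw sockets.
        return s.sendto(udp_datagram(src_port, dst_port, payload), (dst_ip, 0))
```

On the receive side a raw socket always hands back the IP header along with 
the data, so the reader has to skip it before looking at the UDP header.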

strategy 1a: as above, but move down to the ethernet frame level (IP stack 
bypassed), and explore memory mapped packets and the 
PACKET_TX_RING/PACKET_RX_RING facilities (M).

  might need a kernel with CONFIG_PACKET=y and CONFIG_PACKET_MMAP=y.
  See the packet-rx-ring.c and packet-tx-ring.c examples in 
https://github.com/vieites4/rawsockets/tree/master/docs/snippets, and these 
links:
  http://yusufonlinux.blogspot.co.at/2010/11/data-link-access-and-zero-copy.html
  http://codemonkeytips.blogspot.co.at/2011/07/asynchronous-packet-socket-reading-with.html
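
To make the frame-level idea concrete, here is a rough Python sketch that 
builds an Ethernet II frame by hand and pushes it out through an AF_PACKET 
socket. It uses the plain send() path, not the mmap'd 
PACKET_TX_RING/PACKET_RX_RING, so it only illustrates the framing. The MAC 
addresses, the interface name and the choice of EtherType 0x88B5 (one of the 
IEEE "local experimental" values) are assumptions for illustration:

```python
import socket
import struct

ETH_P_EXPERIMENTAL = 0x88B5  # IEEE 802 local experimental EtherType (assumed choice)
MIN_PAYLOAD = 46             # pads the frame to the 60-byte minimum (FCS excluded)


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ethernet_frame(dst_mac: str, src_mac: str, ethertype: int, payload: bytes) -> bytes:
    """Build destination MAC + source MAC + EtherType + zero-padded payload."""
    header = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype)
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


def send_frame(ifname: str, frame: bytes) -> int:
    """Send one frame on the given interface (Linux only, needs CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((ifname, 0))
        return s.send(frame)


if __name__ == "__main__":
    frame = ethernet_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01",
                           ETH_P_EXPERIMENTAL, b"hello")
    print(len(frame), frame[:14].hex())
```

Calling send_frame("eth0", frame) as root would put the frame on the wire; 
"eth0" is a placeholder for whatever interface faces the card.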

strategy 2: cut out the kernel pretty much completely by using a userland 
device driver, and UIO. (N).

   see the Yang/McGuire paper 
http://static.mah.priv.at/public/rtlws-proceedings/rtlws-2012/proc/Yang.pdf - 
Nicholas indicated some code might be found. 

I would think overall it's more productive to forget about UDP framing for now 
and get down to pushing ethernet frames in and out as fast as possible. 
Building a fake component for timing and using some form of fast packet 
responder on the other end might help. 
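
As a starting point for such a timing rig, here is a small Python sketch of a 
packet responder plus a round-trip timer. It is only a baseline: it uses 
ordinary UDP sockets over loopback, not raw frames and not a realtime thread, 
and the function names, packet count and payload size are made up for 
illustration. The same measuring loop could later be pointed at a raw-frame 
transport and a real responder on the wire.

```python
import socket
import threading
import time


def echo_responder(sock: socket.socket, count: int) -> None:
    """Bounce `count` datagrams straight back to their sender."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_roundtrips(count: int = 1000, payload: bytes = b"x" * 64) -> list:
    """Return `count` round-trip times in microseconds over loopback UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    worker = threading.Thread(target=echo_responder, args=(server, count), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)  # fail loudly instead of hanging on a lost packet
    client.connect(server.getsockname())

    samples_us = []
    for _ in range(count):
        t0 = time.perf_counter_ns()
        client.send(payload)
        client.recv(2048)
        samples_us.append((time.perf_counter_ns() - t0) / 1000.0)

    worker.join(timeout=1.0)
    client.close()
    server.close()
    return samples_us


if __name__ == "__main__":
    s = sorted(measure_roundtrips())
    print(f"min {s[0]:.1f} us  median {s[len(s) // 2]:.1f} us  max {s[-1]:.1f} us")
```

For this use case the max matters more than the median, since a single late 
packet is what breaks a 1ms thread.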


Nicholas suggests using ftrace to look at the system call code paths. ftrace is 
new to me; I just played with it a bit and I'm overwhelmed, but it might have 
the potential to nail down timing issues of a particular system call. I also 
played a bit with kernelshark, which overwhelms me too ;)

 this got me from zero to 0.01:
   https://lwn.net/Articles/341902/ (install as outlined), and this:
   http://www.linuxforu.com/2010/11/kernel-tracing-with-ftrace-part-1/
   http://www.linuxforu.com/2010/12/kernel-tracing-with-ftrace-part-2/

Once ethernet level packet I/O is settled as fast enough, we can turn to adding 
the minimal IP/UDP framing needed. This will be bare bones (no ping, arp, and 
other amenities) but should have pretty much the same timing behavior and 
be good enough for the task at hand.
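
For a feel of how little "minimal IP/UDP framing" actually is, here is a 
Python sketch that builds a 20-byte IPv4 header (no options) and an 8-byte UDP 
header around a payload. The addresses and ports are placeholders, the UDP 
checksum is left at zero (permitted on IPv4, but an assumption as far as the 
peer is concerned), and the result would travel as the payload of an Ethernet 
frame with EtherType 0x0800. With no ARP, the peer's MAC address has to be 
configured by hand.

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_udp_packet(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    payload: bytes) -> bytes:
    """Return IPv4 header + UDP header + payload, ready to go into a frame."""
    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed" (IPv4 only).
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0) + payload
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + udp_len,       # version 4 / IHL 5, TOS, total length
        0, 0x4000,                   # identification, flags: Don't Fragment
        64, socket.IPPROTO_UDP, 0,   # TTL, protocol 17, checksum placeholder
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    # Patch the header checksum into bytes 10..11.
    ip = ip[:10] + struct.pack("!H", inet_checksum(ip)) + ip[12:]
    return ip + udp
```

A quick sanity check on the receiving side: running inet_checksum over a 
correctly checksummed IP header yields 0.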

this is all armchair wisdom, so handle with care.

- Michael


ps: I'm still relaying between a private email thread in German with 
Nicholas and this list, which isn't terribly efficient; so either we drag 
Nicholas over here or we move to linux-rt-us...@vger.kernel.org .



