On 30.01.2014 at 14:17, Michał Geszkiewicz <mic...@wp.pl> wrote:

> So now is a good time for me to follow MHaberler in searching for the best
> UDP stack, layer or library to decrease this terrible overhead while
> reading data from the 7i80. It took about 400 to 700us on each 1ms thread entry.

Well, these figures suggest we'll need some bigger-bore artillery. Here are some ideas (I mark those taken from Nicholas as (N) and mine as (M)).

Get ballpark figures for what's possible from the osadl.org QA-Farm UDP roundtrip tests (N,M):
https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html
(this also has some hints for kernel options and methods).

The EtherCAT figures suggest there's quite some room for improvement, modulo hardware:
https://www.osadl.org/?id=923

Skinning the cat, in suggested order (M):

strategy 1: cut out part of the kernel stack by using raw sockets (man 7 raw) (N).
See http://yusufonlinux.blogspot.co.at/2010/11/raw-socket-in-linux.html for an intro; I'm not sure if this leaves UDP intact.

strategy 1a: as above, but move down to the Ethernet frame level (IP stack bypassed), and explore memory-mapped packets and the PACKET_TX_RING/PACKET_RX_RING facilities (M). This might need a kernel with CONFIG_PACKET=y and CONFIG_PACKET_MMAP=y.
See the packet-rx-ring.c and packet-tx-ring.c examples in https://github.com/vieites4/rawsockets/tree/master/docs/snippets, and these links:
http://yusufonlinux.blogspot.co.at/2010/11/data-link-access-and-zero-copy.html
http://codemonkeytips.blogspot.co.at/2011/07/asynchronous-packet-socket-reading-with.html

strategy 2: cut out the kernel pretty much completely by using a userland device driver and UIO (N).
See the Yang/McGuire paper http://static.mah.priv.at/public/rtlws-proceedings/rtlws-2012/proc/Yang.pdf - Nicholas indicated some code might be found.

I would think that overall it's more productive to forget about UDP framing for now and get down to pushing Ethernet frames in and out as fast as possible. Building a fake component for timing, and using some form of fast packet responder on the other end, might help.

Nicholas suggests using ftrace to look at the system call code paths.
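
To make the "push Ethernet frames in and out and time it" idea a bit more concrete, here is a minimal Python sketch of what such a fake timing component could start from. It is only an illustration under assumptions: the function names (build_frame, timed_roundtrip), the choice of the IEEE "local experimental" EtherType 0x88B5, and the existence of a responder on the other end that answers with a frame of the same EtherType are all made up for this example and are not something the 7i80 does today. Sending needs Linux, an AF_PACKET socket and root (or CAP_NET_RAW); Python will also add overhead of its own, so treat any numbers as rough.

```python
import socket
import struct
import time

# Assumption: an otherwise unused EtherType, taken from the IEEE
# "local experimental" range, so test frames are easy to filter.
ETH_P_EXPERIMENTAL = 0x88B5
ETH_HEADER = struct.Struct("!6s6sH")  # dst MAC, src MAC, EtherType
ETH_MIN_FRAME = 60                    # minimum frame size without the 4-byte FCS


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return raw


def build_frame(dst_mac, src_mac, payload, ethertype=ETH_P_EXPERIMENTAL):
    """Build a bare Ethernet frame (no IP, no UDP), zero-padded to minimum size."""
    header = ETH_HEADER.pack(mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return (header + payload).ljust(ETH_MIN_FRAME, b"\x00")


def timed_roundtrip(ifname, frame, timeout=0.01):
    """Send one frame and wait for the next frame of our EtherType.

    Returns the elapsed time in microseconds. Hypothetical helper:
    it assumes some responder on the wire answers with the same EtherType.
    Raises socket.timeout if nothing arrives within `timeout` seconds.
    """
    proto = socket.htons(ETH_P_EXPERIMENTAL)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW, proto) as sock:
        sock.bind((ifname, 0))
        sock.settimeout(timeout)
        start = time.perf_counter_ns()
        sock.send(frame)
        sock.recv(2048)
        return (time.perf_counter_ns() - start) / 1000.0
```

Usage would be along the lines of timed_roundtrip("eth0", build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", b"ping")), repeated a few thousand times to collect min/avg/max; the interface name and MAC addresses here are placeholders.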
ftrace is new to me; I just played a bit with it and I'm overwhelmed, but it might have the potential to nail down timing issues of a particular system call. I also played a bit with kernelshark, which overwhelms me too ;)

This got me from zero to 0.01:
https://lwn.net/Articles/341902/ (install as outlined), and these:
http://www.linuxforu.com/2010/11/kernel-tracing-with-ftrace-part-1/
http://www.linuxforu.com/2010/12/kernel-tracing-with-ftrace-part-2/

Once Ethernet-level packet I/O is settled as fast enough, we can turn to adding the minimal IP/UDP framing needed. This will be bare bones (no ping, ARP, or other amenities) but should have pretty much the same timing behavior and be good enough for the task at hand.

This is all armchair wisdom, so handle with care.

- Michael

ps: I'm still relaying between a private email thread in German with Nicholas and this list, which isn't terribly efficient; so either we drag Nicholas over here or we move to linux-rt-us...@vger.kernel.org .

_______________________________________________
Emc-developers mailing list
Emc-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/emc-developers