On 03/04/2014 11:17 PM, Alexei Starovoitov wrote:
Extended BPF extends old BPF in the following ways:
- from 2 to 10 registers
   Original BPF has two registers (A and X) and hidden frame pointer.
   Extended BPF has ten registers and read-only frame pointer.
- from 32-bit registers to 64-bit registers
   semantics of old 32-bit ALU operations are preserved via 32-bit
   subregisters
- if (cond) jump_true; else jump_false;
   old BPF insns are replaced with:
   if (cond) jump_true; /* else fallthrough */
- adds signed > and >= insns
- 16 4-byte stack slots for register spill-fill replaced with
   up to 512 bytes of multi-use stack space
- introduces bpf_call insn and register passing convention for zero
   overhead calls from/to other kernel functions (not part of this patch)
- adds arithmetic right shift insn
- adds swab32/swab64 insns
- adds atomic_add insn
- old tax/txa insns are replaced with 'mov dst,src' insn

Extended BPF is designed to be JITed with one to one mapping, which
allows GCC/LLVM backends to generate optimized BPF code that performs
almost as fast as natively compiled code.

sk_convert_filter() remaps old style insns into extended:
'sock_filter' instructions are remapped on the fly to
'sock_filter_ext' extended instructions when
sysctl net.core.bpf_ext_enable=1
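
To make the jump remapping concrete, here is a minimal sketch of how one
old-style two-target jump could be expanded into fall-through form. It is
not the actual sk_convert_filter() logic: struct old_jmp, struct new_jmp and
convert_jump() are hypothetical names invented for this illustration, and
the sketch ignores the offset fix-ups needed when other instructions in the
program also expand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; not the kernel's structures. */
struct old_jmp {		/* old BPF: two explicit targets */
	uint8_t jt;		/* relative offset if cond is true */
	uint8_t jf;		/* relative offset if cond is false */
};

struct new_jmp {		/* extended BPF: one target, else fall through */
	int unconditional;	/* 1 = "goto off", 0 = "if (cond) goto off" */
	int16_t off;		/* relative offset from the next insn */
};

/*
 * Expand one old-style jump into one or two new-style jumps.
 * Returns the number of instructions written to out[] (1 or 2).
 */
static size_t convert_jump(const struct old_jmp *in, struct new_jmp out[2])
{
	assert(in != NULL && out != NULL);

	if (in->jf == 0) {
		/* false branch already falls through: one insn is enough */
		out[0].unconditional = 0;
		out[0].off = in->jt;
		return 1;
	}

	/* cond true: also skip the extra "goto" emitted right below */
	out[0].unconditional = 0;
	out[0].off = (int16_t)(in->jt + 1);

	/* cond false: fall through into an unconditional jump */
	out[1].unconditional = 1;
	out[1].off = in->jf;
	return 2;
}
```

With jt=2, jf=7 this yields "if (cond) goto +3; goto +7;", while jt=5, jf=0
stays a single "if (cond) goto +5;".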

Old filter comes through sk_attach_filter() or sk_unattached_filter_create()
  if (bpf_ext_enable) {
     convert to new
     sk_chk_filter() - check old bpf
     use sk_run_filter_ext() - new interpreter
  } else {
     sk_chk_filter() - check old bpf
     if (bpf_jit_enable)
         use old jit
     else
         use sk_run_filter() - old interpreter
  }
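
The decision tree above can be written down as a small C sketch. The enum
and select_runner() are hypothetical names used only to mirror the two
sysctls; they are not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names that mirror the flow described above. */
enum runner {
	RUN_EXT_INTERPRETER,	/* sk_run_filter_ext() */
	RUN_OLD_JIT,		/* existing JIT for old BPF */
	RUN_OLD_INTERPRETER,	/* sk_run_filter() */
};

static enum runner select_runner(bool bpf_ext_enable, bool bpf_jit_enable)
{
	/* In the flow above the ext path does not consult bpf_jit_enable. */
	if (bpf_ext_enable)
		return RUN_EXT_INTERPRETER;

	return bpf_jit_enable ? RUN_OLD_JIT : RUN_OLD_INTERPRETER;
}

/* Small self-check covering all four sysctl combinations. */
static void select_runner_selftest(void)
{
	assert(select_runner(true, false) == RUN_EXT_INTERPRETER);
	assert(select_runner(true, true) == RUN_EXT_INTERPRETER);
	assert(select_runner(false, true) == RUN_OLD_JIT);
	assert(select_runner(false, false) == RUN_OLD_INTERPRETER);
}
```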

sk_run_filter_ext() interpreter is noticeably faster
than sk_run_filter() for two reasons:

1. fall-through jumps
   Old BPF jump instructions must take either the 'true' or the 'false'
   branch, which causes a branch-miss penalty.
   Extended BPF jump instructions have one branch and a fall-through,
   which fits CPU branch predictor logic better.
   'perf stat' shows a drastic difference in branch-misses.

2. jump-threaded implementation of interpreter vs switch statement
   Instead of a single tablejump at the top of the 'switch' statement, GCC
   will generate multiple tablejump instructions, which helps the CPU branch
   predictor.
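
For readers unfamiliar with the second point, the following is a toy
jump-threaded interpreter using the GCC/Clang "labels as values" extension.
It is only a sketch of the dispatch technique: the opcodes, struct toy_insn
and toy_run() are invented here and have nothing to do with the real
sk_run_filter_ext() instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Toy opcodes invented for this sketch; not BPF opcodes. */
enum { OP_ADD, OP_MUL, OP_RET, OP_COUNT };

struct toy_insn {
	uint8_t op;
	int32_t imm;
};

/* Runs until OP_RET; every program must end with OP_RET. */
static int64_t toy_run(const struct toy_insn *insn, int64_t acc)
{
	/* One table of label addresses, indexed by opcode. */
	static const void *const jumptable[OP_COUNT] = {
		[OP_ADD] = &&do_add,
		[OP_MUL] = &&do_mul,
		[OP_RET] = &&do_ret,
	};

#define DISPATCH()					\
	do {						\
		assert(insn->op < OP_COUNT);		\
		goto *jumptable[insn->op];		\
	} while (0)

	DISPATCH();
do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();	/* each handler ends with its own indirect jump */
do_mul:
	acc *= insn->imm;
	insn++;
	DISPATCH();
do_ret:
	return acc;
#undef DISPATCH
}
```

Because every handler carries its own indirect jump, the branch predictor
can track each of them separately instead of sharing one jump at the top of
a 'switch'.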

Performance of two BPF filters generated by libpcap was measured
on x86_64, i386 and arm32.

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
    ((tcp[12]&0xf0)>>2)) != 0)' -dd

Other libpcap programs have similar performance differences.

Raw performance data from BPF micro-benchmark:
SK_RUN_FILTER on same SKB (cache-hit) or 10k SKBs (cache-miss)
time in nsec per call, smaller is better
--x86_64--
            fprog #1   fprog #1    fprog #2   fprog #2
            cache-hit  cache-miss  cache-hit  cache-miss
old BPF        90        101         192        202
ext BPF        31         71          47         97
old BPF jit    12         34          17         44
ext BPF jit   TBD

--i386--
            fprog #1   fprog #1    fprog #2   fprog #2
            cache-hit  cache-miss  cache-hit  cache-miss
old BPF       107        136         227        252
ext BPF        40        119          69        172

--arm32--
            fprog #1   fprog #1    fprog #2   fprog #2
            cache-hit  cache-miss  cache-hit  cache-miss
old BPF       202        300         475        540
ext BPF       139        270         296        470
old BPF jit    26        182          37        202
ext BPF jit   TBD
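
To read the tables as ratios, a small helper can divide the old interpreter
time by the extended interpreter time. speedup() is a hypothetical helper,
not part of the patch; the inputs in the note below are the x86_64 numbers
from the table above.

```c
#include <assert.h>

/* Speedup of the extended interpreter over the old one: old_ns / ext_ns. */
static double speedup(double old_ns, double ext_ns)
{
	assert(ext_ns > 0.0);
	return old_ns / ext_ns;
}
```

On x86_64 this gives roughly 2.9x (90/31) and 4.1x (192/47) for the
cache-hit cases, and roughly 1.4x (101/71) and 2.1x (202/97) for the
cache-miss cases.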

Tested with the trinity BPF fuzzer.

Future work:

0. seccomp

1. add extended BPF JIT for x86_64

2. add inband old/new demux and extended BPF verifier, so that new programs
    can be loaded through old sk_attach_filter() and
    sk_unattached_filter_create() interfaces

3. systemtap-like tracing filters with extended BPF

4. OVS with extended BPF

5. nftables with extended BPF

Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
Acked-by: Hagen Paul Pfeifer <ha...@jauu.net>

From what I can tell, looks good to me:

Reviewed-by: Daniel Borkmann <dbork...@redhat.com>

So next step would be to add selftests and then after that JIT?

...
+#undef LOAD_IMM
+}
+EXPORT_SYMBOL(sk_run_filter_ext);
+

One minor thing I noticed when I git-am'ed your patch is the newline at
the end of file, but perhaps this can be fixed up directly in patchwork.

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cf9cd13509a7..e1b979312588 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c