-----Original Message-----
> Date: Thu, 5 Apr 2018 12:51:16 +0000
> From: "Ananyev, Konstantin" <konstantin.anan...@intel.com>
> To: Jerin Jacob <jerin.ja...@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
>
>
> Hi Jerin,
>
> > > > > > > +/*
> > > > > > > + * Marks given callback as used by datapath.
> > > > > > > + */
> > > > > > > +static __rte_always_inline void
> > > > > > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > > > > > +{
> > > > > > > +	cbi->use++;
> > > > > > > +	/* make sure no store/load reordering could happen */
> > > > > > > +	rte_smp_mb();
> > > > > > > +}
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Marks given callback list as not used by datapath.
> > > > > > > + */
> > > > > > > +static __rte_always_inline void
> > > > > > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > > > > > +{
> > > > > > > +	/* make sure all previous loads are completed */
> > > > > > > +	rte_smp_rmb();
> > > > > >
> > > > > > We earlier discussed this barrier. Will following scheme works out to
> > > > > > fix the bpf_eth_cbi_wait() without cbi->use scheme?
> > > > > >
> > > > > > #ie. We need to exit from jitted or interpreted code irrespective of its
> > > > > > state. IMO, We can do that by an _arch_ specific function to fill
> > > > > > jitted memory with
> > > > > > "exit" opcode(value:0x95, exit, return r0),so that above code needs
> > > > > > to be come out in anycase,
> > > > > > on next instruction execution. I know, jitted memory is read-only in your
> > > > > > design, I think, we can change the permission to "write" to the fill
> > > > > > "exit" opcode(both jitted or interpreted case) for termination.
> > > > > >
> > > > > > What you think?
> > > > >
> > > > > Not sure I understand your proposal...
> > > >
> > > > If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until
> > > > eBPF program exits? Right?
> > >
> > > Kind off, but not only.
> > > After bpf_eth_cbi_wait() finishes it is guaranteed that data-path wouldn't
> > > try to access the resources associated with given bpf_eth_cbi (bpf, jit),
> > > so we can proceed with freeing them.
> > >
> > > > . Instead of using bpf_eth_cbi_[un]use()
> > > > scheme which involves the barrier. How about,
> > > >
> > > > in bpf_eth_cbi_wait()
> > > > {
> > > >
> > > > memset the EBPF "program memory" with 0x95 value. Which is an "exit" and
> > > > "return r0" EPBF opcode, Which makes program to terminate by it own
> > > > as on 0x95 instruction, CPU decodes and it gets out from EPBF program.
> > > >
> > > > }
> > > >
> > > > In jitted case, it is not 0x95 instruction, which will be an arch
> > > > specific instructions, We can have arch abstraction to generated
> > > > such instruction for "exit" opcode. And use common code to fill the
> > > > instructions to exit from EPBF program provided by arch code.
> > > >
> > > > Does that makes sense?
> > >
> > > There is no much point in doing it.
> >
> > It helps in avoiding the barrier on non x86 case. Right?
>
> Nope, I believe it doesn't, see below.
>
> > So it is useful
> > thing. Right? and avoid the extra logic in fastpath increment/decrement
> > "inuse" counters for all the archs.
> >
> > > What we need is a guarantee that after some point data-path wouldn't try
> > > to access given bpf context, so we can destroy it.
> >
> > Is there any reason why you think, above proposed solution wont
> > guarantee the termination eBPF program?
> >
> > -ie,
> > 1)memset to "exit" instruction in eBPF memory
>
> Even when code is just interpreted (bpf_exec()) - there still be cases
> when you need to synchronize execution thread with thread updating the code
> (32bit systems, 16B LDDW instruction, etc.).
> With JIT-ed code things will become much more complicated (icache, variable
> size instructions)
> and I can't see how it could be done without extra synchronization between
> execute and update threads.
>
> > 2)Wait for N instruction cycles to terminate the program.
>
> There is no way to guarantee that execution would take exactly N cycles.
> Execution thread could be preempted/interrupted, it could be executing
> syscall,
> there could be CPU stall (access slow memory, cpu freq change, etc.).

I agree. Things get worse with eBPF tail calls etc.

> So even we'll solve all problems with 1) - it wouldn't buy us a safe solution.
>
> Actually quite a lot of research was done how to speedup slow/fast path
> synchronization
> in user-space:
>
> https://lwn.net/Articles/573424/
> some theory beyond:
> https://lttng.org/files/thesis/desnoyers-dissertation-2009-12-v27.pdf
> (chapter 6)
> They even introduced a new syscall in Linux for these purposes:
> http://man7.org/linux/man-pages/man2/membarrier.2.html
>
> I thought about something similar based on membarrier(), but it has
> few implications:
> 1. only latest linux kernels (4.14+)
> 2. Not sure is it available on non x86 platforms.
> 3. Need to measure real impact.
>
> Because of 1) and 2) we probably would need both mb() and membarrier() code
> paths.
> Anyway - it is probably worth investigating for more generic solution,
> but I suppose it is out of scope for that patch.

Yes.
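
For readers following the thread, here is a rough sketch of the kind of use-counter handshake discussed in this thread, written with portable C11 atomics rather than the DPDK barrier macros. It is not the code from the patch: the names struct cbi_sketch and cbi_sketch_inuse/unuse/wait are hypothetical, the odd/even convention for the counter is an assumption made for illustration, and a release increment stands in for the rte_smp_rmb() shown in the quoted hunk.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct bpf_eth_cbi. */
struct cbi_sketch {
	/* Assumed convention: odd value means the datapath is inside the callback. */
	atomic_uint_fast32_t use;
};

/* Datapath: mark the callback context as in use (counter becomes odd). */
static inline void
cbi_sketch_inuse(struct cbi_sketch *cbi)
{
	atomic_fetch_add_explicit(&cbi->use, 1, memory_order_relaxed);
	/* Full fence: later loads of the context must not pass the increment. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Datapath: mark the callback context as no longer in use (counter becomes even). */
static inline void
cbi_sketch_unuse(struct cbi_sketch *cbi)
{
	/* Release: all earlier accesses to the context complete before the increment. */
	atomic_fetch_add_explicit(&cbi->use, 1, memory_order_release);
}

/*
 * Control path: call after unhooking the context. Returns once the datapath
 * iteration that might still hold the old context has finished.
 */
static void
cbi_sketch_wait(struct cbi_sketch *cbi)
{
	uint_fast32_t puse, nuse;

	/* Make the preceding unhook stores visible before sampling the counter. */
	atomic_thread_fence(memory_order_seq_cst);
	puse = atomic_load_explicit(&cbi->use, memory_order_acquire);

	/* Odd: an iteration is in flight, so spin until the counter moves on. */
	if ((puse & 1) != 0) {
		do {
			nuse = atomic_load_explicit(&cbi->use, memory_order_acquire);
		} while (nuse == puse);
	}
}
```

The waiter only needs to see the counter change once: any iteration that starts after the unhook can no longer pick up the old context, so it is safe to free the resources when cbi_sketch_wait() returns. The cost on the fast path is the full fence in cbi_sketch_inuse(), which is exactly what the membarrier() idea above would try to move to the slow path.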