Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 03/04/2018 08:40 PM, Florian Westphal wrote:
> > Its still in early stage, but I think its good enough as
> > a proof-of-concept.
>
> be the critical part in that it needs to be flexible enough to cover
> both front ends well enough without having to make compromises to
> one of them.
Yes, that's the idea.

> The same would be for optimization passes e.g. when we
> know that two successive rules would match on TCP header bits that we
> can reuse the register that loaded/parsed it previously to that point.

Yes, I already thought about this.  A first step would be to simply
track the last accessed header base (mac, nh, th) in the imr state so
we can reuse the offset if possible.

> Similar when it comes to maps when the lookup value would need to
> propagate through the linked imr objects. Do you see such optimizations
> or in general propagation of state as direct part of the IMR or rather
> somewhat hidden in IMR layer when doing the IMR to BPF 'jit' phase?

Direct part.  I would only say it should be hidden in the sense that
frontends don't need to do anything here; I would do such propagation
in the IMR before generating code.

The IMR has enough info to figure out:
1. the number of (ebpf) registers needed simultaneously at most
2. how many payload requests relative to the mac/network/tcp header
   there are (both in total and in relation to the rule count)

So it could e.g. decide to set aside 2 registers to hold the th/nh
offsets so they are ready at all times, while not covering the mac
header at all because it is not used anywhere.  Likewise, we could even
avoid extension header parsing entirely when the IMR sees that there
are no 'IPv6 + transport header' rules.

> Which other parts do you think would be needed for the IMR aside
> from above basic operations? ALU ops, JMPs for the relational ops?

We'll need at least the following IMR features/objects (pseudo
instructions):
- relops: gt, ge, lt, etc.
- all bitops (shift, xor, and, etc.)
- payload: allow stores in addition to loads (this includes csum fixup)
- verdict: jumps to other chains, nfqueue verdict
- meta: accesses to in/output interface names, skb length, and the like
- register allocation for ipv6 addresses, interface names, etc.
  (anything larger than 64 bit)
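To make the 'track the last accessed header base' idea concrete, here is
a minimal sketch of what such imr state could look like.  All names
(imr_state, imr_hdr_base, imr_can_reuse_offset, ...) are invented for
illustration; this is not the actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header bases the IMR could track (names invented). */
enum imr_hdr_base { IMR_HDR_NONE, IMR_HDR_MAC, IMR_HDR_NH, IMR_HDR_TH };

struct imr_state {
	enum imr_hdr_base last_base; /* header base accessed previously */
	int32_t base_off;            /* its cached offset */
};

/* A new payload access can reuse the cached offset instead of
 * re-parsing the packet when it targets the same header base. */
static bool imr_can_reuse_offset(const struct imr_state *s,
				 enum imr_hdr_base base)
{
	return s->last_base == base;
}

static void imr_note_access(struct imr_state *s, enum imr_hdr_base base,
			    int32_t off)
{
	s->last_base = base;
	s->base_off = off;
}
```

A code-gen pass would consult imr_can_reuse_offset() before emitting the
header-parsing instructions for each payload load.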
I would also add 'highlevel' objects that are themselves translated into
basic operations.  The most obvious example is 'fetch 4 bytes at offset
x into the transport header'.  The frontend should not need to bother
with ipv4 details such as ip options; instead the IMR should take care
of this and generate the instructions needed to fetch iph->ihl and
figure out the correct transport header offset.  It could do so by
adding the needed part of the rule (fetch the byte containing ihl,
mask/shift the result, store the offset, etc.) internally.

I would also add:
- limit (to jit both xt_limit and nft limit)
- statistic (random sampling)

as both nft and iptables have those features and I think they're pretty
much the same, so they can be handled by common code.  Same for other
extensions, but those have side effects and I don't know yet how to
handle that (log, counter and the like).

Same for set lookups.  In the iptables case, they will only be
'address/network/interface is (not) in set', i.e. only match/no match,
plus set manipulations (SET target).  In the nftables case it can also
be something like 'retrieve the value belonging to a key'
(dreg = lookup(map, sreg, sreg_len)), where the dreg value then gets
used in other ways.

I think that making the IMR handle the nftables case rather than the
iptables one has the advantage that the IMR might be able to optimize
repetitive iptables rules (and nft rules that do not use a jump map
even though they could):

iptables .. -i eth0 -j ETH0_RULES
iptables .. -i eth1 -j ETH1_RULES
..
iptables .. -i ethX -j ETHX_RULES

could be transparently thrown into a map and the jump target retrieved
from it based on iifname...  The IMR would also need to be able to
create/allocate such sets/maps on its own.

Do you see this as something that has to be decided now?
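The ihl handling above can be sketched as plain C; transport_offset()
below mirrors the mask/shift/add sequence the IMR would emit as
instructions (the function itself and its parameter names are
illustrative, not part of any existing API).

```c
#include <stdint.h>

/* vihl is the first byte of the IPv4 header: version in the high
 * nibble, ihl (header length in 32-bit words) in the low nibble.
 * nh_off is the network header offset within the packet. */
static uint32_t transport_offset(uint32_t nh_off, uint8_t vihl)
{
	uint32_t ihl = vihl & 0x0f; /* mask off the version nibble */

	return nh_off + (ihl << 2); /* ihl is in 4-byte words */
}
```

For a plain IPv4 header (vihl 0x45, ihl 5) behind a 14-byte ethernet
header this yields offset 34; IP options (ihl > 5) move the transport
header further out, which is exactly the detail the frontend should not
have to know about.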
My ideal feeling about the IMR would be that we can probably handle
this later by extending it, rather than having to throw it away plus do
a rewrite :)

> I think it would be good to have clear semantics in terms of what
> it would eventually abstract away from raw BPF when sitting on top
> of it; potentially these could be details on packet access or [..]

Yes, it would be good if the IMR could e.g. merge adjacent loads:

load 4 bytes from network header at offset 12 (relop, equal) imm 0x1
load 4 bytes from network header at offset 16 (relop, equal) imm 0x2

... would be condensed to ...

load 8 bytes from network header at offset 12 (relop, equal) imm 0x100000002

... before generating ebpf code.  I don't think this should be part of
the frontend (too lowlevel a task for the frontend) and I don't see why
it should happen when generating code (too highlevel a task for that).

> interaction with helpers or other BPF features such that it's BPF prog
> type independent at this stage, e.g. another point could be that given
> the priv/cap level of the uapi request, there could also be different
> BPF code gen backends that implement against the IMR, e.g. when the
> request comes out of userns then it has feature constraints in terms
> of e.g. having to use bpf_skb_{load,store}_bytes() helpers for packet
> access instead of direct packet access or not being able to use BPF to
> BPF calls, etc; wdyt?

I haven't thought about this at all, but I think it would be useful.
When initializing IMR state we could provide feature flag(s) that
either instantly reject something (e.g. the context only allows
read-only access to packets but the IMR was just told to mangle
payload), or that we consider during the code-gen phase (I don't
understand ebpf well enough to say whether that's useful, but if it is,
then it would be good if the IMR could do it).

I did not intend the IMR as a 'public' library -- it would only be for
use by in-tree umh translators.
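The adjacent-load merge above can be expressed as a tiny pass over
consecutive equality tests; the struct and function names below are
invented for illustration.  The first load's immediate ends up in the
more significant bytes of the merged value, matching big-endian packet
order, which is how 0x1 at offset 12 and 0x2 at offset 16 become
0x100000002 at offset 12.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IMR object: load 'len' bytes at network header offset
 * 'off' and compare for equality against 'imm'. */
struct imr_load_imm {
	uint32_t off;
	uint32_t len;
	uint64_t imm;
};

/* Merge b into a when the two loads are adjacent and together still
 * fit into a single 8-byte load; returns false if they cannot merge. */
static bool imr_try_merge(struct imr_load_imm *a,
			  const struct imr_load_imm *b)
{
	if (a->off + a->len != b->off || a->len + b->len > 8)
		return false;

	a->imm = (a->imm << (b->len * 8)) | b->imm;
	a->len += b->len;
	return true;
}
```

Non-adjacent loads, or pairs that would exceed 8 bytes, are simply left
alone, so the pass is safe to run over any rule's load list.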
So anything that needs changing at a later stage should be doable, but
obviously the less rework needed later on, the better...

Thanks for reviewing!