On 06/29/2018 07:01 PM, Jakub Kicinski wrote:
> On Fri, 29 Jun 2018 09:04:15 +0200, Daniel Borkmann wrote:
>> On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
>>> On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:
>>>> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
>>>>> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT
>>>>> is
>>>>> right approach since this is opaque info and solely defined by the BPF
>>>>> prog
>>>>> that is using the generic helper.
>>>>
>>>> Wouldn't it make sense to introduce some safeguards here (in a backward
>>>> compatible way, of course)? It's easy to mistakenly set data for a
>>>> different tunnel type in a BPF program and then be surprised by the
>>>> result. It might help users if such usage was detected by the kernel,
>>>> one way or another.
>>>
>>> Well, that's how it works today ;)
>>
>> Well, it was designed like that on purpose, to be i) agnostic of the
>> underlying
>> device, ii) to not clutter BPF API with tens of different APIs effectively
>> doing
>> the same thing, and at the same time to avoid adding protocol specifics.
>> E.g. at
>> least core bits of bpf_skb_{set,get}_tunnel_key() will work whether I use
>> vxlan
>> or geneve underneath (we are actually using it this way) and I could use
>> things
>> like tun_id to encode custom meta data from BPF for either of them depending
>> on flavor
>> picked by orchestration system. For the tunnel options in
>> bpf_skb_{set,get}_tunnel_opt()
>> it's similar although here there needs to be awareness of the underlying dev
>> depending
>> on whether you encode data into e.g. gbp or tlvs, etc. However, downside
>> right now I
>> can see with a patch like below is that:
>>
>> i) People might still just keep using 'TUNNEL_OPTIONS_PRESENT path' since
>> available
>> and backwards compatible with current/older kernels, ii) we cut bits away
>> from
>> size over time for each new tunnel proto added in future that would support
>> tunnel
>> options, iii) that extension is one-sided (at least below) and same would be
>> needed
>> in getter part, and iv) there needs to be a way for the case when folks add
>> new
>> tunnel options where we don't need to worry that we forget updating
>> BPF_F_TUN_*
>> each time otherwise this will easily slip through and again people will just
>> rely
>> on using TUNNEL_OPTIONS_PRESENT catchall. Given latter and in particular
>> point i)
>> I wouldn't think it's worth the pain, the APIs were added to BPF in v4.6 so
>> this
>> would buy them 2 more years wrt kernel compatibility with same functionality
>> level.
>> And point v), I just noticed the patch is actually buggy: size is
>> ARG_CONST_SIZE and
>> verifier will attempt to check the value whether the buffer passed in
>> argument 2 is
>> valid or not, so using flags here in upper bits would let verification fail,
>> you'd
>> really have to make a new helper just for this.
>
> Ah, indeed. I'd rather avoid a new helper, if we reuse an old one
> people can always write a program like:
>
> err = helper(all_flags);
> if (err == -EINVAL)
> err = helper(fallback_flags);
>
> With a new helper the program will not even load on old kernels :(
>
> Could we add the flags new to bpf_skb_set_tunnel_key(), maybe? It is a
> bit ugly because only options care about the flags and in theory
> bpf_skb_set_tunnel_key() doesn't have to be called before
> bpf_skb_set_tunnel_opt() ...
Right, though it's also ugly splitting things this way over two helpers
when this is _specifically_ only for one of them (unless you do need it for
hw offload where you need to signal for tunnel key _and_ options in BPF
itself but that's not from what I see is the case here). For sake of
'safeguard'-only I bet users won't do such fallback for reasons above but
again use currently TUNNEL_OPTIONS_PRESENT route, as is since it's simpler
and available since long time already. Thus I would prefer to leave it as
is in such case since ship has sailed long ago on this, and usage of this
extra check is not really obvious either, so I don't think it's worth it
and 'value-add' is marginal.
Thanks,
Daniel