Le 20/11/2018 à 02:06, Daniel Borkmann a écrit :
> On 11/13/2018 05:35 PM, Nicolas Dichtel wrote:
>> This new mode enables to add or remove an l2 header in a programmatic way
>> with cls_bpf.
>> For example, it enables to play with mpls headers.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>> Acked-by: Martin KaFai Lau <ka...@fb.com>
> 
> (Sorry for late reply, swamped due to Plumbers.)
I thought so, no problem ;-)

> 
>> ---
>>
>> v2: use skb_set_network_header()
>>
>>  include/uapi/linux/bpf.h       |  3 ++
>>  net/core/filter.c              | 53 ++++++++++++++++++++++++++++++++++
>>  tools/include/uapi/linux/bpf.h |  3 ++
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 47d606d744cc..9762ef3a536c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1467,6 +1467,8 @@ union bpf_attr {
>>   *
>>   *          * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>>   *            (room space is added or removed below the layer 3 header).
>> + *          * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
>> + *            packet (room space is added or removed below skb->data).
> 
> Nit: in this case it would probably make more sense to name it 
> BPF_ADJ_ROOM_MAC.
> BPF_ADJ_ROOM_DATA reads like we're adjusting payload data.
Yep, I first had in mind skb->data, but you're right.

> 
>>   *          All values for *flags* are reserved for future usage, and must
>>   *          be left at zero.
>> @@ -2412,6 +2414,7 @@ enum bpf_func_id {
>>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>>  enum bpf_adj_room_mode {
>>      BPF_ADJ_ROOM_NET,
>> +    BPF_ADJ_ROOM_DATA,
>>  };
>>  
>>  /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 53d50fb75ea1..e56da3077dca 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -2884,6 +2884,57 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, 
>> s32 len_diff)
>>      return ret;
>>  }
>>  
>> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
>> +{
>> +    unsigned short hhlen = skb->dev->header_ops ?
>> +                           skb->dev->hard_header_len : 0;
>> +    int ret;
>> +
>> +    ret = skb_unclone(skb, GFP_ATOMIC);
>> +    if (unlikely(ret < 0))
>> +            return ret;
>> +
>> +    __skb_pull(skb, len);
>> +    skb_reset_mac_header(skb);
>> +    skb_set_network_header(skb, hhlen);
>> +    skb_reset_transport_header(skb);
>> +    return 0;
>> +}
>> +
>> +static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
>> +{
>> +    unsigned short hhlen = skb->dev->header_ops ?
>> +                           skb->dev->hard_header_len : 0;
>> +    int ret;
>> +
>> +    ret = skb_cow(skb, len);
>> +    if (unlikely(ret < 0))
>> +            return ret;
>> +
>> +    skb_push(skb, len);
>> +    skb_reset_mac_header(skb);
> 
> Looks like this could leak uninitialized mem into the BPF prog as it's
> not zeroed out. Can't we just reuse bpf_skb_generic_push() and the
> bpf_skb_generic_pop()?
Hmm, at some point, it was not possible, but it was an intermediate version and
I don't see why now.

> 
> bpf_skb_vlan_push() and bpf_skb_vlan_pop() do a bpf_push_mac_rcsum() /
> bpf_pull_mac_rcsum() in order to not screw up things like csum complete.
> Wouldn't this be needed here as well?
> 
> How does this interact with MPLS GSO?
I overlooked both points.

> 
> If the packet gets pushed back into the stack, would AF_MPLS handle it?
> Probably good to add test cases to BPF kselftest suite with it.
Ok, I will have a look to this.

> 
> Don't we need to reject the packet in case of skb->encapsulation?
> 
> Looking at push_mpls() and pop_mpls() from OVS implementation, do we
> also need to deal with skb inner network header / proto?
> 
> Given all this would it rather make sense to add MPLS specific helpers
> in similar fashion to the vlan ones we have?
> 
> Is there any other use case aside from MPLS?
Yes, I was targeting a generic encap/decap at l2 level. Maybe more complex than
I thought.

> 
>> +    return 0;
>> +}
>> +
>> +static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
>> +{
>> +    u32 len_diff_abs = abs(len_diff);
>> +    bool shrink = len_diff < 0;
>> +    int ret;
>> +
>> +    if (unlikely(len_diff_abs > 0xfffU))
>> +            return -EFAULT;
>> +
>> +    if (shrink && len_diff_abs >= skb_headlen(skb))
>> +            return -EFAULT;
>> +
>> +    ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
>> +                   bpf_skb_data_grow(skb, len_diff_abs);
> 
> Hmm, I think this has a bug in that the above two do not adjust
> skb->mac_len. Then things like cls_bpf_classify() will __skb_pull()
> based on the old mac_len, getting you to a wrong offset in the
> packet when this is for example pushed back into ingress, etc.
Nice catch!


Thank you,
Nicolas

Reply via email to