bpf_ipv4_fib_lookup() and bpf_ipv6_fib_lookup() build the flow key on
the stack with a bare "struct flowi4 fl4;" / "struct flowi6 fl6;" and
fill it field by field, but never set flowi4_l3mdev / flowi6_l3mdev.

On the non-DIRECT path the lookup goes through the fib rules whenever the
netns has custom rules, which a VRF installs:

        bpf_ipv4_fib_lookup() -> fib_lookup() -> __fib_lookup()
          -> l3mdev_update_flow()   reads !fl->flowi_l3mdev
          -> fib_rules_lookup() -> fib_rule_match()
               -> l3mdev_fib_rule_match()   uses fl->flowi_l3mdev

l3mdev_update_flow() resolves the l3mdev master from the ingress device
only while the field is still zero:

        if (fl->flowi_iif > LOOPBACK_IFINDEX && !fl->flowi_l3mdev) {
                dev = dev_get_by_index_rcu(net, fl->flowi_iif);
                if (dev)
                        fl->flowi_l3mdev = l3mdev_master_ifindex_rcu(dev);
        }

If the field is left at a nonzero stack value, that resolution is
skipped and l3mdev_fib_rule_match() tests the stale value as an ifindex.
The VRF master is never resolved and the rule fails to match, so an
ingress device enslaved to a VRF can fail to select its table. The same
value is read just before that by FIB rules matching on an L3 master
device (l3mdev_fib_rule_iif_match()/_oif_match()), so an
"ip rule iif/oif <vrf>" mismatches the same way.
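
The skip can be illustrated with a small userspace model. This is only a
sketch, not kernel code: struct model_flow, the model_*() functions and
the ifindex values 5 (VRF slave) and 10 (VRF master) are hypothetical
stand-ins, and the rule match is reduced to a plain ifindex comparison,
which is simpler than what l3mdev_fib_rule_match() really does.

```c
#include <assert.h>
#include <stdbool.h>

#define LOOPBACK_IFINDEX 1

/* Hypothetical stand-in for the l3mdev-relevant part of a flow key. */
struct model_flow {
	int iif;
	int l3mdev;
};

/* Hypothetical device table: ifindex 5 is enslaved to VRF master 10. */
static int model_master_ifindex(int ifindex)
{
	return ifindex == 5 ? 10 : 0;
}

/* Same condition as quoted above: resolve only while l3mdev is zero. */
static void model_update_flow(struct model_flow *fl)
{
	if (fl->iif > LOOPBACK_IFINDEX && !fl->l3mdev)
		fl->l3mdev = model_master_ifindex(fl->iif);
}

/* Simplified rule match: the VRF rule matches only its own master. */
static bool model_rule_match(const struct model_flow *fl, int vrf_master)
{
	return fl->l3mdev == vrf_master;
}

/* True if a lookup starting from initial_l3mdev selects the VRF rule. */
bool model_lookup_hits_vrf(int initial_l3mdev)
{
	struct model_flow fl = { .iif = 5, .l3mdev = initial_l3mdev };

	model_update_flow(&fl);
	return model_rule_match(&fl, 10);
}

/* Zeroed field: master resolved. Stale nonzero field: rule missed. */
void model_self_check(void)
{
	assert(model_lookup_hits_vrf(0));
	assert(!model_lookup_hits_vrf(0x5a5a5a5a));
}
```

In this model a zeroed field lets the master be resolved from the ingress
ifindex, while any stale nonzero value is passed through unchanged and
the VRF rule is missed.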

The helper already initializes the other flow fields the rules path
consumes (flowi4_mark, flowi4_tun_key.tun_id, flowi4_uid and the v6
counterparts); flowi*_l3mdev was added to that set afterwards and this
helper was never updated to match. ip_route_input_slow() likewise zeroes
the field before its input lookup. Do the same here.

CONFIG_INIT_STACK_ALL_ZERO masks this by default, but it depends on
compiler support (CC_HAS_AUTO_VAR_INIT_ZERO), so INIT_STACK_NONE builds,
including older toolchains that fall back to it, are exposed. On a kernel
built with INIT_STACK_ALL_PATTERN, a plain bpf_fib_lookup() (no VLAN, no
DIRECT) over a VRF slave whose destination is routed only in the VRF
table returns BPF_FIB_LKUP_RET_NOT_FWDED without this patch and resolves
with it; reverting these two lines brings the failure back. The series'
VRF selftests pass on the default config either way, so they do not
exercise this fix.

Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Signed-off-by: Avinash Duduskar <[email protected]>
---
 net/core/filter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714..6fa172cb1348 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6162,6 +6162,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
        fl4.flowi4_dscp = inet_dsfield_to_dscp(params->tos);
        fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
        fl4.flowi4_flags = 0;
+       fl4.flowi4_l3mdev = 0;
 
        fl4.flowi4_proto = params->l4_protocol;
        fl4.daddr = params->ipv4_dst;
@@ -6307,6 +6308,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
        fl6.flowlabel = params->flowinfo;
        fl6.flowi6_scope = 0;
        fl6.flowi6_flags = 0;
+       fl6.flowi6_l3mdev = 0;
        fl6.mp_hash = 0;
 
        fl6.flowi6_proto = params->l4_protocol;

base-commit: 140fa23df957b51385aa847986d44ad7f59b0563
-- 
2.54.0

