On Mon, Mar 02, 2026 at 01:11:28PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <[email protected]>
>
> When a standalone IPv6 nexthop object is created with a loopback device
> (e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies
> it as a reject route. This is because nexthop objects have no destination
> prefix (fc_dst=::), causing fib6_is_reject() to match any loopback
> nexthop. The reject path skips fib_nh_common_init(), leaving
> nhc_pcpu_rth_output unallocated. If an IPv4 route later references this
> nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and
> panics.
>
> The reject classification was designed for regular IPv6 routes to prevent
> kernel loopback loops, but nexthop objects should not be subject to this
> check since they carry no destination information - loop prevention is
> handled separately when the route is created.
>
> An alternative approach of unconditionally calling fib_nh_common_init()
> for all reject routes was considered, but on large machines (e.g., 256
> CPUs) with many routes, this wastes significant memory since
> nhc_pcpu_rth_output allocates a per-CPU pointer for each route.
>
> Since fib6_nh_init() is shared by multiple callers (route creation,
> nexthop object creation, IPv4 gateway validation), using fc_dst_len to
> implicitly distinguish nexthop objects would be fragile. Add an explicit
> fc_is_nh flag to fib6_config to clearly identify nexthop object creation
> and skip the reject check for this path.
>
> Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/T/
> Signed-off-by: Jiayuan Chen <[email protected]>
> ---
> include/net/ip6_fib.h | 1 +
> net/ipv4/nexthop.c | 1 +
> net/ipv6/route.c | 8 +++++++-
> 3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> index 88b0dd4d8e09..7710f247b8d9 100644
> --- a/include/net/ip6_fib.h
> +++ b/include/net/ip6_fib.h
> @@ -62,6 +62,7 @@ struct fib6_config {
> struct nlattr *fc_encap;
> u16 fc_encap_type;
> bool fc_is_fdb;
> + bool fc_is_nh;
> };
>
> struct fib6_node {
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 7b9d70f9b31c..efad2dd27636 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -2859,6 +2859,7 @@ static int nh_create_ipv6(struct net *net, struct nexthop *nh,
> struct fib6_config fib6_cfg = {
> .fc_table = l3mdev_fib_table(cfg->dev),
> .fc_ifindex = cfg->nh_ifindex,
> + .fc_is_nh = true,
> .fc_gateway = cfg->gw.ipv6,
> .fc_flags = cfg->nh_flags,
> .fc_nlinfo = cfg->nlinfo,
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c0350d97307e..347f464ce7fe 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3628,7 +3628,13 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
> * they would result in kernel looping; promote them to reject routes
> */
> addr_type = ipv6_addr_type(&cfg->fc_dst);
> - if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> + /*
> + * Nexthop objects have no destination prefix, so fib6_is_reject()
> + * will misclassify loopback nexthops as reject routes, causing
> + * fib_nh_common_init() to be skipped along with its allocation
> + * of nhc_pcpu_rth_output, which IPv4 routes require.
> + */
> + if (!cfg->fc_is_nh && fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> /* hold loopback dev/idev if we haven't done so. */
> if (dev != net->loopback_dev) {
> if (dev) {

The code basically resets the nexthop device to the loopback device in
case of reject routes:

# ip link add name dummy1 up type dummy
# ip route add unreachable 2001:db8:1::/64 dev dummy1
# ip -6 route show 2001:db8:1::/64
unreachable 2001:db8:1::/64 dev lo metric 1024 pref medium

Therefore, the check in fib6_is_reject() for the nexthop device being a
loopback device seems redundant in fib6_nh_init(). It is probably only
needed when promoting routes that use the loopback device to reject
routes, which happens in ip6_route_info_create_nh() (the other caller of
fib6_is_reject()).

I suggest simplifying the check in fib6_nh_init() so that it only applies
to reject routes [1]. This fixes the issue because RTF_REJECT is a route
attribute, not a nexthop attribute, so the nexthop code never sets it.

[1]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 85df25c36409..035e3f668d49 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3582,7 +3582,6 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
 	netdevice_tracker *dev_tracker = &fib6_nh->fib_nh_dev_tracker;
 	struct net_device *dev = NULL;
 	struct inet6_dev *idev = NULL;
-	int addr_type;
 	int err;
 
 	fib6_nh->fib_nh_family = AF_INET6;
@@ -3624,11 +3623,10 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
 
 	fib6_nh->fib_nh_weight = 1;
 
-	/* We cannot add true routes via loopback here,
-	 * they would result in kernel looping; promote them to reject routes
+	/* Reset the nexthop device to the loopback device in case of reject
+	 * routes.
 	 */
-	addr_type = ipv6_addr_type(&cfg->fc_dst);
-	if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
+	if (cfg->fc_flags & RTF_REJECT) {
 		/* hold loopback dev/idev if we haven't done so. */
 		if (dev != net->loopback_dev) {
 			if (dev) {
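
To illustrate why testing only RTF_REJECT avoids the misclassification, here
is a small userspace model of the two predicates. It is a sketch, not kernel
code: struct model_dev, model_is_reject() and model_simplified_check() are
hypothetical names, the flag values are assumed to match the uapi headers,
and model_is_reject() follows my reading of fib6_is_reject(), which should
be checked against net/ipv6/route.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the uapi values; only the bit logic matters here. */
#define RTF_REJECT		0x0200u
#define RTF_ANYCAST		0x00100000u
#define RTF_LOCAL		0x80000000u
#define IFF_LOOPBACK		0x8u
#define IPV6_ADDR_ANY		0x0000	/* what fc_dst == :: yields */
#define IPV6_ADDR_LOOPBACK	0x0010

/* Hypothetical stand-in for struct net_device; only flags are modeled. */
struct model_dev {
	unsigned int flags;
};

/*
 * Model of the existing check: true for explicit reject routes, and
 * also for non-local, non-anycast routes via a loopback device whose
 * destination is not the loopback address.
 */
static bool model_is_reject(uint32_t flags, const struct model_dev *dev,
			    int addr_type)
{
	return (flags & RTF_REJECT) ||
	       (dev && (dev->flags & IFF_LOOPBACK) &&
		!(addr_type & IPV6_ADDR_LOOPBACK) &&
		!(flags & (RTF_ANYCAST | RTF_LOCAL)));
}

/* Model of the simplified check: only the explicit route flag counts. */
static bool model_simplified_check(uint32_t flags)
{
	return (flags & RTF_REJECT) != 0;
}
```

For a nexthop object on lo (no route flags, fc_dst of ::, hence
IPV6_ADDR_ANY), model_is_reject() returns true while
model_simplified_check() returns false. For an explicit RTF_REJECT route
both return true, so the device reset shown earlier is unaffected.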