On Tue, 05 May 2026 01:30:14 +0900
Yuya Kusakabe <[email protected]> wrote:

Hi Yuya,

I do not repeat below the points from my cover letter and patch 1-3 replies
(drop reasons, OIF/VRF removal, C helper, coding style, etc.).

> Add the End.M.GTP6.D headend behavior (RFC 9433 Section 6.3), which
> receives an IPv6/UDP/GTP-U packet matching a locally instantiated
> End.M.GTP6.D SID and re-encapsulates the inner T-PDU in SRv6 using
> the configured SR Policy.  TEID and QFI are folded into the 40-bit
> Args.Mob.Session field defined by RFC 9433 Section 6.1.
>
> RFC 9433 Section 6.3 Step S08 specifies "Write in the SRH[0] the
> Args.Mob.Session" for a single-SID SR Policy.  When the SR Policy
> contains more segments, the augmented SRH must reserve a leading
> slot for the original outer destination D so that the downstream
> End.M.GTP6.E (which Section 6.5 requires to sit at the penultimate
> SID and Step S01 instructs to "Copy SRH[0] and D to buffer memory")
> can rebuild the GTP-U tunnel.  Args.Mob.Session is therefore stamped
> into segments[1] (the End.M.GTP6.E SID's locator-relative tail).
>
> The augmented SRH (slwt->srh + one extra leading slot) is built
> once at build_state time and reused on every packet.
>
> The new SEG6_LOCAL_MOBILE_SR_PREFIX_LEN attribute carries the
> locator length used by the remote End.M.GTP6.E SID; it is required
> because the SR Gateway has no way to discover the remote SID's
> prefix length from the FIB on its own.
>
> When net.netfilter.nf_hooks_lwtunnel=1, the inner T-PDU traverses
> NF_INET_PRE_ROUTING between the GTP-U strip and the SRv6 push,
> mirroring End.DX4 / End.DX6.
>
> Inbound GTP-U packets are classified by message type (3GPP TS
> 29.281 Section 5.1).  Only T-PDU (type 255) is encapsulated into
> SRv6.  Any other GTP-U message (Echo Request/Response, Error
> Indication, ...) is forwarded unchanged via the lwtunnel's saved
> orig_input so that a downstream peer that owns the GTP-U control
> plane can process it.
>
> Configuration:
>
>   ip -6 route add 2001:db8:f::/64 \
>       encap seg6local action End.M.GTP6.D \
>           srh segs 2001:db8:2::e \
>           src 2001:db8:2::1 \

The "src" attribute is used verbatim here as the outer IPv6 source address,
same as patch 3. The src dual-semantics overload flagged in the patch 3
reply applies here too.

>           sr_prefix_len 64 \
>       dev <dev>
>

Thank you for the follow-up in the cover letter thread. The finish callback
writes orig_dst into SRH[0] and Args.Mob.Session into SRH[1]. As far as I
can see, this matches neither Section 6.3 (Args.Mob.Session in SRH[0], no
D) nor Section 6.4 (D in SRH[0], no Args.Mob).

The comments below apply regardless of which section the behavior
ends up implementing.

> Link: https://www.rfc-editor.org/rfc/rfc9433.html#section-6.3
> Signed-off-by: Yuya Kusakabe <[email protected]>
> ---
>  include/uapi/linux/seg6_local.h                    |   3 +
>  net/ipv6/seg6_local.c                              | 512 
> +++++++++++++++++++++
>  tools/testing/selftests/net/Makefile               |   1 +
>  .../selftests/net/srv6_end_m_gtp6_d_test.sh        | 497 ++++++++++++++++++++
>  4 files changed, 1013 insertions(+)
>
> diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
> index 8e46ede2980d..7d3d3d245b47 100644
> --- a/include/uapi/linux/seg6_local.h
> +++ b/include/uapi/linux/seg6_local.h
> @@ -33,6 +33,7 @@ enum {
>       SEG6_LOCAL_MOBILE_V4_MASK_LEN,
>       SEG6_LOCAL_MOBILE_PDU_TYPE,
>       SEG6_LOCAL_MOBILE_V6_SRC_PREFIX_LEN,
> +     SEG6_LOCAL_MOBILE_SR_PREFIX_LEN,
>       __SEG6_LOCAL_MAX,
>  };
>  #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
> @@ -77,6 +78,8 @@ enum {
>       SEG6_LOCAL_ACTION_END_M_GTP4_E  = 18,
>       /* SRv6 to IPv6/GTP-U encap (RFC 9433 Section 6.5) */
>       SEG6_LOCAL_ACTION_END_M_GTP6_E  = 19,
> +     /* IPv6/GTP-U decap into SRv6 (RFC 9433 Section 6.3) */
> +     SEG6_LOCAL_ACTION_END_M_GTP6_D  = 20,
>
>       __SEG6_LOCAL_ACTION_MAX,
>  };
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index 4e5d138c3657..09e912e17df8 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c

> + [snip]
>
> +/* Parse the GTP-U header at @skb offset @gtp_off.  Pulls each
> + * additional region (long header, extension chain) into the linear
> + * area as it walks; on success returns the total header length to
> + * consume (mandatory + optional + extension headers), or a negative
> + * errno on failure.
> + *
> + * Returns -EOPNOTSUPP if the packet is a well-formed GTPv1-U header
> + * that this code path does not consume itself (any non-T-PDU message
> + * such as Echo Request / Error Indication).  Callers pass such packets
> + * through to the configured forwarding path via
> + * seg6_mobile_passthrough_non_tpdu().
> + *
> + * Returns -EINVAL when the GTP-U header is structurally malformed
> + * (truncated extension chain, ext_units == 0, etc.).  Callers should
> + * drop those.
> + *
> + * On success, *@teid is set to the GTP-U TEID and *@qfi is set to the
> + * QFI found in a PDU Session extension header, or 0 if none is present.
> + *
> + * Callers must re-derive any pointers into @skb->data after this
> + * function returns: pskb_may_pull() may have reallocated skb->head.
> + */
> +static int seg6_mobile_parse_gtpu(struct sk_buff *skb, unsigned int gtp_off,
> +                               u32 *teid, u8 *qfi)
> +{
> +     const struct gtp1_header *gtph;
> +     const struct gtp1_header_long *gtphl;
> +     const u8 *gtp;
> +     unsigned int hdrlen;
> +     u8 flags, next;

Same reverse Christmas tree as patch 2; same issue in the other functions
introduced by this patch.

gtp is only used as a cast intermediary. Could it be inlined?

> +
> +     if (!pskb_may_pull(skb, gtp_off + sizeof(*gtph)))
> +             return -EINVAL;
> +     gtp = skb->data + gtp_off;
> +     gtph = (const struct gtp1_header *)gtp;
> +     flags = gtph->flags;
> +
> +     /* Accept only GTPv1-U T-PDU (3GPP TS 29.281 Section 5.1).  Other
> +      * GTPv1-U message types (Echo Request/Response, Error Indication,
> +      * ...) are dispatched separately by the caller.
> +      */
> +     if ((flags & ~GTP1_F_MASK) != SEG6_MOBILE_GTP1U_FLAGS_BASE)
> +             return -EOPNOTSUPP;
> +     if (gtph->type != GTP_TPDU)
> +             return -EOPNOTSUPP;
> +
> +     *teid = ntohl(gtph->tid);
> +     *qfi = 0;
> +
> +     if (!(flags & (GTP1_F_EXTHDR | GTP1_F_SEQ | GTP1_F_NPDU)))
> +             return sizeof(*gtph);
> +
> +     if (!pskb_may_pull(skb, gtp_off + sizeof(*gtphl)))
> +             return -EINVAL;
> +     gtp = skb->data + gtp_off;
> +     gtphl = (const struct gtp1_header_long *)gtp;
> +     hdrlen = sizeof(*gtphl);
> +
> +     if (!(flags & GTP1_F_EXTHDR))
> +             return hdrlen;

Nit: gtphl and hdrlen are assigned before the GTP1_F_EXTHDR check. On the
path where the E flag is not set, gtphl is unused. Moving the gtphl
assignment after the check would make the flow clearer.

> +
> +     next = gtphl->next;
> +     while (next != 0) {
> +             unsigned int ext_units, ext_bytes;
> +             const u8 *ext;

Maybe ext could be renamed to ext_hdr? It would be easier to distinguish
from ext_units and ext_bytes.

> +
> +             if (!pskb_may_pull(skb, gtp_off + hdrlen + 1))
> +                     return -EINVAL;
> +             ext = skb->data + gtp_off + hdrlen;
> +             ext_units = ext[0];
> +             if (ext_units == 0)
> +                     return -EINVAL;
> +
> +             ext_bytes = ext_units * 4;
> +             if (!pskb_may_pull(skb, gtp_off + hdrlen + ext_bytes))
> +                     return -EINVAL;
> +             ext = skb->data + gtp_off + hdrlen;

ext_units is only used to derive ext_bytes. A single ext_len would
remove the intermediate variable.

> +
> +             if (next == SEG6_MOBILE_PDU_SESSION_NH) {
> +                     /* 3GPP TS 38.415: the PDU Session extension header
> +                      * is exactly 4 bytes long.
> +                      */
> +                     if (ext_bytes != 4)
> +                             return -EINVAL;
> +                     *qfi = ext[2] & SEG6_MOBILE_PDU_SESSION_QFI_MASK;
> +             }

If the extension chain contains more than one PDU Session Container, *qfi
is silently overwritten. Is that intentional, or should the function reject
a duplicate?

> +
> +             next = ext[ext_bytes - 1];

ext[ext_bytes - 1] reads the Next Extension Header Type field from the last
byte of the current extension. Would a short comment help the reader?

> +             hdrlen += ext_bytes;
> +     }
> +
> +     return hdrlen;
> +}

> + [snip]

> +static int seg6_mobile_passthrough_non_tpdu(struct sk_buff *skb)
> +{
> +     struct dst_entry *dst = skb_dst(skb);
> +
> +     if (dst && dst->lwtstate && dst->lwtstate->orig_input)
> +             return dst->lwtstate->orig_input(skb);
> +
> +     kfree_skb_reason(skb, SKB_DROP_REASON_SEG6_MOBILE_BAD_GTPU);

input_action_end_m_gtp6_d() does not change skb_dst(skb) before this call,
so dst and lwtstate are the same ones the caller already dereferenced. When
can this NULL check trigger?

> +     return -EINVAL;
> +}

> + [snip]

> +static int input_action_end_m_gtp6_d_finish(struct net *net,
> +                                         struct sock *sk,
> +                                         struct sk_buff *skb)
> +{
> +     struct seg6_mobile_gtp6_d_cb cb = *SEG6_MOBILE_GTP6_D_CB(skb);
> +     struct dst_entry *orig_dst = skb_dst(skb);
> +     enum skb_drop_reason reason;
> +     const struct seg6_mobile_info *minfo;
> +     struct seg6_local_lwt *slwt;
> +     struct ipv6_sr_hdr *new_srh;
> +     int inner_proto;
> +     int err;
> +
> +     slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
> +     minfo = &slwt->mobile_info;

Same dst/lwtstate issue as patch 2. Not introduced by this patch.

> +
> +     inner_proto = (skb->protocol == htons(ETH_P_IP)) ? IPPROTO_IPIP
> +                                                      : IPPROTO_IPV6;
> +
> +     err = seg6_do_srh_encap(skb, minfo->aug_srh, inner_proto);

Same missing iptunnel_handle_offloads() as patch 2.

> +     if (err) {
> +             reason = (err == -ENOMEM) ? SKB_DROP_REASON_SEG6_MOBILE_NOMEM
> +                                       : 
> SKB_DROP_REASON_SEG6_MOBILE_BAD_INNER;

Same BAD_INNER misuse as patch 2. seg6_do_srh_encap() can also fail from
seg6_push_hmac(), which is an HMAC error on the new SRH, not an inner-T-PDU
problem.

> +             goto drop;
> +     }
> +
> +     skb->protocol = htons(ETH_P_IPV6);
> +
> +     new_srh = (struct ipv6_sr_hdr *)(skb_network_header(skb) +
> +                                      sizeof(struct ipv6hdr));
> +     new_srh->segments[0] = cb.orig_dst;
> +     if (seg6_mobile_write_args_mob(&new_srh->segments[1],
> +                                    minfo->sr_prefix_len, cb.args_mob)) {
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_SID;
> +             goto drop;
> +     }
> +
> +     ipv6_hdr(skb)->saddr = minfo->src_addr;
> +
> +     /* seg6_do_srh_encap() copied segments[first_segment] to the outer
> +      * DA before Args.Mob.Session was stamped; refresh it.
> +      */
> +     ipv6_hdr(skb)->daddr = new_srh->segments[new_srh->first_segment];

As noted at the top of this reply, segments[0] = orig_dst and
segments[1] = Args.Mob.Session matches neither Section 6.3 nor Section 6.4.

segments[0], segments[1], saddr, and daddr are written after
seg6_do_srh_encap() already called skb_postpush_rcsum(). skb->csum can
be stale. Same for any later change to the outer header or SRH.

HMAC, if configured, is computed on non-final SRH and saddr, hence invalid.

> +
> +     skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> +     nf_reset_ct(skb);
> +     skb_dst_drop(skb);
> +
> +     seg6_lookup_any_nexthop(skb, NULL, 0, false, slwt->oif);
> +     return dst_input(skb);
> +
> +drop:
> +     kfree_skb_reason(skb, reason);
> +     return -EINVAL;
> +}

Same redundant skb_dst_drop() as patch 3.

> +
> +/* RFC 9433 Section 6.3 -- End.M.GTP6.D
> + * Receives an IPv6/UDP/GTP-U packet matching a locally instantiated
> + * End.M.GTP6.D SID and re-encapsulates the inner T-PDU in SRv6 using
> + * the configured SR Policy.  TEID and QFI are folded into
> + * Args.Mob.Session.  Per RFC 9433 Section 6.5 ("End.M.GTP6.E SID MUST
> + * always be the penultimate SID"), Args.Mob.Session is encoded into
> + * segments[1] of the new SRH (the penultimate SID at the egress UPF)
> + * while segments[0] holds the original outer DA so that the egress
> + * has a real GTP-U destination after End.M.GTP6.E decap.
> + *
> + * When net.netfilter.nf_hooks_lwtunnel=1 the inner T-PDU is exposed
> + * to NF_INET_PRE_ROUTING after the GTP-U strip and before the SRv6
> + * push, mirroring End.DX4 / End.DX6.  This lets nftables / conntrack
> + * apply policy on the inner 5-tuple at the SR Gateway.
> + */
> +static int input_action_end_m_gtp6_d(struct sk_buff *skb,
> +                                  struct seg6_local_lwt *slwt)
> +{
> +     unsigned int outer_len, inner_off;
> +     int gtp_hdrlen, inner_proto, inner_nfproto;
> +     struct in6_addr orig_dst;
> +     u8 inner_first, qfi;
> +     struct ipv6_sr_hdr *srh;
> +     struct ipv6hdr *ip6h;
> +     struct udphdr *uh;
> +     u64 args_mob;
> +     u32 teid;
> +     enum skb_drop_reason reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_GTPU;

The initializer on reason is dead. Every goto drop path sets reason
explicitly before the jump. The variable can be left uninitialized here.

> +
> +     BUILD_BUG_ON(sizeof(struct seg6_mobile_gtp6_d_cb) >
> +                  sizeof_field(struct sk_buff, cb));
> +
> +     /* RFC 9433 Section 6.3 SRH-S01: drop if outer SRH carries
> +      * SegmentsLeft != 0
> +      */
> +     srh = seg6_get_srh(skb, 0);
> +     if (srh && srh->segments_left != 0) {
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_INVALID_SRH_SL;
> +             goto drop;
> +     }

Same SRH validation concerns as patch 1. HMAC is not validated here.

> +
> +     if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) {
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_INNER;
> +             goto drop;
> +     }

Same BAD_INNER misuse as patch 2: this is pulling the outer IPv6 header,
not the inner T-PDU.

> +
> +     ip6h = ipv6_hdr(skb);
> +     orig_dst = ip6h->daddr;
> +
> +     /* RFC 9433 Section 6.3 upper-layer S01-S11: dispatch on
> +      * (NH == UDP && UDP dport == GTP-U); otherwise delegate to the
> +      * regular End behaviour (S10-S11).
> +      */
> +     {

The anonymous { } block scopes three variables that should be declared at
function top. Splitting into smaller helpers would make this easier to
follow.

> +             __be16 frag_off;
> +             u8 nh = ip6h->nexthdr;
> +             int upper_off;
> +
> +             upper_off = ipv6_skip_exthdr(skb, sizeof(*ip6h), &nh,
> +                                          &frag_off);
> +             if (upper_off < 0) {
> +                     /* Outer IPv6 ext-header walk failed; the GTP-U
> +                      * envelope below it is unreachable.
> +                      */
> +                     reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_GTPU;
> +                     goto drop;
> +             }

Same missing frag_off check as patch 2.

> +
> +             if (nh != IPPROTO_UDP)
> +                     return input_action_end(skb, slwt);
> +
> +             if (!pskb_may_pull(skb, upper_off + sizeof(*uh))) {
> +                     reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_GTPU;
> +                     goto drop;
> +             }
> +
> +             ip6h = ipv6_hdr(skb);
> +             uh = (struct udphdr *)((u8 *)ip6h + upper_off);
> +             if (uh->dest != htons(GTP1U_PORT))
> +                     return input_action_end(skb, slwt);

Limitation note for both input_action_end() calls above: correct per RFC
9433 Section 6.3 S10-S11, but the SRH is absent or SL == 0 here, so
input_action_end() will always drop without signaling non-GTP-U traffic.
Perhaps you meant to drop directly with BAD_GTPU?

> +
> +             gtp_hdrlen = seg6_mobile_parse_gtpu(skb,
> +                                                 upper_off + sizeof(*uh),
> +                                                 &teid, &qfi);
> +             if (gtp_hdrlen == -EOPNOTSUPP)
> +                     return seg6_mobile_passthrough_non_tpdu(skb);
> +             if (gtp_hdrlen < 0) {
> +                     reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_GTPU;
> +                     goto drop;
> +             }
> +
> +             outer_len = upper_off + sizeof(*uh) + gtp_hdrlen;
> +     }
> +
> +     args_mob = seg6_mobile_args_from_teid_qfi(teid, qfi);
> +
> +     if (!pskb_may_pull(skb, outer_len + 1)) {
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_INNER;
> +             goto drop;
> +     }
> +
> +     inner_off = outer_len;
> +     inner_first = *((u8 *)skb->data + inner_off);
> +     switch (inner_first >> 4) {

Nit: inner_first could be an inner_ver with the shift done at assignment.
The name would say what the variable holds.

> +     case 4:
> +             inner_proto = IPPROTO_IPIP;
> +             inner_nfproto = NFPROTO_IPV4;
> +             break;
> +     case 6:
> +             inner_proto = IPPROTO_IPV6;
> +             inner_nfproto = NFPROTO_IPV6;
> +             break;
> +     default:
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_INNER;
> +             goto drop;
> +     }
> +
> +     if (!pskb_may_pull(skb, outer_len +
> +                        ((inner_proto == IPPROTO_IPIP) ?
> +                         sizeof(struct iphdr) : sizeof(struct ipv6hdr)))) {
> +             reason = SKB_DROP_REASON_SEG6_MOBILE_BAD_INNER;
> +             goto drop;
> +     }
> +
> +     skb_pull_rcsum(skb, outer_len);
> +     skb_reset_network_header(skb);
> +
> +     /* Set skb->protocol to match the inner header so that the
> +      * NF_INET_PRE_ROUTING hook (and seg6_do_srh_encap() inside
> +      * the finish half) see a coherent IPv4/IPv6 packet.
> +      */
> +     skb->protocol = (inner_proto == IPPROTO_IPIP) ? htons(ETH_P_IP)
> +                                                   : htons(ETH_P_IPV6);
> +
> +     skb_set_transport_header(skb,
> +                              (inner_proto == IPPROTO_IPIP) ?
> +                              sizeof(struct iphdr) :
> +                              sizeof(struct ipv6hdr));

Same repeated size-selection ternary as patch 2.

> +     nf_reset_ct(skb);
> +
> +     SEG6_MOBILE_GTP6_D_CB(skb)->args_mob = args_mob;
> +     SEG6_MOBILE_GTP6_D_CB(skb)->orig_dst = orig_dst;
> +
> +     if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
> +             return NF_HOOK(inner_nfproto, NF_INET_PRE_ROUTING,
> +                            dev_net(skb->dev), NULL, skb, skb->dev,
> +                            NULL, input_action_end_m_gtp6_d_finish);
> +
> +     return input_action_end_m_gtp6_d_finish(dev_net(skb->dev), NULL, skb);
> +
> +drop:
> +     kfree_skb_reason(skb, reason);
> +     return -EINVAL;
> +}

> +
> +/* Shared between End.M.GTP6.D and End.M.GTP6.D.Di -- both
> + * prepend a single leading slot to the user-configured SRH to leave
> + * room for the original outer DA at SRH[0].  End.M.GTP6.D writes
> + * Args.Mob.Session into segments[1] at runtime; End.M.GTP6.D.Di
> + * leaves segments[1+] as the user provided them.
> + */
> +static int seg6_end_m_gtp6_d_aug_build(struct seg6_local_lwt *slwt,
> +                                    const void *cfg,
> +                                    struct netlink_ext_ack *extack)
> +{
> +     struct ipv6_sr_hdr *aug;
> +     int orig_len, aug_len;
> +
> +     if (!slwt->srh) {
> +             NL_SET_ERR_MSG_MOD(extack,
> +                                "End.M.GTP6.D{,.Di} requires srh segs");
> +             return -EINVAL;
> +     }

The "{,.Di}" shell brace notation is unusual. Emitting the actual
behavior name (End.M.GTP6.D or End.M.GTP6.D.Di) would be clearer.
Same applies wherever this notation appears in the patchset.

> +
> +     /* The augmented SRH adds one extra leading slot, so its hdrlen
> +      * field (u8) must still fit the +2-segment-equivalent encoding.
> +      * Reject pathological srh inputs at setup time so that no
> +      * silent overflow can produce an undersized aug->hdrlen and a
> +      * subsequent OOB read in seg6_do_srh_encap().
> +      */
> +     if (slwt->srh->hdrlen > 253) {
> +             NL_SET_ERR_MSG_MOD(extack,
> +                                "End.M.GTP6.D{,.Di} srh too large to augment 
> (max 126 segments)");
> +             return -EINVAL;
> +     }
> +
> +     orig_len = (slwt->srh->hdrlen + 1) << 3;
> +     aug_len = orig_len + sizeof(struct in6_addr);
> +
> +     aug = kzalloc(aug_len, GFP_KERNEL);
> +     if (!aug)
> +             return -ENOMEM;
> +
> +     memcpy(aug, slwt->srh, sizeof(*aug));
> +     aug->hdrlen = (aug_len >> 3) - 1;
> +     aug->segments_left = slwt->srh->segments_left + 1;
> +     aug->first_segment = slwt->srh->first_segment + 1;
> +     /* segments[0] left zero; data path stamps the original outer
> +      * DA into the in-skb copy after seg6_do_srh_encap().
> +      */
> +     memcpy(&aug->segments[1], &slwt->srh->segments[0],
> +            orig_len - sizeof(*aug));
> +
> +     slwt->mobile_info.aug_srh = aug;
> +     return 0;
> +}

> + [snip]

Thanks,

Ciao,
Andrea

P.S. I am temporarily writing from another address due to a mail
delivery issue at my @uniroma2.it address. Please always Cc my default
[email protected] address on replies.

Reply via email to