---------- Forwarded message ---------
From: Jinho Ju <wnwlsg...@gmail.com>
Date: Thu, 21 Dec 2023, 2:52 PM
Subject: Re: Fwd: memory leak in batadv_iv_ogm_aggregate_new
To: Sven Eckelmann <s...@narfation.org>, <b.a.t.m.a.n@lists.open-mesh.org>
Cc: <mareklind...@neomailbox.ch>, <s...@simonwunderlich.de>, <a...@unstable.cc>


Resending to everyone on the mailing list as per previous mail, adding
some things that were missing.

As for why syzkaller flagged this L2-related crash, I can't say for sure.
What I can say at this point is that my personal syzkaller instance
detected a memory leak occurring in L2.

Moving away from syzkaller for a moment and focusing on the memory leak
itself: we have to assume the preconditions are that both sides reference
the same network stack and that the modules involved sit in L2. It seems
that when batman-adv frees the skb and returns while veth (L3) is still
accessing and processing it, the leak occurs because it still tries to
reference that same skb, the one veth freed.

I'm keeping an eye on 'static netdev_tx_t veth_xmit()' as a related
function, but for now, the above flow seems to be the most obvious as
the root cause.

Jinho Ju,
Thanks.

On Tue, 19 Dec 2023 at 7:09 PM, Sven Eckelmann <s...@narfation.org> wrote:
>
> On Tuesday, 19 December 2023 08:30:47 CET Jinho Ju wrote:
> > ---------- Forwarded message ---------
> > From: Jinho Ju <wnwlsg...@gmail.com>
> > Date: Tue, 19 Dec 2023, 1:58 PM
> > Subject: memory leak in batadv_iv_ogm_aggregate_new
> > To: <secur...@kernel.org>
> >
> >
> > Hello, I am Jinho Ju, and I am studying kernel security in Korea.
> > A "memory leak in batadv_iv_ogm_aggregate_new" was reported by syzkaller
> > targeting 6.7-rc6 on December 19, 2023 at 02:03.
> > The environment in which this bug was detected is as follows.
> > Syzkaller version: 3222d10c
> > Kernel version: Linux kernel 6.7-rc6
> > The report provided by syzkaller is as follows.
>
>
> Thanks. But why isn't this coordinated through the "normal" syzkaller
> instance? [1]
>
> Also, looking at these backtraces, I am not sure we are the correct
> recipients - but please correct me. Take batadv_iv_ogm_send_to_if as an easy
> example. This function does a clone and immediately sends it via
> batadv_send_broadcast_skb. In the end, it goes through
> batadv_send_skb_packet, a function which does either a kfree_skb or a
> dev_queue_xmit, which calls __dev_queue_xmit, a function whose description says:
>
>     Regardless of the return value, the skb is consumed
>
> So I would assume that something which consumes packets from this queue (so
> the sb_dev) is not actually doing its job correctly and leaking frames. So in
> my opinion, it is necessary to figure out what tried to handle the skb after
> it left batman-adv. Which would involve information like the underlying
> interfaces.
>
> If I read the reproducer correctly, veth pairs are used as underlying interfaces.
>
> But the setup is so convoluted with vlan, macvlan, hwsim, xfrm, macvtap, ...
> I don't see an L2 link between these other interfaces (only L3) but I could be
> wrong. So it would be necessary to reduce this complexity heavily to figure
> out what is not cleaning up the supplied skbuff.
>
> I should most likely study the reproducer more but my current assumption is
> that you would end up with backtraces that look like veth is leaking skbs when
> you modify veth.c like this:
>
>     diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>     index 977861c46b1f..1d86e3869c77 100644
>     --- a/drivers/net/veth.c
>     +++ b/drivers/net/veth.c
>     @@ -344,12 +344,22 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>      {
>         struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
>         struct veth_rq *rq = NULL;
>     +   struct sk_buff *hack_skb;
>         int ret = NETDEV_TX_OK;
>         struct net_device *rcv;
>         int length = skb->len;
>         bool use_napi = false;
>         int rxq;
>
>     +   hack_skb = skb_clone(skb, GFP_ATOMIC);
>     +   if (!hack_skb) {
>     +           kfree_skb(skb);
>     +           return NET_XMIT_DROP;
>     +   }
>     +
>     +   consume_skb(skb);
>     +   skb = hack_skb;
>     +
>         rcu_read_lock();
>         rcv = rcu_dereference(priv->peer);
>         if (unlikely(!rcv) || !pskb_may_pull(skb, ETH_HLEN)) {
>
> But there is also a chance that actually net/core/dev.c is leaking it and it
> never reaches the veth driver.
>
>
> I also don't get why we were contacted in private and why the kernel security
> list was involved in the first place.
>
> Kind regards,
>         Sven
>
