> 
> On Sun, Oct 18, 2015 at 1:07 AM, Florian Westphal <f...@strlen.de> wrote:
> > Ani Sinha <a...@arista.com> wrote:
> >> Coming back to this crash, I see something interesting in the
> >> conntrack code in linux 3.4.109 (a supported kernel version). I see
> >> that the hash table manipulations are protected by a spinlock. Also
> >> lookups/reads are protected by RCU. However allocation and
> >> deallocation of conntrack objects happen outside of both the locks.
> >> It seems to me that a conntrack object can be deallocated and a new
> >> object can be allocated and initialized within the same RCU grace
> >> period, while the hash table is being read.
> >
> > Yes.  We need to use SLAB_DESTROY_BY_RCU instead of kfree_rcu because
> > there could be hundreds of thousands of alloc/free pairs within a short
> > time period.
> >
> >> It looks like a bug to me.
> >
> > No, as long as readers detect object reuse.
 
Right.
 
> >
> >> > Looking upstream, I see a couple of patches which fixes race condition
> >> > around the use of the conntrack hash table with RCU (lock free read)
> >> > primitives :
> >> >
> >> > commit c6825c0976fa7893692e0e43b09740b419b23c09
> >> > Author: Andrey Vagin <ava...@openvz.org>
> >> > Date:   Wed Jan 29 19:34:14 2014 +0100
> >> >      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >> >
> >> > and a followup patch :
> >> >
> >> > commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
> >> > Author: Pablo Neira Ayuso <pa...@netfilter.org>
> >> > Date:   Mon Feb 3 20:01:53 2014 +0100
> >> >         netfilter: nf_conntrack: don't release a conntrack with non-zero 
> >> > refcnt
> >> >
> >
> > These for instance fix such bugs.
> 
Indeed. So it seems to me that we have run into one another such case.
In patch c6825c0976fa7893692, I see we have added an additional check (along 
with comparing tuple and zone) to verify that if the conntrack is confirmed.
 
+       return nf_ct_tuple_equal(tuple, &h->tuple) &&
+               nf_ct_zone(ct) == zone &&
+               nf_ct_is_confirmed(ct);
 
 
This is necessary since it's possible that a conntrack can be recreated with 
the same zone.
Unfortunately, we leave a hole open in __nf_conntrack_confirm() because this 
routine _is_ responsible
for confirming the conntrack. We cannot use the same logic here. 
 
Should I send a patch along the lines of :
 
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 71935fc..6ff4088 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -535,6 +535,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
                    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
                        goto out;
 
+       /* we might be racing against a case where the conntrack was deleted 
+          and a new conntrack was initialized with the exact same zone. We
+          need to make sure that the conntrack node is in the hashtable */
+       if (hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode))
+         goto out;
+
        /* Remove from unconfirmed list */
        hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to