Hello Sebastien,
> > thank you very much for doing the test for me.
> > patch below should kill the bug.
> >
> > patch applies to current.
>
> I confirm that since 4 days the host is stable without any issues.
>
> Your diff is ok semarie@
>
thank you for OK. unfortunately one more tweak is needed. The change
you were testing allows race, where we leak a reference, when
two packets are trying to bind state keys with each other at
the same time.
I've found the leak on test box provided by Hrvoje, where I'm
able to put the code under the stress.
this got fixed by diff below. I believe patch below should keep
your box happy and reference leak plugged. I would like to
ask you if you can put it in and give it a try.
thanks and
regards
sashan
--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 23eebf4a274..bcd81a83300 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -7373,14 +7373,21 @@ pf_state_key_link_reverse(struct pf_state_key *sk,
struct pf_state_key *skrev)
old_reverse = atomic_cas_ptr(&sk->reverse, NULL, skrev);
if (old_reverse != NULL)
KASSERT(old_reverse == skrev);
- else
+ else {
pf_state_key_ref(skrev);
- old_reverse = atomic_cas_ptr(&skrev->reverse, NULL, sk);
- if (old_reverse != NULL)
- KASSERT(old_reverse == sk);
- else
+ /*
+ * NOTE: if sk == skrev, then KASSERT() below holds true, we
+ * still want to grab a reference in such case, because
+ * pf_state_key_unlink_reverse() does not check whether keys
+ * are identical or not.
+ */
+ old_reverse = atomic_cas_ptr(&skrev->reverse, NULL, sk);
+ if (old_reverse != NULL)
+ KASSERT(old_reverse == sk);
+
pf_state_key_ref(sk);
+ }
}
#if NPFLOG > 0