Hello,
</snip>
> >How-To-Repeat:
> # echo "set timeout interval 1" >> /etc/pf.conf
> # pfctl -f /etc/pf.conf
>
> <wait a few seconds>
>
> >Fix:
> Not known.
I've used same steps to reproduce the panic. This is what I could
see in resulting crash:
The system died at line 1441 in pf_free_state():
1415 void
1416 pf_free_state(struct pf_state *cur)
1417 {
1418 struct pf_rule_item *ri;
1419
1420 PF_ASSERT_LOCKED();
....
1440 pf_normalize_tcp_cleanup(cur);
1441 pfi_kif_unref(cur->kif, PFI_KIF_REF_STATE);
1442 TAILQ_REMOVE(&state_list, cur, entry_list);
1443 if (cur->tag)
1444 pf_tag_unref(cur->tag);
1445 pf_state_unref(cur);
the instruction, where we died was as follows:
Stopped at pf_free_state+0xfe: movq %rcx,0x28(%rax)
there was -1 in rax. the -1 at offset 0x28 indicates the `cur` has been
removed from state_list. To explain how is it possible, we need to step
back to pf_free_state() caller, which is pf_purge_expired_states():
1453 static struct pf_state *cur = NULL;
1454 struct pf_state *next;
1455 SLIST_HEAD(pf_state_gcl, pf_state) gcl;
1456
1457 PF_ASSERT_UNLOCKED();
1458 SLIST_INIT(&gcl);
1459
1460 PF_STATE_ENTER_READ();
1461 while (maxcheck--) {
1462 /* wrap to start of list when we hit the end */
1463 if (cur == NULL) {
1464 cur = pf_state_ref(TAILQ_FIRST(&state_list));
1465 if (cur == NULL)
1466 break; /* list empty */
1467 }
1468
1469 /* get next state, as cur may get deleted */
1470 next = TAILQ_NEXT(cur, entry_list);
1471
1472 if ((cur->timeout == PFTM_UNLINKED) ||
1473 (pf_state_expires(cur) <= time_uptime))
1474 SLIST_INSERT_HEAD(&gcl, cur, gc_list);
1475 else
1476 pf_state_unref(cur);
1477
1478 cur = pf_state_ref(next);
1479 }
I think the problem is the first 'while (maxcheck--)' loop may actually
wrap around the state_list and re-insert the `cur` to garbage collector list
again. Such sequence of events would match the panic I could see. I think
the right fix is to break from the while loop as soon, as we reach
the end of the state_list.
I've also noticed yet another minor problem there in loop, which
processes the garbage collector list:
1487 while ((next = SLIST_FIRST(&gcl)) != NULL) {
1488 SLIST_REMOVE_HEAD(&gcl, gc_list);
1489 if (next->timeout == PFTM_UNLINKED)
1490 pf_free_state(next);
1491 else if (pf_state_expires(next) <= time_uptime)) {
1492 pf_remove_state(next);
1493 pf_free_state(next);
1494 }
1495
1496 pf_state_unref(next);
1497 }
at line 1491, we don't need to recalculate expiration time. The `next` item is
on the garbage collector list already, so it must be expired for sure.
patch below fixes both glitches. I wonder if it will work for you as well.
thanks for testing
and sorry for inconveniences
regards
sashan
--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/pf.c b/sys/net/pf.c
index bc33fa723ea..8cadd1f53c2 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -1476,6 +1476,9 @@ pf_purge_expired_states(u_int32_t maxcheck)
pf_state_unref(cur);
cur = pf_state_ref(next);
+
+ if (cur == NULL)
+ break;
}
PF_STATE_EXIT_READ();
@@ -1485,7 +1488,7 @@ pf_purge_expired_states(u_int32_t maxcheck)
SLIST_REMOVE_HEAD(&gcl, gc_list);
if (next->timeout == PFTM_UNLINKED)
pf_free_state(next);
- else if (pf_state_expires(next) <= time_uptime) {
+ else {
pf_remove_state(next);
pf_free_state(next);
}