Re: pf.conf "set timeout interval 1" causes kernel crash

Alexandr Nedvedicky Tue, 20 Aug 2019 17:44:31 -0700

Hello,

</snip>


> >How-To-Repeat:
> # echo "set timeout interval 1" >> /etc/pf.conf
> # pfctl -f /etc/pf.conf
> 
>   <wait a few seconds>
> 
> >Fix:
>         Not known.

I've used same steps to reproduce the panic. This is what I could
see in resulting crash:

The system died at line 1441 in pf_free_state():

    1415 void
    1416 pf_free_state(struct pf_state *cur)
    1417 {
    1418         struct pf_rule_item *ri;
    1419 
    1420         PF_ASSERT_LOCKED();
    ....
    1440         pf_normalize_tcp_cleanup(cur);
    1441         pfi_kif_unref(cur->kif, PFI_KIF_REF_STATE);
    1442         TAILQ_REMOVE(&state_list, cur, entry_list);
    1443         if (cur->tag)
    1444                 pf_tag_unref(cur->tag);
    1445         pf_state_unref(cur);

the instruction, where we died was as follows:

    Stopped at      pf_free_state+0xfe:     movq    %rcx,0x28(%rax)

there was -1 in rax. the -1 at offset 0x28 indicates the `cur` has been
removed from state_list. To explain how is it possible, we need to step
back to pf_free_state() caller, which is pf_purge_expired_states():

    1453         static struct pf_state  *cur = NULL;
    1454         struct pf_state         *next;
    1455         SLIST_HEAD(pf_state_gcl, pf_state) gcl;
    1456 
    1457         PF_ASSERT_UNLOCKED();
    1458         SLIST_INIT(&gcl);
    1459 
    1460         PF_STATE_ENTER_READ();
    1461         while (maxcheck--) {
    1462                 /* wrap to start of list when we hit the end */
    1463                 if (cur == NULL) {
    1464                         cur = pf_state_ref(TAILQ_FIRST(&state_list));
    1465                         if (cur == NULL)
    1466                                 break;  /* list empty */
    1467                 }
    1468 
    1469                 /* get next state, as cur may get deleted */
    1470                 next = TAILQ_NEXT(cur, entry_list);
    1471 
    1472                 if ((cur->timeout == PFTM_UNLINKED) ||
    1473                     (pf_state_expires(cur) <= time_uptime))
    1474                         SLIST_INSERT_HEAD(&gcl, cur, gc_list);
    1475                 else
    1476                         pf_state_unref(cur);
    1477 
    1478                 cur = pf_state_ref(next);
    1479         }

I think the problem is the first 'while (maxcheck--)' loop may actually
wrap around the state_list and re-insert the `cur` to garbage collector list
again. Such sequence of events would match the panic I could see. I think
the right fix is to break from the while loop as soon, as we reach
the end of the state_list.

I've also noticed yet another minor problem there in loop, which
processes the garbage collector list:

    1487         while ((next = SLIST_FIRST(&gcl)) != NULL) {
    1488                 SLIST_REMOVE_HEAD(&gcl, gc_list);
    1489                 if (next->timeout == PFTM_UNLINKED)
    1490                         pf_free_state(next);
    1491                 else if (pf_state_expires(next) <= time_uptime)) {
    1492                         pf_remove_state(next);
    1493                         pf_free_state(next);
    1494                 }
    1495 
    1496                 pf_state_unref(next);
    1497         }

at line 1491, we don't need to recalculate expiration time. The `next` item is
on the garbage collector list already, so it must be expired for sure.

patch below fixes both glitches. I wonder if it will work for you as well.

thanks for testing
and sorry for inconveniences

regards
sashan

--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/pf.c b/sys/net/pf.c
index bc33fa723ea..8cadd1f53c2 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -1476,6 +1476,9 @@ pf_purge_expired_states(u_int32_t maxcheck)
                        pf_state_unref(cur);
 
                cur = pf_state_ref(next);
+
+               if (cur == NULL)
+                       break;
        }
        PF_STATE_EXIT_READ();
 
@@ -1485,7 +1488,7 @@ pf_purge_expired_states(u_int32_t maxcheck)
                SLIST_REMOVE_HEAD(&gcl, gc_list);
                if (next->timeout == PFTM_UNLINKED)
                        pf_free_state(next);
-               else if (pf_state_expires(next) <= time_uptime) {
+               else {
                        pf_remove_state(next);
                        pf_free_state(next);
                }

Re: pf.conf "set timeout interval 1" causes kernel crash

Reply via email to