Hi sashan,
On Tue, Aug 20, 2019 at 9:42 PM Alexandr Nedvedicky <
[email protected]> wrote:
> Hello,
>
> </snip>
>
> > >How-To-Repeat:
> > # echo "set timeout interval 1" >> /etc/pf.conf
> > # pfctl -f /etc/pf.conf
> >
> > <wait a few seconds>
> >
> > >Fix:
> > Not known.
>
> I've used same steps to reproduce the panic. This is what I could
> see in resulting crash:
>
> The system died at line 1441 in pf_free_state():
>
> 1415 void
> 1416 pf_free_state(struct pf_state *cur)
> 1417 {
> 1418 struct pf_rule_item *ri;
> 1419
> 1420 PF_ASSERT_LOCKED();
> ....
> 1440 pf_normalize_tcp_cleanup(cur);
> 1441 pfi_kif_unref(cur->kif, PFI_KIF_REF_STATE);
> 1442 TAILQ_REMOVE(&state_list, cur, entry_list);
> 1443 if (cur->tag)
> 1444 pf_tag_unref(cur->tag);
> 1445 pf_state_unref(cur);
>
> the instruction, where we died was as follows:
>
> Stopped at pf_free_state+0xfe: movq %rcx,0x28(%rax)
>
> there was -1 in rax. the -1 at offset 0x28 indicates the `cur` has been
> removed from state_list. To explain how is it possible, we need to step
> back to pf_free_state() caller, which is pf_purge_expired_states():
>
> 1453 static struct pf_state *cur = NULL;
> 1454 struct pf_state *next;
> 1455 SLIST_HEAD(pf_state_gcl, pf_state) gcl;
> 1456
> 1457 PF_ASSERT_UNLOCKED();
> 1458 SLIST_INIT(&gcl);
> 1459
> 1460 PF_STATE_ENTER_READ();
> 1461 while (maxcheck--) {
> 1462 /* wrap to start of list when we hit the end */
> 1463 if (cur == NULL) {
> 1464 cur =
> pf_state_ref(TAILQ_FIRST(&state_list));
> 1465 if (cur == NULL)
> 1466 break; /* list empty */
> 1467 }
> 1468
> 1469 /* get next state, as cur may get deleted */
> 1470 next = TAILQ_NEXT(cur, entry_list);
> 1471
> 1472 if ((cur->timeout == PFTM_UNLINKED) ||
> 1473 (pf_state_expires(cur) <= time_uptime))
> 1474 SLIST_INSERT_HEAD(&gcl, cur, gc_list);
> 1475 else
> 1476 pf_state_unref(cur);
> 1477
> 1478 cur = pf_state_ref(next);
> 1479 }
>
> I think the problem is the first 'while (maxcheck--)' loop may actually
> wrap around the state_list and re-insert the `cur` to garbage collector
> list
> again. Such sequence of events would match the panic I could see. I think
> the right fix is to break from the while loop as soon, as we reach
> the end of the state_list.
>
> I've also noticed yet another minor problem there in loop, which
> processes the garbage collector list:
>
> 1487 while ((next = SLIST_FIRST(&gcl)) != NULL) {
> 1488 SLIST_REMOVE_HEAD(&gcl, gc_list);
> 1489 if (next->timeout == PFTM_UNLINKED)
> 1490 pf_free_state(next);
> 1491 else if (pf_state_expires(next) <= time_uptime)) {
> 1492 pf_remove_state(next);
> 1493 pf_free_state(next);
> 1494 }
> 1495
> 1496 pf_state_unref(next);
> 1497 }
>
> at line 1491, we don't need to recalculate expiration time. The `next`
> item is
> on the garbage collector list already, so it must be expired for sure.
>
> patch below fixes both glitches. I wonder if it will work for you as well.
>
Thank you for your quick response. I've applied the diff and it worked, no
more crashes!
Thanks again,
--Kor
>
> thanks for testing
> and sorry for inconveniences
>
> regards
> sashan
>
> --------8<---------------8<---------------8<------------------8<--------
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index bc33fa723ea..8cadd1f53c2 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -1476,6 +1476,9 @@ pf_purge_expired_states(u_int32_t maxcheck)
> pf_state_unref(cur);
>
> cur = pf_state_ref(next);
> +
> + if (cur == NULL)
> + break;
> }
> PF_STATE_EXIT_READ();
>
> @@ -1485,7 +1488,7 @@ pf_purge_expired_states(u_int32_t maxcheck)
> SLIST_REMOVE_HEAD(&gcl, gc_list);
> if (next->timeout == PFTM_UNLINKED)
> pf_free_state(next);
> - else if (pf_state_expires(next) <= time_uptime) {
> + else {
> pf_remove_state(next);
> pf_free_state(next);
> }
>
>
>