Hello,
On Wed, Jul 05, 2023 at 03:33:36PM +0300, Kapetanakis Giannis wrote:
> In any case we shouldn't get table statistics for tables that are down.
>
> log_debug are just added by for my debugging, but the
> if (!rdr->table->up) continue;
> should probably go in.
>
> G
>
> Index: pfe.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 pfe.c
> --- pfe.c 14 Sep 2020 11:30:25 -0000 1.90
> +++ pfe.c 5 Jul 2023 12:27:37 -0000
> @@ -790,6 +790,12 @@ pfe_statistics(int fd, short events, voi
> getmonotime(&tv_now);
>
> TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
> + if (!rdr->table->up) {
> + log_debug("%s: table: %s is down. continuing",
> __func__, rdr->conf.name);
> + continue;
> + }
> + //bilias
> + log_debug("%s: table: %s, up: %d id: %d", __func__,
> rdr->conf.name, rdr->table->up, rdr->conf.table_id);
> cnt = check_table(env, rdr, rdr->table);
> if (rdr->conf.backup_id != EMPTY_TABLE)
> cnt += check_table(env, rdr, rdr->backup);
>
>
this change might be valid fix. however I'm afraid it will uncover new issues.
in relayd. It just seems like a tip of iceberg.
I think I've found explanation how table can disappear when host gets
disabled. I hope someone who is more familiar with relayd can fill
some details I'm still missing here.
snippet here comes from disable_host() function in pfe.c:
602 if (host->up == HOST_UP) {
603 if ((table = table_find(env, host->conf.tableid)) == NULL)
604 fatalx("%s: invalid table id", __func__);
605 table->up--;
606 table->conf.flags |= F_CHANGED;
607 }
608
609 host->up = HOST_UNKNOWN;
...
624 if (!host->conf.parentid) {
625 /* Disable all children */
626 SLIST_FOREACH(h, &host->children, child)
627 disable_host(c, id, h);
628 pfe_sync();
629 }
at line 605 table->up counter may get to 0 if host is the only/last
host in table.
if there is no parent, we are going disable children too calling
to pfe_sync() at line 628.
pfe_sync() function updates rdr rules in pf. The logic there is still
kind of blurry to me.
687 TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
688 rdr->conf.flags &= ~(F_BACKUP);
689 rdr->conf.flags &= ~(F_DOWN);
690
691 if (rdr->conf.flags & F_DISABLE ||
692 (rdr->table->up == 0 && rdr->backup->up == 0)) {
693 rdr->conf.flags |= F_DOWN;
694 active = NULL;
...
714 if (rdr->conf.flags & F_DOWN) {
715 if (rdr->conf.flags & F_ACTIVE_RULESET) {
716 flush_table(env, rdr);
717 log_debug("%s: disabling ruleset",
__func__);
718 rdr->conf.flags &= ~(F_ACTIVE_RULESET);
719 id.id = rdr->conf.id;
720 imsg.hdr.type = IMSG_CTL_PULL_RULESET;
721 imsg.hdr.len = sizeof(id) +
IMSG_HEADER_SIZE;
722 imsg.data = &id;
723 sync_ruleset(env, rdr, 0);
724 control_imsg_forward(env->sc_ps, &imsg);
725 }
here we walk through the list of redirect rules. at line 693 we mark
redirect to be down if there is no table and no back up.
if redirect is down and has active rules we are going to flush
those rules at line 723 by calling sync_ruleset(env, rdr, 0);
the flush operation done by sync_ruleset() reads as follows:
332 void
333 sync_ruleset(struct relayd *env, struct rdr *rdr, int enable)
...
352 if (transaction_init(env, anchor) == -1) {
353 log_warn("%s: transaction init failed", __func__);
354 return;
355 }
356
357 if (!enable) {
358 if (transaction_commit(env) == -1)
359 log_warn("%s: remove rules transaction failed",
360 __func__);
361 else
362 log_debug("%s: rules removed", __func__);
363 return;
364 }
365
as you can see the rules are removed by committing empty ryleset.
committing empty ruleset makes anchor empty, because commit operation
removes also tables which are not persistent. One can emulate that by using
pfctl on command line.
we start by loading simple rdr rule which reference table bar:
pf# cat bar.conf
pass in on vio0 from 192.168.3.0/24 to 10.164.0.3 rdr-to <bar>
pf# pfctl -a relayd/bar -f bar.conf
the next step is to check if table bar got created.
pf# pfctl -a relayd/bar -sT -vg
----r-- bar relayd/bar
the 'r' flag means table is referenced only. Such table is
not reported unless '-sT' option comes with '-vg'. Now let's
add address to bar table and see what happens:
pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10
1/1 addresses added.
pf# pfctl -a relayd/bar -sT -vg
--a-r-- bar relayd/bar
there is 'a' flag which indicates table became active. The active
table is visible for '-sT' (show tables):
pf# pfctl -a relayd/bar -sT
bar
now let's see what happens if we clear rules from relayd/bar anchor
by committing empty ruleset:
pf# echo '' | pfctl -a relayd/bar -f -
pf# pfctl -a relayd/bar -sT -vg
pfctl: Anchor does not exist
both table and relayd/bar are gone. I thing this might be the mechanism
which creates surprise for relayd
on the other hand if I create the bar table first before the
rule which references the table is loaded, then table is created
with persistent flag. So the order in which load rules and
tables does matter:
pf# pfctl -a relayd/bar -sT -vg
pfctl: Anchor does not exist
pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10
1 table created.
1/1 addresses added.
pf# pfctl -a relayd/bar -sT -vg
-pa---- bar relayd/bar
pf# pfctl -a relayd/bar -f bar.conf
pf# pfctl -a relayd/bar -sT -vg
-pa-r-- bar relayd/bar
pf# echo '' | pfctl -a relayd/bar -f -
pf# pfctl -a relayd/bar -sT -vg
-pa---- bar relayd/bar
looks like order matters. If we load/create table before rules,
then table gets its 'persistent' flag. If table is loaded after
rules, then it loses its persistent flag and may get lost when
rules are flushed.
can you check table flags on your relay box?
pfctl -a relayd/... -sT -vg
just to see if my speculation is good enough or completely off.
thanks and
regards
sashan