On 06/07/2023 12:59, Alexandr Nedvedicky wrote:
> Hello,
>
>
> On Wed, Jul 05, 2023 at 03:33:36PM +0300, Kapetanakis Giannis wrote:
>> In any case we shouldn't get table statistics for tables that are down.
>>
>> log_debug are just added by for my debugging, but the
>> if (!rdr->table->up) continue;
>> should probably go in.
>>
>> G
>>
>> Index: pfe.c
>> ===================================================================
>> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
>> retrieving revision 1.90
>> diff -u -p -r1.90 pfe.c
>> --- pfe.c 14 Sep 2020 11:30:25 -0000 1.90
>> +++ pfe.c 5 Jul 2023 12:27:37 -0000
>> @@ -790,6 +790,12 @@ pfe_statistics(int fd, short events, voi
>> getmonotime(&tv_now);
>>
>> TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
>> + if (!rdr->table->up) {
>> + log_debug("%s: table: %s is down. continuing",
>> __func__, rdr->conf.name);
>> + continue;
>> + }
>> + //bilias
>> + log_debug("%s: table: %s, up: %d id: %d", __func__,
>> rdr->conf.name, rdr->table->up, rdr->conf.table_id);
>> cnt = check_table(env, rdr, rdr->table);
>> if (rdr->conf.backup_id != EMPTY_TABLE)
>> cnt += check_table(env, rdr, rdr->backup);
>>
>>
> this change might be valid fix. however I'm afraid it will uncover new issues.
> in relayd. It just seems like a tip of iceberg.
>
> I think I've found explanation how table can disappear when host gets
> disabled. I hope someone who is more familiar with relayd can fill
> some details I'm still missing here.
>
> snippet here comes from disable_host() function in pfe.c:
>
> 602 if (host->up == HOST_UP) {
> 603 if ((table = table_find(env, host->conf.tableid)) == NULL)
> 604 fatalx("%s: invalid table id", __func__);
> 605 table->up--;
> 606 table->conf.flags |= F_CHANGED;
> 607 }
> 608
> 609 host->up = HOST_UNKNOWN;
> ...
> 624 if (!host->conf.parentid) {
> 625 /* Disable all children */
> 626 SLIST_FOREACH(h, &host->children, child)
> 627 disable_host(c, id, h);
> 628 pfe_sync();
> 629 }
>
> at line 605 table->up counter may get to 0 if host is the only/last
> host in table.
>
> if there is no parent, we are going disable children too calling
> to pfe_sync() at line 628.
>
> pfe_sync() function updates rdr rules in pf. The logic there is still
> kind of blurry to me.
>
> 687 TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
> 688 rdr->conf.flags &= ~(F_BACKUP);
> 689 rdr->conf.flags &= ~(F_DOWN);
> 690
> 691 if (rdr->conf.flags & F_DISABLE ||
> 692 (rdr->table->up == 0 && rdr->backup->up == 0)) {
> 693 rdr->conf.flags |= F_DOWN;
> 694 active = NULL;
> ...
> 714 if (rdr->conf.flags & F_DOWN) {
> 715 if (rdr->conf.flags & F_ACTIVE_RULESET) {
> 716 flush_table(env, rdr);
> 717 log_debug("%s: disabling ruleset",
> __func__);
> 718 rdr->conf.flags &= ~(F_ACTIVE_RULESET);
> 719 id.id = rdr->conf.id;
> 720 imsg.hdr.type = IMSG_CTL_PULL_RULESET;
> 721 imsg.hdr.len = sizeof(id) +
> IMSG_HEADER_SIZE;
> 722 imsg.data = &id;
> 723 sync_ruleset(env, rdr, 0);
> 724 control_imsg_forward(env->sc_ps, &imsg);
> 725 }
>
> here we walk through the list of redirect rules. at line 693 we mark
> redirect to be down if there is no table and no back up.
>
> if redirect is down and has active rules we are going to flush
> those rules at line 723 by calling sync_ruleset(env, rdr, 0);
> the flush operation done by sync_ruleset() reads as follows:
>
> 332 void
> 333 sync_ruleset(struct relayd *env, struct rdr *rdr, int enable)
> ...
> 352 if (transaction_init(env, anchor) == -1) {
> 353 log_warn("%s: transaction init failed", __func__);
> 354 return;
> 355 }
> 356
> 357 if (!enable) {
> 358 if (transaction_commit(env) == -1)
> 359 log_warn("%s: remove rules transaction failed",
> 360 __func__);
> 361 else
> 362 log_debug("%s: rules removed", __func__);
> 363 return;
> 364 }
> 365
>
> as you can see the rules are removed by committing empty ryleset.
> committing empty ruleset makes anchor empty, because commit operation
> removes also tables which are not persistent. One can emulate that by using
> pfctl on command line.
> we start by loading simple rdr rule which reference table bar:
>
> pf# cat bar.conf
> pass in on vio0 from 192.168.3.0/24 to 10.164.0.3 rdr-to <bar>
> pf# pfctl -a relayd/bar -f bar.conf
>
> the next step is to check if table bar got created.
>
> pf# pfctl -a relayd/bar -sT -vg
> ----r-- bar relayd/bar
>
> the 'r' flag means table is referenced only. Such table is
> not reported unless '-sT' option comes with '-vg'. Now let's
> add address to bar table and see what happens:
>
> pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10
> 1/1 addresses added.
> pf# pfctl -a relayd/bar -sT -vg
> --a-r-- bar relayd/bar
>
> there is 'a' flag which indicates table became active. The active
> table is visible for '-sT' (show tables):
>
> pf# pfctl -a relayd/bar -sT
> bar
>
> now let's see what happens if we clear rules from relayd/bar anchor
> by committing empty ruleset:
>
> pf# echo '' | pfctl -a relayd/bar -f -
> pf# pfctl -a relayd/bar -sT -vg
> pfctl: Anchor does not exist
>
> both table and relayd/bar are gone. I thing this might be the mechanism
> which creates surprise for relayd
>
> on the other hand if I create the bar table first before the
> rule which references the table is loaded, then table is created
> with persistent flag. So the order in which load rules and
> tables does matter:
>
> pf# pfctl -a relayd/bar -sT -vg
> pfctl: Anchor does not exist
>
> pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10
> 1 table created.
> 1/1 addresses added.
>
> pf# pfctl -a relayd/bar -sT -vg
> -pa---- bar relayd/bar
>
> pf# pfctl -a relayd/bar -f bar.conf
> pf# pfctl -a relayd/bar -sT -vg
> -pa-r-- bar relayd/bar
>
> pf# echo '' | pfctl -a relayd/bar -f -
> pf# pfctl -a relayd/bar -sT -vg
> -pa---- bar relayd/bar
>
> looks like order matters. If we load/create table before rules,
> then table gets its 'persistent' flag. If table is loaded after
> rules, then it loses its persistent flag and may get lost when
> rules are flushed.
>
> can you check table flags on your relay box?
> pfctl -a relayd/... -sT -vg
> just to see if my speculation is good enough or completely off.
>
> thanks and
> regards
> sashan
Hi sashan,
Your logic seems fine to me. Indeed it acts like they are persistent. They
don't have the p flag.
# pfctl -a 'relayd/dir-imap' -sT -vg
--a-r-- dir-imap relayd/dir-imap
# pfctl -a 'relayd/dir-sieve' -sT -vg
--a-r-- dir-sieve relayd/dir-sieve
# pfctl -a 'relayd/dir-lmtp' -sT -vg
--a-r-- dir-lmtp relayd/dir-lmtp
note that in pfe_filter.c in init_tables() is see
73: tables[i].pfrt_flags |= PFR_TFLAG_PERSIST;
Giannis
ps. I've send a diff about statistics in tech@
It does more than just checking table->up status, but your approach is much
better.