Probably the latest libutil, because of ibuf_data().
I will test and report.
thanx,
G
On 30/06/2023 13:31, Kapetanakis Giannis wrote:
> The program does not terminate by system.
>
> It indeed exits from that fatal() function in check_table()
>
> I will add the table print and try to reproduce at some point. It might take a
> while because I have to shut down our mail completely to test this.
>
> Will report back.
>
> Does it need to be built with the latest libutil, or can I compile against the
> system's version?
>
> G
>
> On 30/06/2023 12:57, Alexandr Nedvedicky wrote:
>> Hello,
>>
>> I'm not familiar enough with relayd, so perhaps other folks
>> here might provide better way to troubleshoot the issue.
>>
>> On Fri, Jun 30, 2023 at 11:10:44AM +0300, Kapetanakis Giannis wrote:
>>> Hello,
>>>
>>> This happened to me twice.
>>> OpenBSD 7.3 with syspatches.
>>>
>>> I have a pair of carp/pfsync/pf/relayd firewall-load balancers with many
>>> redirects (only) on them.
>>>
>>> I wanted to do maintenance on some hosts below the load balancers.
>>> After a while relayd crashed on the Master firewall only.
>> when you say crash: does it mean relayd was terminated
>> by the system because of a memory/stack/program violation?
>> if that is the case, is there any chance to collect a core file?
>>
>> or was it rather a voluntary exit, where relayd called its function fatal()?
>>
>> the 'No such file or directory' error, returned by the DIOCRGETTSTATS
>> ioctl(), comes from line 1746 in sys/net/pf_table.c:
>>
>> 1731 int
>> 1732 pfr_get_tstats(struct pfr_table *filter, struct pfr_tstats *tbl, int *size,
>> 1733     int flags)
>> 1734 {
>> 1735         struct pfr_ktable *p;
>> 1736         struct pfr_ktableworkq workq;
>> 1737         int n, nn;
>> 1738         time_t tzero = gettime();
>> 1739
>> 1740         /* XXX PFR_FLAG_CLSTATS disabled */
>> 1741         ACCEPT_FLAGS(flags, PFR_FLAG_ALLRSETS);
>> 1742         if (pfr_fix_anchor(filter->pfrt_anchor))
>> 1743                 return (EINVAL);
>> 1744         n = nn = pfr_table_count(filter, flags);
>> 1745         if (n < 0)
>> 1746                 return (ENOENT);
>>
>>
>> the pfr_table_count() function fails (returns -1) if and only if the
>> desired ruleset does not exist.
>>
>> 2177 int
>> 2178 pfr_table_count(struct pfr_table *filter, int flags)
>> 2179 {
>> 2180         struct pf_ruleset *rs;
>> 2181
>> 2182         if (flags & PFR_FLAG_ALLRSETS)
>> 2183                 return (pfr_ktable_cnt);
>> 2184         if (filter->pfrt_anchor[0]) {
>> 2185                 rs = pf_find_ruleset(filter->pfrt_anchor);
>> 2186                 return ((rs != NULL) ? rs->tables : -1);
>> 2187         }
>> 2188         return (pf_main_ruleset.tables);
>> 2189 }
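
To make the failure path quoted above concrete, here is a minimal userland
sketch of it. It is only a model under stated assumptions: struct
model_ruleset, model_find_ruleset(), the anchor name "relayd/www" and the
table counts are all hypothetical stand-ins, not kernel code, and the
PFR_FLAG_ALLRSETS case is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct pf_ruleset; not the kernel type. */
struct model_ruleset {
	const char *anchor;	/* anchor path (made-up example value) */
	int tables;		/* number of tables in this ruleset */
};

/* Anchors that "exist" in this model. */
static struct model_ruleset model_rulesets[] = {
	{ "relayd/www", 1 },
};

/* Made-up table count for the main ruleset. */
static const int model_main_tables = 3;

/* Plays the role of pf_find_ruleset(): NULL when the anchor is unknown. */
static struct model_ruleset *
model_find_ruleset(const char *anchor)
{
	size_t i;

	for (i = 0; i < sizeof(model_rulesets) / sizeof(model_rulesets[0]); i++)
		if (strcmp(model_rulesets[i].anchor, anchor) == 0)
			return (&model_rulesets[i]);
	return (NULL);
}

/* Follows pfr_table_count(): -1 only when a named anchor is missing. */
static int
model_table_count(const char *anchor)
{
	struct model_ruleset *rs;

	if (anchor[0] != '\0') {
		rs = model_find_ruleset(anchor);
		return ((rs != NULL) ? rs->tables : -1);
	}
	return (model_main_tables);
}

/* Follows the check in pfr_get_tstats(): negative count becomes ENOENT. */
static int
model_get_tstats(const char *anchor)
{
	if (model_table_count(anchor) < 0)
		return (ENOENT);
	return (0);
}
```

In this model, model_get_tstats("relayd/www") returns 0, while an anchor
that is not in the list yields ENOENT, which is the error relayd logged.
An empty anchor falls through to the main ruleset and never fails.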
>>
>> I wonder if it would help to adjust the fatal() call in relayd
>> to also report the table name and anchor it is trying to find.
>> a diff which adjusts the call to fatal() is below.
>>
>> if you don't want to build the whole tree and prefer an in-place
>> build, you will need to adjust CFLAGS and LDFLAGS. Something
>> like this will be needed:
>>
>> cd /path/to/your/src/usr.sbin/relayd
>> export CFLAGS='-I/path/to/your/src/sys -I/path/to/your/src/lib/libutil'
>> export LDFLAGS='-L /path/to/your/src/lib/libutil'
>> make
>>
>>
>> </snip>
>>
>>> same logs on Backup firewall so far, but after a minute or so:
>>>
>>> Jun 30 01:47:46 ll1 relayd[61766]: pfe: check_table: cannot get table
>>> stats: No such file or directory
>> this is where I'd like to see which table relayd is trying
>> to look up. Process 61766 then exits via `exit(1)`,
>> called from function fatal()
>>
>>> Jun 30 01:47:46 ll1 relayd[94434]: ca exiting, pid 94434
>>> Jun 30 01:47:46 ll1 relayd[83189]: ca exiting, pid 83189
>>> Jun 30 01:47:46 ll1 relayd[9023]: ca exiting, pid 9023
>>> Jun 30 01:47:46 ll1 relayd[89820]: ca exiting, pid 89820
>>> Jun 30 01:47:46 ll1 relayd[94676]: ca exiting, pid 94676
>>> Jun 30 01:47:46 ll1 relayd[1820]: hce exiting, pid 1820
>>> Jun 30 01:47:46 ll1 relayd[52103]: lost child: pid 61766 exited abnormally
>> the parent relayd process noticed that the child called exit(1)
>> because it could not find the table.
>>
>> once you are able to run the patched relayd, can you try to reproduce
>> the issue?
>>
>> it will also help if you collect additional data:
>>
>> pfctl -vsA > anchors-before
>> # reproduce the issue, wait for relayd to exit/crash
>> pfctl -vsA > anchors-after
>>
>> those data, together with output from adjusted call
>> to fatal() should help us to better understand
>> what's going on.
>>
>> thanks for your help
>> regards
>> sashan
>>
>> --------8<---------------8<---------------8<------------------8<--------
>> diff --git a/usr.sbin/relayd/pfe_filter.c b/usr.sbin/relayd/pfe_filter.c
>> index 347048ece56..e1ae050b768 100644
>> --- a/usr.sbin/relayd/pfe_filter.c
>> +++ b/usr.sbin/relayd/pfe_filter.c
>> @@ -632,7 +632,8 @@ check_table(struct relayd *env, struct rdr *rdr, struct table *table)
>>                  goto toolong;
>>
>>          if (ioctl(env->sc_pf->dev, DIOCRGETTSTATS, &io) == -1)
>> -                fatal("%s: cannot get table stats", __func__);
>> +                fatal("%s: cannot get table stats for %s@%s", __func__,
>> +                    io.pfrio_table.pfrt_name, io.pfrio_table.pfrt_anchor);
>>
>>          return (tstats.pfrts_match);
>>
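
For reference, a rough sketch of what the message from the adjusted fatal()
call might look like. format_tstats_error() is a hypothetical helper written
for this mail, not a relayd function, and the table and anchor names used
with it are made up. It assumes, based on the log line quoted earlier, that
the error string for errno is appended after the message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds a "table@anchor" message in the same shape
 * as the format string in the diff, followed by the errno text.
 * Returns what snprintf() returns.
 */
static int
format_tstats_error(char *buf, size_t len, const char *func,
    const char *table, const char *anchor, int errnum)
{
	return (snprintf(buf, len, "%s: cannot get table stats for %s@%s: %s",
	    func, table, anchor, strerror(errnum)));
}
```

Called with "check_table", a table named "www", an anchor "relayd/www" and
ENOENT, the buffer starts with "check_table: cannot get table stats for
www@relayd/www: " followed by the system's text for ENOENT. With that in the
log it should be clear which anchor the lookup could not find.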