On 28/06/2021 09:46, Matthew Reeve wrote:

On 24/06/2021 13:08, Ondrej Zajicek wrote:
On Fri, Jun 18, 2021 at 05:06:27PM +0100, Matthew Reeve wrote:
Hi, yes sure, here it is. Please let me know if this does not give you what
you need.

Thanks!

Thanks, that looks like an issue with slists. We had a similar issue with
the lists code in the past and reworked it to be more conservative. Will
check that.

Great, thanks. If you want to make any changes on a branch or something, I can build it and test it on my hardware if it would help.

root@OpenWrt:/tmp# gdb debug/bird bird.1623776146.6869.7.core
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-openwrt-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from debug/bird...
[New LWP 6869]
Core was generated by `./bird'.
Program terminated with signal SIGBUS, Bus error.
#0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
1646    proto/ospf/rt.c: No such file or directory.
(gdb) bt
#0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
#1  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1698
#2  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1688
#3  ospf_disp (timer=<optimized out>) at proto/ospf/ospf.c:468
#4  0x00061574 in timers_fire (loop=0xc4878 <main_timeloop>) at lib/timer.c:235
#5  0x00012ca8 in io_loop () at sysdep/unix/io.c:2195
#6  main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:939
(gdb)

On 18/06/2021 16:16, Ondrej Zajicek wrote:
On Mon, Jun 14, 2021 at 04:25:04PM +0100, Matthew Reeve wrote:
Hi,

When using BIRD 2.0.8 on OpenWrt 21.02 (and other versions) on a Netgear
R7800 router, BIRD crashes immediately on startup if the OSPF protocol is
used, either v2 or v3, with:

Fri Jun 11 14:41:11 2021 daemon.info bird: Started
Fri Jun 11 14:41:11 2021 kern.err kernel: [ 3500.853248] Alignment trap: not handling instruction f44c0a1f at [<00035848>]
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.853283] 8<--- cut here ---
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.859363] Unhandled fault: alignment exception (0x801) at 0x007e0624
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.862443] pgd = 0bbef4fd
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.868821] [007e0624] *pgd=5d6ca835, *pte=5c40b75f, *ppte=5c40bc7f


This router uses an ARMv7 processor, and the crash appears to be caused by a memory alignment problem. I've debugged it and traced it to an access to the top_hash_entry struct. Adding the PACKED macro to the
struct definition fixes the problem, as per this patch:
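
As a rough illustration of why a packed attribute can make an alignment fault go away, here is a minimal sketch. It is not the patch referred to above and not BIRD's real top_hash_entry: the struct names and fields are hypothetical, and it assumes that PACKED expands to the GCC/Clang __attribute__((packed)). With that attribute the compiler treats the struct as having an alignment requirement of 1, so it should stop emitting member accesses that rely on natural alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct for illustration only; NOT BIRD's top_hash_entry. */
struct demo_entry {
  uint8_t  tag;     /* 1 byte, followed by padding in the natural layout */
  uint64_t value;   /* placed at a naturally aligned offset */
};

/* Same fields, but packed: no padding, alignment requirement of 1. */
struct demo_entry_packed {
  uint8_t  tag;
  uint64_t value;   /* at offset 1, so the compiler cannot assume alignment */
} __attribute__((packed));

/* Print size, alignment and the offset of 'value' for both layouts. */
void demo_print_layouts(void)
{
  printf("natural: size=%zu align=%zu offsetof(value)=%zu\n",
         sizeof(struct demo_entry), _Alignof(struct demo_entry),
         offsetof(struct demo_entry, value));
  printf("packed:  size=%zu align=%zu offsetof(value)=%zu\n",
         sizeof(struct demo_entry_packed), _Alignof(struct demo_entry_packed),
         offsetof(struct demo_entry_packed, value));
}
```

Calling demo_print_layouts() from a main() built with GCC or Clang should report size 9, alignment 1 and offset 1 for the packed variant; the numbers for the natural layout depend on the target ABI. Packing usually costs some performance, so it may hide the symptom without addressing where the misaligned pointer comes from.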
Hi

Thanks, could you try to get a backtrace from the coredump using gdb to see
where the invalid access is?


Hi Ondrej,

Just wondering whether you've had a chance to look at this any further yet?

Many thanks,

Matt.
