Hi,

One of the servers I'm running has recently rebooted due to a kernel
paging request that could not be handled correctly.

The server is a Lamobo-R1 with an Allwinner A20 running Linux 4.14.0, and
the DRAM of the device is clocked at 432 MHz. The problem happened while
running CPU-intensive tasks. Although the device has a heatsink, it is
kept in the closed Banana Pi R1 box (which does have some holes for air
flow), and that might not be sufficient to dissipate the heat generated
by the SoC.

I am wondering whether there could be a link between the fault and either
the heat generated by the CPU or the current drawn by the CPU (the board
also has a SATA drive connected, although its power was rerouted
directly to the USB connector instead of the AXP209).

Since the backtrace looks legit, it seems to me that the most likely
cause of this is DRAM corruption, which could have happened either
because of the heat or because of the combined CPU and SATA current
draw. I am thinking of lowering the DRAM frequency to reduce the
constraints on the voltage (DRAM bandwidth is probably not a bottleneck
for the server use-case).

What do you think?

The log of the paging request failure follows:

[222599.060580] Unable to handle kernel paging request at virtual address 2d8b6010
[222599.067891] pgd = c0004000
[222599.070687] [012e0174] *pgd=00000000
[222599.074371] Internal error: Oops: 80000005 [#1] PREEMPT SMP ARM
[222599.080374] Modules linked in: xt_multiport 8021q iptable_mangle iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter bridge stp llc ip_tables x_tables
[222599.101695] CPU: 0 PID: 87 Comm: sugov:0 Not tainted 4.14.0+ #1
[222599.107697] Hardware name: Allwinner sun7i (A20) Family
[222599.113004] task: ef2ad9c0 task.stack: eea02000
[222599.117621] PC is at 0x12e0174
[222599.120774] LR is at arch_timer_read_counter_long+0x14/0x18
[222599.126428] pc : [<012e0174>]    lr : [<c010efb4>]    psr: a00f01b3
[222599.132775] sp : eea03e50  ip : 00000000  fp : 365c0400
[222599.138082] r10: 2aea5400  r9 : 00000013  r8 : c0945314
[222599.143390] r7 : a00f0113  r6 : e1e85e2b  r5 : 000350a2  r4 : c0df11a0
[222599.149998] r3 : 012e0175  r2 : 00000010  r1 : 000004db  r0 : c0da0878
[222599.156608] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA Thumb  Segment none
[222599.164083] Control: 10c5387d  Table: 5e80006a  DAC: 00000051
[222599.169911] Process sugov:0 (pid: 87, stack limit = 0xeea02218)
[222599.175911] Stack: (0xeea03e50 to 0xeea04000)
[222599.180354] 3e40:                                     c0df11a0 c08dad38 a1005310 ef009f40
[222599.188613] 3e60: 00000365 c0445950 365c0400 016e3600 00011300 00000000 00000000 ef008100
[222599.196871] 3e80: ef008300 016e3600 2aea5400 c043fbf0 ee9e5364 c043cc9c ef00c300 365c0400
[222599.205128] 3ea0: 00000000 ef008300 ee9e53c0 ee9e5364 2aea5400 c043cf64 ee9e5580 365c0400
[222599.213386] 3ec0: ee9e5340 ef7bd050 ee9e53c0 c043cfcc ee9eef00 00000000 ee9e5340 c0541e88
[222599.221643] 3ee0: eea03f20 00000000 ee9e5500 ee9e53e4 365c0400 2aea5400 c0da032c ee9e4d00
[222599.229901] 3f00: 00000000 c0de4c14 00000005 000dea80 00000000 00000000 eea03f60 c06d5b3c
[222599.238159] 3f20: 00000010 000afc80 000dea80 00000021 ee9eff3c ee9eff50 ee9eff64 ee9eff68
[222599.246417] 3f40: 00000000 eea02000 ffffe000 c0167a1c ee9eff3c c0dc7648 ee9eff64 c0142210
[222599.254675] 3f60: ef2ad9c0 ee9e5140 00000000 ee9e5180 eea02000 ee9eff64 c014211c ee9e515c
[222599.262933] 3f80: ef051bb0 c01420b8 00000000 ee9e5180 c0141f94 00000000 00000000 00000000
[222599.271189] 3fa0: 00000000 00000000 00000000 c0107990 00000000 00000000 00000000 00000000
[222599.279447] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[222599.287705] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
[222599.295990] [<c010efb4>] (arch_timer_read_counter_long) from [<c08dad38>] (__timer_delay+0x48/0x58)
[222599.305133] [<c08dad38>] (__timer_delay) from [<c0445950>] (clk_factors_set_rate+0xf8/0x118)
[222599.313665] [<c0445950>] (clk_factors_set_rate) from [<c043cc9c>] (clk_change_rate+0x19c/0x250)
[222599.322451] [<c043cc9c>] (clk_change_rate) from [<c043cf64>] (clk_core_set_rate_nolock+0x68/0xb0)
[222599.331444] [<c043cf64>] (clk_core_set_rate_nolock) from [<c043cfcc>] (clk_set_rate+0x20/0x30)
[222599.340158] [<c043cfcc>] (clk_set_rate) from [<c0541e88>] (dev_pm_opp_set_rate+0x250/0x35c)
[222599.348608] [<c0541e88>] (dev_pm_opp_set_rate) from [<c06d5b3c>] (__cpufreq_driver_target+0x228/0x4c4)
[222599.358007] [<c06d5b3c>] (__cpufreq_driver_target) from [<c0167a1c>] (sugov_work+0x24/0x38)
[222599.366451] [<c0167a1c>] (sugov_work) from [<c0142210>] (kthread_worker_fn+0xf4/0x1b8)
[222599.374457] [<c0142210>] (kthread_worker_fn) from [<c01420b8>] (kthread+0x124/0x154)
[222599.382291] [<c01420b8>] (kthread) from [<c0107990>] (ret_from_fork+0x14/0x24)
[222599.389606] Code: bad PC value
[222599.392766] ---[ end trace a3e51cc3b8944ff6 ]---
[222599.397471] Kernel panic - not syncing: Fatal exception
[222599.402804] CPU1: stopping
[222599.405627] CPU: 1 PID: 6284 Comm: python Tainted: G      D         4.14.0+ #1
[222599.412925] Hardware name: Allwinner sun7i (A20) Family
[222599.418259] [<c010fb4c>] (unwind_backtrace) from [<c010b2bc>] (show_stack+0x10/0x14)
[222599.426088] [<c010b2bc>] (show_stack) from [<c08dd428>] (dump_stack+0x84/0x98)
[222599.433395] [<c08dd428>] (dump_stack) from [<c010e480>] (handle_IPI+0x168/0x17c)
[222599.440873] [<c010e480>] (handle_IPI) from [<c01014d8>] (gic_handle_irq+0x8c/0x90)
[222599.448524] [<c01014d8>] (gic_handle_irq) from [<c010c230>] (__irq_usr+0x50/0x80)
[222599.456082] Exception stack(0xdeac9fb0 to 0xdeac9ff8)
[222599.461216] 9fa0:                                     b6cd9480 0069ca78 0069ca78 00000000
[222599.469473] 9fc0: b6cdf580 b6cd9480 006fc050 b6ce0ab0 0068d000 b6ce97f0 006a0174 b6cdf590
[222599.477727] 9fe0: 006f6f90 be8b5a54 004d6351 004aaf20 400f0030 ffffffff
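
As a sanity check on how I read the dump above, here is a minimal Python
sketch that decodes a few of the raw values. It is only an illustration:
the helper names decode_psr and as_mhz are made up for this mail, the J
(Jazelle) bit of the PSR is ignored, and treating the stack words
016e3600, 2aea5400 and 365c0400 as clock rates in Hz is my assumption
(it fits the clk_set_rate() path in the backtrace, but nothing in the
log labels them).

```python
# Values copied from the oops above.
PSR = 0xA00F01B3
PC = 0x012E0174
R3 = 0x012E0175

# ARMv7 processor modes, keyed by PSR bits [4:0].
ARM_MODES = {
    0x10: "USR_32", 0x11: "FIQ_32", 0x12: "IRQ_32", 0x13: "SVC_32",
    0x17: "ABT_32", 0x1B: "UND_32", 0x1F: "SYS_32",
}


def decode_psr(psr):
    """Decode the PSR fields that the kernel prints on the 'Flags:' line."""
    # Upper-case letter means the condition flag is set, lower-case means clear.
    flags = "".join(
        name.upper() if psr & (1 << bit) else name.lower()
        for name, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    return {
        "flags": flags,
        "irqs": "off" if psr & (1 << 7) else "on",   # I bit masks IRQs
        "fiqs": "off" if psr & (1 << 6) else "on",   # F bit masks FIQs
        "isa": "Thumb" if psr & (1 << 5) else "ARM",  # T bit (J bit ignored)
        "mode": ARM_MODES.get(psr & 0x1F, "unknown"),
    }


def as_mhz(word):
    """Interpret a 32-bit word as a rate in Hz (assumption) and return MHz."""
    return word / 1_000_000


print(decode_psr(PSR))
# PC with the Thumb bit (bit 0) of r3 cleared.
print(f"r3 & ~1 = {R3 & ~1:#010x}, pc = {PC:#010x}")
for word in (0x016E3600, 0x2AEA5400, 0x365C0400):
    print(f"{word:#010x} = {as_mhz(word):g} MHz")
```

The decoded PSR matches the "Flags:" line, and pc is r3 with bit 0
cleared, which would be consistent with an indirect branch through r3 to
a bogus address. If the rate interpretation holds, the three words are
24 MHz, 720 MHz and 912 MHz, so the fault may have hit during a CPU
frequency change, which could be worth keeping in mind alongside the
DRAM hypothesis.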

-- 
Paul Kocialkowski,

developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/
