Alexey Serbin created KUDU-3517:
-----------------------------------
Summary: Kudu servers crash on Graviton3 (aarch64) instances in EC2
Key: KUDU-3517
URL: https://issues.apache.org/jira/browse/KUDU-3517
Project: Kudu
Issue Type: Bug
Components: CLI, client, master, tserver
Affects Versions: 1.17.0
Environment: Graviton3 instances in EC2
Reporter: Alexey Serbin

Kudu masters and tablet servers built from the Kudu 1.17.0 source release
crash with SIGSEGV when running on Graviton3 (aarch64) instances in EC2.
Closer examination showed that the crash happens when StackCollector tries to
symbolize a thread's stack; an example trace is shown below. The trace was
collected under GDB while running a smoke test with the kudu CLI tool
({{kudu perf loadgen <master_rpc_addr> \-\-table_num_replicas=3
\-\-num_rows_per_thread=1000000}}):
{noformat}
#0  access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688,
    val=0xfffff325ca18, write=0, arg=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
#1  0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
#2  0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70)
    at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
#3  0x00000000025050c8 in kudu::StackTrace::Collect (
    this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:612
#4  0x0000000002507f64 in kudu::StackTrace::Collect (
    this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
    at /root/Projects/kudu/src/kudu/util/debug-util.cc:579
#5  0x000000000259c390 in kudu::(anonymous namespace)::SubmitSpinLockProfileData (
    contendedlock=0x4ed8a220, wait_cycles=2966400)
    at /root/Projects/kudu/src/kudu/util/spinlock_profiling.cc:229
{noformat}
The SIGSEGV is raised inside libunwind, in access_mem() called from
is_plt_entry() during _ULaarch64_step(), and that looks very similar to what's
reported in [libunwind issue
#260|https://github.com/libunwind/libunwind/issues/260].
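
As a quick sanity check on frame #0, the sketch below decodes the {{addr}} argument that access_mem() was asked to read and tests whether it could be a user-space address at all. It is only an illustration: the helper name is hypothetical (it is not part of Kudu or libunwind), and the 48-bit user-space virtual address width is an assumption about the kernel configuration on the affected instances. Under that assumption the address has non-zero upper bits, so any attempt to dereference it would fault, which is consistent with the SIGSEGV above. The {{val}} pointer from the same frame is checked for contrast.

```python
# Hypothetical triage helper; not part of Kudu or libunwind.

# 'addr' and 'val' arguments of access_mem() in frame #0 of the trace.
ADDR = 7745970402396146688
VAL = 0xFFFFF325CA18

# Assumption: the kernel uses 48-bit user-space virtual addresses.
# Other configurations (for example 52-bit) would need a different value.
USER_VA_BITS = 48


def is_plausible_user_address(addr: int, va_bits: int = USER_VA_BITS) -> bool:
    """Return True if addr fits in 64 bits and every bit above va_bits is zero."""
    return 0 <= addr < (1 << 64) and (addr >> va_bits) == 0


for name, value in (("addr", ADDR), ("val", VAL)):
    # Print the value in hex, its bits above the assumed VA width, and the verdict.
    print(name, hex(value), hex(value >> USER_VA_BITS),
          is_plausible_user_address(value))
```

The check says nothing about why the unwinder ended up with such an address; it only shows that the value handed to access_mem() is not something that could be safely read.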