[
https://issues.apache.org/jira/browse/KUDU-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin resolved KUDU-3517.
---------------------------------
Fix Version/s: 1.18.0
Resolution: Fixed
> Kudu servers crash on Graviton3 (aarch64) instances in EC2
> ----------------------------------------------------------
>
> Key: KUDU-3517
> URL: https://issues.apache.org/jira/browse/KUDU-3517
> Project: Kudu
> Issue Type: Bug
> Components: CLI, client, master, tserver
> Affects Versions: 1.17.0
> Environment: Graviton3 instances in EC2
> Reporter: Alexey Serbin
> Assignee: Alexey Serbin
> Priority: Critical
> Labels: ARM, aarch64
> Fix For: 1.18.0
>
>
> Kudu masters and tablet servers built from the source code released with Kudu
> 1.17.0 crash with SIGSEGV when running on Graviton3 (aarch64) instances in
> EC2.
> Closer examination showed that the crash happens when StackCollector tries
> to symbolize a thread's stack. The example stack trace below was collected
> under GDB while running a smoke test with the kudu CLI tool,
> {{kudu perf loadgen <master_rpc_addr> \-\-table_num_replicas=3 \-\-num_rows_per_thread=1000000}}:
> {noformat}
> #0 access_mem (as=0x3304418 <local_addr_space>, addr=7745970402396146688, val=0xfffff325ca18, write=0, arg=0xfffff325ce70)
>     at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Ginit.c:337
> #1 0x0000000000a97ac0 in is_plt_entry (c=0xfffff325ce70)
>     at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:43
> #2 0x0000000000a97fdc in _ULaarch64_step (cursor=0xfffff325ce70)
>     at /root/Projects/kudu/thirdparty/src/libunwind-1.6.2/src/aarch64/Gstep.c:171
> #3 0x00000000025050c8 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
>     at /root/Projects/kudu/src/kudu/util/debug-util.cc:612
> #4 0x0000000002507f64 in kudu::StackTrace::Collect (this=this@entry=0xfffff325d7d8, skip_frames=skip_frames@entry=0)
>     at /root/Projects/kudu/src/kudu/util/debug-util.cc:579
> #5 0x000000000259c390 in kudu::(anonymous namespace)::SubmitSpinLockProfileData (contendedlock=0x4ed8a220, wait_cycles=2966400)
>     at /root/Projects/kudu/src/kudu/util/spinlock_profiling.cc:229
> {noformat}
> The SIGSEGV is raised inside libunwind code (frame #0 is access_mem() in
> src/aarch64/Ginit.c), which looks very similar to what is reported in
> [this GitHub issue|https://github.com/libunwind/libunwind/issues/260].