[
https://issues.apache.org/jira/browse/IMPALA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754742#comment-17754742
]
Michael Smith edited comment on IMPALA-11542 at 8/15/23 7:02 PM:
-----------------------------------------------------------------
This also shows up consistently during data load with
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch-ARM.
I've identified a similar case in Julia -
https://github.com/JuliaLang/julia/issues/42295 - which they raised in an
LLVM thread -
https://discourse.llvm.org/t/problems-with-code-model-large-and-relocations/70511
- without response.
A few things I've observed digging into this:
* Graviton v2 instances are coded as neoverse-n1, and Graviton v3 instances are
neoverse-v1.
* LLVM added official support for neoverse-v1 instances in
https://github.com/llvm/llvm-project/commit/c2c2cc13601374f987cb03dfc8ef841c64b14024
(LLVM 12+). However neoverse-v1 is not automatically detected until
https://github.com/llvm/llvm-project/commit/b92102a6d79f401c1cc8bf7cd0d56e4a6cf90115
(LLVM 14+). Testing with LLVM 12 still results in the same failure on Graviton
v3 instances (m7g.4xlarge), although I haven't tried overriding the
TargetMachine yet.
* Data load doesn't fail when running Ubuntu under VirtualizationFramework on
my Apple M1 MacBook Pro, but this test case does fail. Setting CodeModel to
Small doesn't help.
> TestFailpoints::test_failpoints crash in ARM build
> --------------------------------------------------
>
> Key: IMPALA-11542
> URL: https://issues.apache.org/jira/browse/IMPALA-11542
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0
> Reporter: Quanlong Huang
> Assignee: Michael Smith
> Priority: Critical
> Labels: arm
>
> Saw the crash in
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch-ARM/13]
> In the ERROR log:
> {noformat}
> Picked up JAVA_TOOL_OPTIONS:
> -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n
> impalad:
> /home/ubuntu/native-toolchain/source/llvm/llvm-5.0.1-asserts.src-p3/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:400:
> void llvm::RuntimeDyldELF::resolveAArch64Relocation(const
> llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion
> `isInt<33>(Result) && "overflow check failed for relocation"' failed.
> Minidump in thread [20013]exec-finstance
> (finst:1e4a0f56622f2a15:51c2970900000005) running query
> 1e4a0f56622f2a15:51c2970900000000, fragment instance
> 1e4a0f56622f2a15:51c2970900000005
> Wrote minidump to
> /home/ubuntu/Impala/logs/ee_tests/minidumps/impalad/de4830d8-009d-47f4-f14bb68a-f0d8cd4c.dmp
> {noformat}
> In the INFO log:
> {noformat}
> I0830 06:54:49.173234 11329 impala-beeswax-server.cc:516] query: Query {
> 01: query (string) = "SELECT STRAIGHT_JOIN *\n FROM alltypes t1\n
> JOIN /*+broadcast*/ alltypesagg t2 ON t1.id = t2.id\n
> WHERE t2.int_col < 1000",
> 03: configuration (list) = list<string>[10] {
> [0] = "CLIENT_IDENTIFIE[...](273)",
> [1] = "TEST_REPLAN=1",
> [2] = "DISABLE_CODEGEN=False",
> [3] = "BATCH_SIZE=0",
> [4] = "NUM_NODES=0",
> [5] = "DISABLE_CODEGEN_ROWS_THRESHOLD=0",
> [6] = "MT_DOP=4",
> [7] = "ABORT_ON_ERROR=1",
> [8] =
> "DEBUG_ACTION=4:GETNEXT:MEM_LIMIT_EXCEEDED|COORD_BEFORE_EXEC_RPC:JITTER@[email protected]",
> [9] = "EXEC_SINGLE_NODE_ROWS_THRESHOLD=0",
> },
> 04: hadoop_user (string) = "ubuntu",
> }
> ...
> 74: client_identifier (string) =
> "failure/test_failpoints.py::TestFailpoints::()::test_failpoints[protocol:beeswax|table_format:seq/snap/block|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_sing",
> ...
> I0830 06:54:49.173739 11329 Frontend.java:1877]
> 1e4a0f56622f2a15:51c2970900000000] Analyzing query: SELECT STRAIGHT_JOIN *
> FROM alltypes t1
> JOIN /*+broadcast*/ alltypesagg t2 ON t1.id = t2.id
> WHERE t2.int_col < 1000 db: functional_seq_snap {noformat}
> The client_identifier shows it's TestFailpoints::test_failpoints.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)