[
https://issues.apache.org/jira/browse/IMPALA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754742#comment-17754742
]
Michael Smith edited comment on IMPALA-11542 at 8/15/23 7:02 PM:
-----------------------------------------------------------------
This also shows up consistently during data load with
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch-ARM.
I've identified a similar case in Julia -
https://github.com/JuliaLang/julia/issues/42295 - which they raised in an
LLVM thread -
https://discourse.llvm.org/t/problems-with-code-model-large-and-relocations/70511
- without response.
A few things I've observed digging into this:
* Graviton v2 instances are coded as neoverse-n1, and Graviton v3 instances are
neoverse-v1.
* LLVM added official support for neoverse-v1 instances in
https://github.com/llvm/llvm-project/commit/c2c2cc13601374f987cb03dfc8ef841c64b14024
(LLVM 12+). However neoverse-v1 is not automatically detected until
https://github.com/llvm/llvm-project/commit/b92102a6d79f401c1cc8bf7cd0d56e4a6cf90115
(LLVM 14+). Testing with LLVM 12 still results in the same failure on Graviton
v3 instances (m7g.4xlarge), although I haven't tried overriding the
TargetMachine yet.
* Data load doesn't fail when running Ubuntu under VirtualizationFramework on
my Apple M1 MacBook Pro, but this test case does fail. Setting CodeModel to
Small doesn't help.
> TestFailpoints::test_failpoints crash in ARM build
> --------------------------------------------------
>
> Key: IMPALA-11542
> URL: https://issues.apache.org/jira/browse/IMPALA-11542
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0
> Reporter: Quanlong Huang
> Assignee: Michael Smith
> Priority: Critical
> Labels: arm
>
> Saw the crash in
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch-ARM/13]
> In the ERROR log:
> {noformat}
> Picked up JAVA_TOOL_OPTIONS:
> -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n
> impalad:
> /home/ubuntu/native-toolchain/source/llvm/llvm-5.0.1-asserts.src-p3/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:400:
> void llvm::RuntimeDyldELF::resolveAArch64Relocation(const
> llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion
> `isInt<33>(Result) && "overflow check failed for relocation"' failed.
> Minidump in thread [20013]exec-finstance
> (finst:1e4a0f56622f2a15:51c2970900000005) running query
> 1e4a0f56622f2a15:51c2970900000000, fragment instance
> 1e4a0f56622f2a15:51c2970900000005
> Wrote minidump to
> /home/ubuntu/Impala/logs/ee_tests/minidumps/impalad/de4830d8-009d-47f4-f14bb68a-f0d8cd4c.dmp
> {noformat}
> In the INFO log:
> {noformat}
> I0830 06:54:49.173234 11329 impala-beeswax-server.cc:516] query: Query {
> 01: query (string) = "SELECT STRAIGHT_JOIN *\n FROM alltypes t1\n
> JOIN /*+broadcast*/ alltypesagg t2 ON t1.id = t2.id\n
> WHERE t2.int_col < 1000",
> 03: configuration (list) = list<string>[10] {
> [0] = "CLIENT_IDENTIFIE[...](273)",
> [1] = "TEST_REPLAN=1",
> [2] = "DISABLE_CODEGEN=False",
> [3] = "BATCH_SIZE=0",
> [4] = "NUM_NODES=0",
> [5] = "DISABLE_CODEGEN_ROWS_THRESHOLD=0",
> [6] = "MT_DOP=4",
> [7] = "ABORT_ON_ERROR=1",
> [8] =
> "DEBUG_ACTION=4:GETNEXT:MEM_LIMIT_EXCEEDED|COORD_BEFORE_EXEC_RPC:JITTER@[email protected]",
> [9] = "EXEC_SINGLE_NODE_ROWS_THRESHOLD=0",
> },
> 04: hadoop_user (string) = "ubuntu",
> }
> ...
> 74: client_identifier (string) =
> "failure/test_failpoints.py::TestFailpoints::()::test_failpoints[protocol:beeswax|table_format:seq/snap/block|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_sing",
> ...
> I0830 06:54:49.173739 11329 Frontend.java:1877]
> 1e4a0f56622f2a15:51c2970900000000] Analyzing query: SELECT STRAIGHT_JOIN *
> FROM alltypes t1
> JOIN /*+broadcast*/ alltypesagg t2 ON t1.id = t2.id
> WHERE t2.int_col < 1000 db: functional_seq_snap {noformat}
> The client_identifier shows it's TestFailpoints::test_failpoints.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)