[ 
https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389882#comment-17389882
 ] 

Martin Tzvetanov Grigorov commented on MESOS-10226:
---------------------------------------------------

Attached [^gdb-thread-apply-bt-all-29.07.2021.txt]

> test suite hangs on ARM64
> -------------------------
>
>                 Key: MESOS-10226
>                 URL: https://issues.apache.org/jira/browse/MESOS-10226
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Charles Natali
>            Assignee: Charles Natali
>            Priority: Major
>         Attachments: gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>  
> {noformat}
> [ RUN      ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630    32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512    31 exec.cpp:237] Executor registered on agent 
> 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999    36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351    36 executor.cpp:194] Subscribed executor on 
> martin-arm64
> I0726 11:59:17.832775    36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415    36 executor.cpp:722] Starting task 
> d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910    36 executor.cpp:740] Forked command at 38
> Preparing rootfs at 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488    33 executor.cpp:1041] Command exited with status 1 
> (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1111: 
> Failure
> Mock function called more times than expected - returning directly.
>     Function call: statusUpdate(0xffffc28527f0, @0xffffa2cf3a60 136-byte 
> object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 
> 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 
> A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 
> 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 
> 03-00 00-00>)
>          Expected: to be called twice
>            Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401    37 process.cpp:935] Stopped the socket accept 
> loop{noformat}
>  
> I asked him to provide a gdb backtrace, which shows the following:
>  
> {noformat}
> Thread 1 (Thread 0xffffa3bc2c60 (LWP 173475)):
> #0 0x0000ffffa518db20 in __libc_open64 (file=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", oflag=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0x0000ffffa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, 
> filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, 
> read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0x0000ffffa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, 
> filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized 
> out>, mode@entry=0xaaaad762f3c8 "r", is32not64=is32not64@entry=1) at 
> fileops.c:281
> #3 0x0000ffffa512e0dc in __fopen_internal (filename=0xaaab00f342e0 
> "/tmp/7VXP3w/pipe", mode=0xaaaad762f3c8 "r", is32=1) at iofopen.c:75
> #4 0x0000aaaad54f5350 in os::read (path="/tmp/7VXP3w/pipe") at 
> ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0x0000aaaad74f1c1c in 
> mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody
>  (this=0xaaab00f88f50) at 
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>  
>  
> Basically, the test uses a named pipe to synchronize with the task being 
> started. If the task fails to start (in this case because we're trying 
> to launch an x86 container on an arm64 host), nothing ever opens the write 
> end of the pipe, so the test hangs in os::read, blocked opening the pipe 
> for reading.
> I sent Martin a tentative fix to test, and I'll open an MR if it works.
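
To illustrate the hang described above, here is a minimal sketch of one way a
reader can avoid blocking forever on a named pipe whose writer never shows up.
It is not the tentative fix mentioned in the issue and not Mesos or stout code:
makeTempFifo and readFifoWithTimeout are hypothetical helpers written for this
example, and the timeout behaviour relies on Linux poll() semantics for FIFOs
(other systems may report a hang-up immediately when no writer is connected).

```cpp
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cerrno>
#include <chrono>
#include <string>

// Hypothetical helper: creates a FIFO in a fresh temporary directory,
// similar in shape to the "/tmp/7VXP3w/pipe" path in the backtrace.
// Returns an empty string on failure.
std::string makeTempFifo()
{
  char directory[] = "/tmp/fifo-XXXXXX";
  if (::mkdtemp(directory) == nullptr) {
    return "";
  }

  const std::string path = std::string(directory) + "/pipe";
  if (::mkfifo(path.c_str(), 0600) != 0) {
    return "";
  }

  return path;
}

// Hypothetical helper: reads the FIFO at `path` until the writer closes it.
// Returns true and fills `data` on EOF; returns false on error or if
// `timeout` expires first (for example because no writer ever connected).
bool readFifoWithTimeout(
    const std::string& path,
    std::chrono::milliseconds timeout,
    std::string* data)
{
  // A blocking open() of a FIFO waits until a writer opens the other end,
  // which is where the backtrace is stuck. O_NONBLOCK returns immediately.
  const int fd = ::open(path.c_str(), O_RDONLY | O_NONBLOCK);
  if (fd < 0) {
    return false;
  }

  const auto deadline = std::chrono::steady_clock::now() + timeout;
  std::string buffered;

  while (true) {
    const auto remaining =
      std::chrono::duration_cast<std::chrono::milliseconds>(
          deadline - std::chrono::steady_clock::now());
    if (remaining.count() <= 0) {
      break; // Deadline passed.
    }

    struct pollfd pfd = {fd, POLLIN, 0};
    const int ready = ::poll(&pfd, 1, static_cast<int>(remaining.count()));
    if (ready == 0) {
      break; // Timed out waiting for a writer or for data.
    }
    if (ready < 0) {
      if (errno == EINTR) {
        continue; // Interrupted by a signal; retry with the time left.
      }
      break;
    }

    char chunk[256];
    const ssize_t n = ::read(fd, chunk, sizeof(chunk));
    if (n > 0) {
      buffered.append(chunk, static_cast<size_t>(n));
    } else if (n == 0) {
      // EOF: the writer closed its end, so the message is complete.
      ::close(fd);
      *data = buffered;
      return true;
    } else if (errno != EAGAIN && errno != EINTR) {
      break; // Unexpected read error.
    }
  }

  ::close(fd);
  return false;
}
```

With a helper like this, a test whose task never starts could fail with a
regular assertion once the timeout expires instead of hanging the whole suite.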



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
