[ https://issues.apache.org/jira/browse/MESOS-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marc Villacorta updated MESOS-6486:
-----------------------------------
    Comment: was deleted

(was: What do you think? Is this a problem with _libjvm.so_ or perhaps a JNI problem in _libmesos-1.0.1.so_?)

> Mesos on Alpine Linux: JVM Segmentation fault
> ---------------------------------------------
>
> Key: MESOS-6486
> URL: https://issues.apache.org/jira/browse/MESOS-6486
> Project: Mesos
> Issue Type: Wish
> Affects Versions: 1.0.1
> Environment: *Docker*
> {code:none}
> ➜ ~ docker version
> Client:
>  Version: 1.12.1
>  API version: 1.24
>  Go version: go1.7.1
>  Git commit: 6f9534c
>  Built: Thu Sep 8 10:31:18 2016
>  OS/Arch: darwin/amd64
> Server:
>  Version: 1.12.1
>  API version: 1.24
>  Go version: go1.6.3
>  Git commit: 23cf638
>  Built: Thu Aug 18 17:52:38 2016
>  OS/Arch: linux/amd64
> {code}
> *Alpine*
> {code:none}
> --------------- S Y S T E M ---------------
> OS:NAME="Alpine Linux"
> ID=alpine
> VERSION_ID=3.4.4
> PRETTY_NAME="Alpine Linux v3.4"
> HOME_URL="http://alpinelinux.org"
> BUG_REPORT_URL="http://bugs.alpinelinux.org"
> uname:Linux 4.4.20-moby #1 SMP Thu Sep 15 12:10:20 UTC 2016 x86_64
> libc:glibc 2.9 NPTL
> rlimit: STACK 8192k, CORE infinity, NPROC infinity, NOFILE 1048576, AS infinity
> load average:0.01 0.39 0.89
> {code}
> *Java*
> {code:none}
> # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 compressed oops)
> # Derivative: IcedTea 3.1.0
> # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016)
> {code}
> Reporter: Marc Villacorta
> Priority: Minor
> Attachments: hs_err_pid1677.log
>
>
> I have compiled Mesos 1.0.1 inside a Docker container using Alpine Linux (Dockerfile below):
> {code:none}
> # Set the base image for subsequent instructions:
> FROM alpine:3.4
> MAINTAINER Marc Villacorta Morera <marc.villaco...@gmail.com>
> # Environment variables:
> ENV TAG="1.0.1" \
>     PREFIX="/usr/local" \
>     JAVA_HOME="/usr/lib/jvm/default-jvm" \
>     JAVA_JVM_LIBRARY="/usr/lib/jvm/default-jvm/jre/lib/amd64/server/libjvm.so" \
>     LD_LIBRARY_PATH="/usr/lib/jvm/default-jvm/jre/lib/amd64/server" \
>     EDGE_REPO="http://nl.alpinelinux.org/alpine/edge"
> # Install mesos:
> RUN apk add -U --no-cache -t dev git autoconf automake libtool g++ \
>     zlib-dev fts-dev apr-dev curl-dev file cyrus-sasl-dev cyrus-sasl-crammd5 \
>     subversion-dev make patch linux-headers binutils && apk add -U --no-cache \
>     -t dev openjdk8 maven --repository ${EDGE_REPO}/community && apk add -U \
>     --no-cache libstdc++ libgcc subversion-libs libcurl fts zlib coreutils \
>     && git clone https://git-wip-us.apache.org/repos/asf/mesos.git && cd mesos \
>     && { [ "${TAG}" != "master" ] && git checkout tags/${TAG} -b ${TAG}; }; \
>     ./bootstrap && mkdir build && cd build && ../configure --prefix=${PREFIX} \
>     --disable-dependency-tracking --disable-maintainer-mode --disable-python \
>     --enable-optimize --enable-silent-rules \
>     && CORES=$(cat /proc/cpuinfo | grep processor | wc -l) \
>     && make -j${CORES} && make install && cd && rm -rf /mesos ${PREFIX}/include \
>     && find ${PREFIX} -type f -perm /u=x,g=x,o=x | xargs strip -s 2>/dev/null; \
>     apk del --purge dev && rm -rf /var/cache/apk/*
> # Command:
> CMD ["/bin/sh"]
> {code}
> Some tests are failing and my biggest concern is with this one:
> {code:none}
> make check GTEST_FILTER="ExamplesTest.JavaFramework"
> {code}
> {code:none}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from ExamplesTest
> [ RUN      ] ExamplesTest.JavaFramework
> ../../src/tests/script.cpp:80: Failure
> Failed
> java_framework_test.sh terminated with signal Segmentation fault
> [  FAILED  ] ExamplesTest.JavaFramework (5655 ms)
> [----------] 1 test from ExamplesTest (5656 ms total)
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (5689 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] ExamplesTest.JavaFramework
> {code}
> An ugly SIGSEGV is dispatched by the kernel. It looks like _libjvm.so_ is the offending library but I am not sure at all:
> {code:none}
> I1026 15:19:54.843340 1706 replica.cpp:712] Persisted action at 7
> I1026 15:19:54.843683 1706 replica.cpp:691] Replica received learned notice for position 7 from @0.0.0.0:0
> I1026 15:19:54.864063 1706 leveldb.cpp:341] Persisting action (690 bytes) to leveldb took 20.333769ms
> I1026 15:19:54.864123 1706 replica.cpp:712] Persisted action at 7
> I1026 15:19:54.864131 1706 replica.cpp:697] Replica learned APPEND action at position 7
> I1026 15:19:54.864936 1705 registrar.cpp:509] Successfully updated the 'registry' in 31.458048ms
> I1026 15:19:54.864989 1700 log.cpp:596] Attempting to truncate the log to 7
> I1026 15:19:54.865267 1706 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 8
> I1026 15:19:54.866050 1706 slave.cpp:1095] Registered with master master@172.17.0.2:37015; given agent ID 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2
> I1026 15:19:54.866025 1700 master.cpp:4619] Registered agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 at slave(1)@172.17.0.2:37015 (2a2f454552b6) with cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000]
> I1026 15:19:54.866127 1702 hierarchical.cpp:478] Added agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 (2a2f454552b6) with cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000] (allocated: )
> I1026 15:19:54.866257 1700 status_update_manager.cpp:181] Resuming sending status updates
> I1026 15:19:54.866878 1706 slave.cpp:1155] Forwarding total oversubscribed resources
> I1026 15:19:54.866969 1706 master.cpp:5002] Received update of agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 at slave(1)@172.17.0.2:37015 (2a2f454552b6) with total oversubscribed resources
> I1026 15:19:54.867280 1705 hierarchical.cpp:542] Agent 7d8d36ff-5d82-4e91-aba8-46267acc8536-S2 (2a2f454552b6) updated with oversubscribed resources (total: cpus(*):2; mem(*):10240; disk(*):55318; ports(*):[31000-32000], allocated: )
> I1026 15:19:54.867350 1706 replica.cpp:537] Replica received write request for position 8 from (67)@172.17.0.2:37015
> I1026 15:19:54.876315 1706 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 8.874131ms
> I1026 15:19:54.876348 1706 replica.cpp:712] Persisted action at 8
> I1026 15:19:54.876600 1705 replica.cpp:691] Replica received learned notice for position 8 from @0.0.0.0:0
> I1026 15:19:54.885751 1705 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 9.032464ms
> I1026 15:19:54.885886 1705 leveldb.cpp:399] Deleting ~2 keys from leveldb took 39508ns
> I1026 15:19:54.885917 1705 replica.cpp:712] Persisted action at 8
> I1026 15:19:54.885938 1705 replica.cpp:697] Replica learned TRUNCATE action at position 8
> I1026 15:19:55.790892 1705 master.cpp:2424] Received SUBSCRIBE call for framework 'Test Framework (Java)' at scheduler-b2956950-fa7e-49c3-88ed-efcef624b837@172.17.0.2:37015
> I1026 15:19:55.791019 1705 master.cpp:2500] Subscribing framework Test Framework (Java) with checkpointing enabled and capabilities [ ]
> I1026 15:19:55.791221 1705 hierarchical.cpp:271] Added framework 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000
> I1026 15:19:55.791256 1700 sched.cpp:743] Framework registered with 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007fcc6d6dcc64, pid=1677, tid=0x00007fcc54193ab0
> #
> # JRE version: OpenJDK Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 compressed oops)
> # Derivative: IcedTea 3.1.0
> # Distribution: Custom build (Tue Aug 30 20:38:19 GMT 2016)
> # Problematic frame:
> # C [libjvm.so+0x300c64]
> #
> # Core dump written. Default location: /mesos/build/src/examples/java/core or core.1677
> #
> # An error report file with more information is saved as:
> # /mesos/build/src/examples/java/hs_err_pid1677.log
> I1026 15:19:55.792402 1705 master.cpp:5725] Sending 3 offers to framework 7d8d36ff-5d82-4e91-aba8-46267acc8536-0000 (Test Framework (Java)) at scheduler-b2956950-fa7e-49c3-88ed-efcef624b837@172.17.0.2:37015
> #
> # If you would like to submit a bug report, please include
> # instructions on how to reproduce the bug and visit:
> # http://icedtea.classpath.org/bugzilla
> #
> Segmentation fault (core dumped)
> {code}
> Find attached the _/mesos/build/src/examples/java/hs_err_pid1677.log_ file.
> Also here you have a GDB _bt_ (for those who understand it):
> {code:none}
> warning: Can't read pathname for load map: No error information.
> Core was generated by `/usr/lib/jvm/default-jvm/bin/java -cp /mesos/build/src/java/target/protobuf-jav'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007fcc6e31dd08 in abort () from /lib/ld-musl-x86_64.so.1
> [Current thread is 1 (LWP 1700)]
> (gdb) bt
> #0 0x00007fcc6e31dd08 in abort () from /lib/ld-musl-x86_64.so.1
> #1 0x00007fcc54192d28 in ?? ()
> #2 0x00007fcc6d93ac91 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #3 0x00007fcc6da0947c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #4 0x00007fcc6d940a40 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #5 0x00007fcc6d939b21 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #6 <signal handler called>
> #7 0x00007fcc6d6dcc64 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #8 0x00007fcc6d7f4a36 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #9 0x00007fcc6d7f4c4c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #10 0x00007fcc6d8218c4 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #11 0x00007fcc5812057a in JNIScheduler::registered(mesos::SchedulerDriver*, mesos::FrameworkID const&, mesos::MasterInfo const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #12 0x00007fcc577e18df in mesos::internal::SchedulerProcess::registered(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #13 0x00007fcc577f51e4 in void ProtobufProcess<mesos::internal::SchedulerProcess>::handler2<mesos::internal::FrameworkRegisteredMessage, mesos::FrameworkID const&, mesos::FrameworkID const&, mesos::MasterInfo const&, mesos::MasterInfo const&>(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #14 0x00007fcc577df4aa in std::_Function_handler<void (process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::_Bind<void (*(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, std::_Placeholder<1>, std::_Placeholder<2>))(mesos::internal::SchedulerProcess*, void (mesos::internal::SchedulerProcess::*)(process::UPID const&, mesos::FrameworkID const&, mesos::MasterInfo const&), mesos::FrameworkID const& (mesos::internal::FrameworkRegisteredMessage::*)() const, mesos::MasterInfo const& (mesos::internal::FrameworkRegisteredMessage::*)() const, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> >::_M_invoke(std::_Any_data const&, process::UPID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #15 0x00007fcc577e9d0a in ProtobufProcess<mesos::internal::SchedulerProcess>::visit(process::MessageEvent const&) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #16 0x00007fcc580cef73 in process::ProcessManager::resume(process::ProcessBase*) () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #17 0x00007fcc580cf8d7 in std::thread::_Impl<std::_Bind_simple<process::ProcessManager::init_threads()::{unnamed type#1} ()> >::_M_run() () from /mesos/build/src/.libs/libmesos-1.0.1.so
> #18 0x00007fcc6d147c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
> #19 0x00007fcc6e35154d in ?? () from /lib/ld-musl-x86_64.so.1
> #20 0x0000000000000000 in ?? ()
> (gdb)
> {code}
> ... and the same _bt_ after I installed the _musl-dbg_ package:
> {code:none}
> warning: Can't read pathname for load map: No error information.
> Core was generated by `/usr/lib/jvm/default-jvm/bin/java -cp /mesos/build/src/java/target/protobuf-jav'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 a_crash () at ./arch/x86_64/atomic_arch.h:108
> 108 ./arch/x86_64/atomic_arch.h: No such file or directory.
> [Current thread is 1 (LWP 1700)]
> (gdb) bt
> #0 a_crash () at ./arch/x86_64/atomic_arch.h:108
> #1 abort () at src/exit/abort.c:11
> #2 0x00007fcc6d93ac91 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #3 0x00007fcc6da0947c in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #4 0x00007fcc6d940a40 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #5 0x00007fcc6d939b21 in ?? () from /usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server/libjvm.so
> #6 0x00007fcc6e345d04 in sigwaitinfo (mask=<optimized out>, si=<optimized out>) at src/signal/sigwaitinfo.c:5
> #7 0x0000000000000001 in ?? ()
> #8 0x0000000000000000 in ?? ()
> (gdb)
> {code}
> I have tested with _openjdk7_ and _openjdk8_ (3.4.4 and edge) with no luck.
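
For anyone who wants to collect the same kind of backtrace without an interactive GDB session, here is a minimal sketch. It assumes gdb is installed in the container (plus _musl-dbg_ if libc symbols are wanted, as in the second trace). `core_backtrace` is a hypothetical helper name used only for illustration; it is not part of Mesos or GDB.

```shell
#!/bin/sh
# Hypothetical helper: dump a backtrace of every thread from a core file.
#   $1 = executable that produced the core
#   $2 = path to the core file
core_backtrace() {
  if ! command -v gdb >/dev/null 2>&1; then
    echo "gdb not found; install it first" >&2
    return 127
  fi
  if [ ! -r "$2" ]; then
    echo "core file not readable: $2" >&2
    return 1
  fi
  # -batch runs the given commands and exits. Pagination is switched off
  # explicitly so long C++ frames are not interrupted by pager prompts.
  gdb -batch \
      -ex 'set pagination off' \
      -ex 'thread apply all bt' \
      "$1" "$2"
}
```

With the paths printed in the log, the call would be `core_backtrace /usr/lib/jvm/default-jvm/bin/java /mesos/build/src/examples/java/core`. Core dumps must be enabled in the shell that runs the test (for example `ulimit -c unlimited`) for the core file to exist.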