[
https://issues.apache.org/jira/browse/HDDS-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951771#comment-16951771
]
Marton Elek commented on HDDS-2308:
-----------------------------------
This happens during the unit tests. I am not 100% sure what the root cause
is, but this is from the crash log:
{code:java}
Register to memory mapping:
RAX=0x0000000000000000 is an unknown value
RBX=0x000055ca33aa4240 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x0000000000000000 is an unknown value
RSP=0x00007ffaad96c520 is an unknown value
RBP=0x00007ffaad96c580 is an unknown value
RSI=0x0000000000000000 is an unknown value
RDI=0x00007ffaca5ef14c: __realloc_dep+0x3bc in /lib/ld-musl-x86_64.so.1 at 0x00007ffaca361000
R8 =0x0000000000012000 is an unknown value
R9 =0x000000000000000c is an unknown value
R10=0x0000000000000008 is an unknown value
R11=0x0000000000000202 is an unknown value
R12=0x0000000000000000 is an unknown value
R13=0x00007ffaad96c547 is an unknown value
R14=0x000055ca324adcb8 is a global jni handle
R15=0x000055ca33ab5720 is an unknown value
Stack: [0x00007ffaad958000,0x00007ffaad96c760], sp=0x00007ffaad96c520, free space=81k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni7051281618959419092.so+0x1861c6] rocksdb::LoggerJniCallback::Logv(rocksdb::InfoLogLevel, char const*, __va_list_tag*)+0x76
C [librocksdbjni7051281618959419092.so+0x35f984] rocksdb::Log(rocksdb::InfoLogLevel, std::shared_ptr<rocksdb::Logger> const&, char const*, ...)+0x84
C [librocksdbjni7051281618959419092.so+0x23f780] rocksdb::DBImpl::DumpStats()+0x170
C [librocksdbjni7051281618959419092.so+0x256f11] std::thread::_State_impl<std::thread::_Invoker<std::tuple<rocksdb::RepeatableThread::RepeatableThread(std::function<void ()>, std::string const&, rocksdb::Env*, unsigned long, unsigned long)::{lambda()#1}> > >::_M_run()+0x71
{code}
So it seems to be related to the JNI call that calls back into Java (the
RocksDB logger callback in the trace above).
I tested it locally with the same alpine container and it worked well, but it
failed on the k8s based build nodes.
I tested with centos + kubernetes and it worked well...
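
Since the register dump points into /lib/ld-musl-x86_64.so.1, one quick way
to confirm which libc a given build node or container runs is to look for
the musl loader. This is only a minimal sketch: it assumes that the presence
of that loader path (taken from the crash log) is a good enough signal, and
the class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class LibcCheck {

    // Loader path copied from the crash log above. Treating its presence as
    // "this environment uses musl" is an assumption, not a guarantee.
    static final List<String> MUSL_LOADERS =
        Collections.singletonList("/lib/ld-musl-x86_64.so.1");

    // Returns true if at least one of the given paths exists on disk.
    static boolean anyExists(List<String> paths) {
        for (String p : paths) {
            if (Files.exists(Paths.get(p))) {
                return true;
            }
        }
        return false;
    }

    // Heuristic check: does this environment look like a musl based image?
    static boolean looksLikeMusl() {
        return anyExists(MUSL_LOADERS);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMusl()
            ? "musl loader found: native (JNI) libraries run against musl libc"
            : "no musl loader found: probably a glibc based environment");
    }
}
```

Running it inside the build container before the tests start would at least
make it visible in the build log which libc the forked JVMs end up with.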
> Switch to centos with the apache/ozone-build docker image
> ---------------------------------------------------------
>
> Key: HDDS-2308
> URL: https://issues.apache.org/jira/browse/HDDS-2308
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
> Attachments: hs_err_pid16346.log
>
>
> I noticed multiple JVM crashes in the daily builds:
>
> {code:java}
> [ERROR] ExecutionException The forked VM terminated without properly saying
> goodbye. VM crash or System.exit called?
>
>
> [ERROR] Command was /bin/sh -c cd /workdir/hadoop-ozone/ozonefs &&
> /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xmx2048m
> -XX:+HeapDumpOnOutOfMemoryError -jar
> /workdir/hadoop-ozone/ozonefs/target/surefire/surefirebooter9018689154779946208.jar
> /workdir/hadoop-ozone/ozonefs/target/surefire
> 2019-10-06T14-52-40_697-jvmRun1 surefire7569723928289175829tmp
> surefire_947955725320624341206tmp
>
>
> [ERROR] Error occurred in starting fork, check output in log
>
>
> [ERROR] Process Exit Code: 139
>
>
> [ERROR] Crashed tests:
>
>
> [ERROR] org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRename
>
>
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
>
>
> [ERROR] Command was /bin/sh -c cd /workdir/hadoop-ozone/ozonefs &&
> /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xmx2048m
> -XX:+HeapDumpOnOutOfMemoryError -jar
> /workdir/hadoop-ozone/ozonefs/target/surefire/surefirebooter5429192218879128313.jar
> /workdir/hadoop-ozone/ozonefs/target/surefire
> 2019-10-06T14-52-40_697-jvmRun1 surefire7227403571189445391tmp
> surefire_1011197392458143645283tmp
>
>
> [ERROR] Error occurred in starting fork, check output in log
>
>
> [ERROR] Process Exit Code: 139
>
>
> [ERROR] Crashed tests:
>
>
> [ERROR] org.apache.hadoop.fs.ozone.contract.ITestOzoneContractDistCp
>
>
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying goodbye.
> VM crash or System.exit called?
>
>
> [ERROR] Command was /bin/sh -c cd /workdir/hadoop-ozone/ozonefs &&
> /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xmx2048m
> -XX:+HeapDumpOnOutOfMemoryError -jar
> /workdir/hadoop-ozone/ozonefs/target/surefire/surefirebooter1355604543311368443.jar
> /workdir/hadoop-ozone/ozonefs/target/surefire
> 2019-10-06T14-52-40_697-jvmRun1 surefire3938612864214747736tmp
> surefire_933162535733309260236tmp
>
>
> [ERROR] Error occurred in starting fork, check output in log
>
>
> [ERROR] Process Exit Code: 139
>
>
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
>
>
> [ERROR] Command was /bin/sh -c cd /workdir/hadoop-ozone/ozonefs &&
> /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -Xmx2048m
> -XX:+HeapDumpOnOutOfMemoryError -jar
> /workdir/hadoop-ozone/ozonefs/target/surefire/surefirebooter9018689154779946208.jar
> /workdir/hadoop-ozone/ozonefs/target/surefire
> 2019-10-06T14-52-40_697-jvmRun1 surefire7569723928289175829tmp
> surefire_947955725320624341206tmp
>
>
> [ERROR] Error occurred in starting fork, check output in log
>
>
> [ERROR] Process Exit Code: 139 {code}
>
> Based on the crash log (attached), it's related to the RocksDB JNI interface.
> The current ozone-build docker image (which provides the build environment)
> is based on alpine, which uses musl libc instead of the mainstream glibc. I
> think it would be safer to use the same glibc that is used in production.
> I tested with a centos based docker image and it seems to be more stable:
> I didn't see any more JVM crashes.