Shuo-Jia edited a comment on issue #659:
URL: 
https://github.com/apache/incubator-pegasus/issues/659#issuecomment-817468125


   I searched all previous core logs on the cluster and found that this crash 
has occurred as early as 
[v1.12.1](https://github.com/apache/incubator-pegasus/tree/v1.12.1) and 
[v1.12.3](https://github.com/apache/incubator-pegasus/tree/v1.12.3):
   ```
   Reading symbols from 
/home/work/packages/pegasus/c3srv-browser/a948e89b180b6a5c82d298d0dcc65f7bb770a8be-20200421-120054/pegasus-server-1.12.3-a948e89-glibc2.12-release/bin/pegasus_server...done.
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/libaio.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/7e/f475c5abcc899058c9ab6600a1b65fb1343153.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/libcrypto.so.10
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/3a/8d65b9a373c0afaf106f3a979835b16dbeff1a.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/libsnappy.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/e7/f2d4d5e290fe830bfca479729099b0e6adbfd3.debug
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib64/libthread_db.so.1".
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/liblz4.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/a5/7db374840a9697b5aa9e25a8967b1ac75bfb70.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/libzstd.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/6e/19fb6f4ba75c095047c9b9f564e69d050332c9.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-browser/replica/package/bin/libssl.so.10
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/31/8eab33420b000d542f09b91b716bacab1ad546.debug
   Core was generated by 
`/home/work/app/pegasus/c3srv-browser/replica/package/bin/pegasus_server 
config.'.
   Program terminated with signal 6, Aborted.
   #0  0x00007f30859951d7 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: debuginfo-install 
glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 
krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 
libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 
pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) #0  0x00007f30859951d7 in raise () from /lib64/libc.so.6
   #1  0x00007f30859968c8 in abort () from /lib64/libc.so.6
   #2  0x00007f3089941e7e in dsn_coredump ()
       at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:76
   #3  0x00007f308998cafd in dsn::disk_file::on_write_completed (
       this=<optimized out>, wk=wk@entry=0xabdb90310,
       ctx=ctx@entry=0x7f30830aa17c, err=..., size=size@entry=331776)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/disk_engine.cpp:111
   #4  0x00007f308998d332 in dsn::disk_engine::complete_io (this=0x20f4900,
       aio=0xabdb90310, err=..., bytes=bytes@entry=331776,
       delay_milliseconds=delay_milliseconds@entry=0)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/disk_engine.cpp:338
   #5  0x00007f30899e323e in dsn::aio_provider::complete_io (
       this=<optimized out>, aio=<optimized out>, err=...,
       bytes=bytes@entry=331776, delay_milliseconds=delay_milliseconds@entry=0)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/aio_provider.cpp:50
   #6  0x00007f30899db629 in dsn::tools::native_linux_aio_provider::complete_aio
       (this=this@entry=0x21149f0, io=0xe618cd20, bytes=331776,
       err=<optimized out>)
       at 
/home/wutao1/pegasus-release/rdsn/src/core/tools/common/native_aio_provider.linux.cpp:146
   #7  0x00007f30899db781 in dsn::tools::native_linux_aio_provider::get_event (
       this=0x21149f0)
       at 
/home/wutao1/pegasus-release/rdsn/src/core/tools/common/native_aio_provider.linux.cpp:124
   #8  0x00007f30862ed600 in std::(anonymous 
namespace)::execute_native_thread_routine (__p=<optimized out>)
       at 
/home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
   #9  0x00007f3086dffdc5 in start_thread () from /lib64/libpthread.so.0
   #10 0x00007f3085a5773d in clone () from /lib64/libc.so.6
   ```
   
   ```
   Reading symbols from 
/home/work/packages/pegasus/c3srv-xiaomi/694cbd544436f03d34bfbfcbca0cd9b8f397197e-20191204-103256/pegasus-server-1.12.1-694cbd5-glibc2.12-release/bin/pegasus_server...done.
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/libaio.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/7e/f475c5abcc899058c9ab6600a1b65fb1343153.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/libcrypto.so.10
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/3a/8d65b9a373c0afaf106f3a979835b16dbeff1a.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/libsnappy.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/e7/f2d4d5e290fe830bfca479729099b0e6adbfd3.debug
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib64/libthread_db.so.1".
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/liblz4.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/a5/7db374840a9697b5aa9e25a8967b1ac75bfb70.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/libzstd.so.1
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/6e/19fb6f4ba75c095047c9b9f564e69d050332c9.debug
   Missing separate debuginfo for 
/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/libssl.so.10
   Try: yum --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/31/8eab33420b000d542f09b91b716bacab1ad546.debug
   Core was generated by 
`/home/work/app/pegasus/c3srv-xiaomi/replica/package/bin/pegasus_server 
config.i'.
   Program terminated with signal 6, Aborted.
   #0  0x00007ffce20041d7 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: debuginfo-install 
bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 
elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 
glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 
krb5-libs-1.14.1-27.el7_3.x86_64 libattr-2.4.46-12.el7.x86_64 
libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 
libcurl-7.29.0-35.el7.centos.x86_64 libgcc-4.8.5-11.el7.x86_64 
libidn-1.28-4.el7.x86_64 libselinux-2.5-6.el7.x86_64 
libssh2-1.4.3-10.el7_2.1.x86_64 nspr-4.11.0-1.el7_2.x86_64 
nss-3.21.0-17.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 
nss-util-3.21.0-2.2.el7_2.x86_64 openldap-2.4.40-13.el7.x86_64 
pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7.x86_64 
xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) #0  0x00007ffce20041d7 in raise () from /lib64/libc.so.6
   #1  0x00007ffce20058c8 in abort () from /lib64/libc.so.6
   #2  0x00007ffce5f947be in dsn_coredump ()
       at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:76
   #3  0x00007ffce5fddebd in dsn::disk_file::on_write_completed (
       this=<optimized out>, wk=wk@entry=0xa4122357c,
       ctx=ctx@entry=0x7ffcdd5ea17c, err=..., size=size@entry=342)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/disk_engine.cpp:111
   #4  0x00007ffce5fde6f2 in dsn::disk_engine::complete_io (this=0x2962260,
       aio=0xa4122357c, err=..., bytes=bytes@entry=342,
       delay_milliseconds=delay_milliseconds@entry=0)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/disk_engine.cpp:338
   #5  0x00007ffce603944e in dsn::aio_provider::complete_io (
       this=<optimized out>, aio=<optimized out>, err=...,
       bytes=bytes@entry=342, delay_milliseconds=delay_milliseconds@entry=0)
       at /home/wutao1/pegasus-release/rdsn/src/core/core/aio_provider.cpp:50
   #6  0x00007ffce6031619 in dsn::tools::native_linux_aio_provider::complete_aio
       (this=this@entry=0x296a720, io=0x103d96a20, bytes=342,
       err=<optimized out>)
       at 
/home/wutao1/pegasus-release/rdsn/src/core/tools/common/native_aio_provider.linux.cpp:146
   #7  0x00007ffce6031771 in dsn::tools::native_linux_aio_provider::get_event (
       this=0x296a720)
       at 
/home/wutao1/pegasus-release/rdsn/src/core/tools/common/native_aio_provider.linux.cpp:124
   #8  0x00007ffce295c600 in std::(anonymous 
namespace)::execute_native_thread_routine (__p=<optimized out>)
       at 
/home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
   #9  0x00007ffce34c1dc5 in start_thread () from /lib64/libpthread.so.0
   #10 0x00007ffce20c673d in clone () from /lib64/libc.so.6
   ```
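   Both cores abort in the same place, `dsn::disk_file::on_write_completed` 
(disk_engine.cpp:111), on two different releases. So I think this bug is not 
caused by the Pegasus program itself; rather, the disk interface does not 
actually guarantee that a `write` is reliable.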
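
   To illustrate what tolerating an unreliable `write` could look like, here 
is a minimal sketch. It assumes the abort is triggered by a write completing 
with fewer bytes than requested, which the backtraces above do not confirm 
because they do not show the assertion message. `write_fully` is a 
hypothetical helper, not part of rDSN or Pegasus, and it uses synchronous 
`pwrite` rather than the libaio path seen in the traces.

   ```cpp
   #include <cerrno>
   #include <cstddef>
   #include <sys/types.h>
   #include <unistd.h>

   // Hypothetical helper (not an rDSN/Pegasus API).
   // Writes exactly `len` bytes from `buf` to `fd` starting at `offset`,
   // retrying on short writes and on EINTR.
   // Returns 0 on success, or an errno value on failure.
   int write_fully(int fd, const char *buf, size_t len, off_t offset)
   {
       size_t done = 0;
       while (done < len) {
           ssize_t n = ::pwrite(fd, buf + done, len - done,
                                offset + static_cast<off_t>(done));
           if (n < 0) {
               if (errno == EINTR) {
                   continue; // interrupted before writing anything: retry
               }
               return errno; // real I/O error: report it to the caller
           }
           if (n == 0) {
               return EIO; // no progress was made: treat as an I/O error
           }
           done += static_cast<size_t>(n); // short write: continue with the rest
       }
       return 0;
   }
   ```

   The point of the sketch is that a short write is handled as a recoverable 
condition and a hard failure is returned as an error code, instead of 
asserting that the completed size matches the requested size.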


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


