ninsmiracle commented on issue #2007:
URL: https://github.com/apache/incubator-pegasus/issues/2007#issuecomment-2105522984

   Let me add more details:
   1. Deploy the cluster. It works, and every node is running.
   2. Use pegasus-shell to connect to the cluster.
   
![image](https://github.com/apache/incubator-pegasus/assets/110282526/f72de404-5e30-4e33-88b8-c6b173c4b5aa)
   
   3. Send any RPC command, such as `nodes -dr` or `ls -d`. It fails with TIME_OUT.
   
![image](https://github.com/apache/incubator-pegasus/assets/110282526/0bd9373d-1541-43b3-ac68-cb6975dba1e9)
   
   4. Many core dumps are generated on the meta-server.
   
![image](https://github.com/apache/incubator-pegasus/assets/110282526/07c1c1f8-7d11-4b13-a8c9-21b2a882a733)
    
   Cores named like `core.meta.THREAD_PO...`:
   ```
   Using host libthread_db library "/lib64/libthread_db.so.1".
   Core was generated by `/home/work/app/pegasus/c3tst-performance1/meta/package/bin/pegasus_server confi'.
   Program terminated with signal SIGABRT, Aborted.
   #0  0x00007f3c0c8bc1d7 in raise () from /lib64/libc.so.6
   (gdb) bt
   #0  0x00007f3c0c8bc1d7 in raise () from /lib64/libc.so.6
   #1  0x00007f3c0c8bd8c8 in abort () from /lib64/libc.so.6
   #2  0x00007f3c10cd8a1e in dsn_coredump () at /home/guoningshen/code/incubator-pegasus/src/runtime/service_api_c.cpp:130
   #3  0x00007f3c0dcb4134 in process_fatal_log (log_level=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/utils/simple_logger.cpp:117
   #4  dsn::tools::simple_logger::log (this=0x2e3a200, file=<optimized out>, function=<optimized out>, line=<optimized out>, log_level=<optimized out>, str=<optimized out>)
       at /home/guoningshen/code/incubator-pegasus/src/utils/simple_logger.cpp:284
   #5  0x00007f3c10d09ff3 in dsn::host_port::from_address (addr=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_host_port.cpp:60
   #6  0x00007f3c10d0f0c5 in dsn::message_ex::create_response (this=this@entry=0x327be00) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_message.cpp:358
   #7  0x00007f3c10d0638d in dsn::rpc_engine::forward (this=this@entry=0x2c4f180, request=request@entry=0x327be00, address=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_engine.cpp:853
   #8  0x00007f3c10cd90a3 in dsn_rpc_forward (request=0x327be00, addr=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/service_api_c.cpp:207
   #9  0x00007f3c0ffc6196 in forward (addr=..., this=0x7f3bee4e5f20) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_holder.h:224
   #10 dsn::replication::meta_service::check_leader<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> > (this=this@entry=0x32ee000,
       rpc=..., forward_address=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.h:406
   #11 0x00007f3c0ffc629a in dsn::replication::meta_service::check_leader_status<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> > (
       this=this@entry=0x32ee000, rpc=..., forward_address=forward_address@entry=0x0) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.h:420
   #12 0x00007f3c0ff9ef6a in dsn::replication::meta_service::on_list_apps (this=0x32ee000, rpc=...) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.cpp:671
   #13 0x00007f3c0fff8653 in operator() (request=<optimized out>, __closure=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/runtime/serverlet.h:201
   #14 std::_Function_handler<void (dsn::message_ex*), bool dsn::serverlet<dsn::replication::meta_service>::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> >(dsn::task_code, char const*, void (dsn::replication::meta_service::*)(dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response>))::{lambda(dsn::message_ex*)#1}>::_M_invoke(std::_Any_data const&, dsn::message_ex*&&) (__functor=..., __args#0=<optimized out>)
       at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
   #15 0x00007f3c10d123b2 in operator() (__args#0=<optimized out>, this=0x2b310d0) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
   #16 dsn::rpc_request_task::exec (this=0x2b31000) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task.h:436
   #17 0x00007f3c10d13be1 in dsn::task::exec_internal (this=0x2b31000) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task.cpp:173
   #18 0x00007f3c10d2b257 in dsn::task_worker::loop (this=0x2b19290) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task_worker.cpp:245
   #19 0x00007f3c10d2bdc0 in dsn::task_worker::run_internal (this=0x2b19290) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task_worker.cpp:225
   #20 0x00007f3c0f7a5a3f in execute_native_thread_routine () from /home/work/app/pegasus/c3tst-performance1/meta/package/bin/librocksdb.so.8
   #21 0x00007f3c0df3adc5 in start_thread () from /lib64/libpthread.so.0
   #22 0x00007f3c0c97e73d in clone () from /lib64/libc.so.6
   (gdb)
   ```
   
   Cores named like `core.pegasus_server....`:
   ```
   #0  0x0000000000000000 in ?? ()
   #1  0x00007f693f83b6c0 in (anonymous namespace)::stacktrace_generic_fp::capture<false, false> (result=result@entry=0xaee010, max_depth=31, skip_count=1, initial_frame=initial_frame@entry=0x7ffd328eae80,
       initial_pc=initial_pc@entry=0x0, sizes=0x0) at src/stacktrace_generic_fp-inl.h:175
   #2  0x00007f693f83b74a in GetStackTrace_generic_fp (result=0xaee010, max_depth=<optimized out>, skip_count=<optimized out>) at src/stacktrace_generic_fp-inl.h:332
   #3  0x00007f693f83ba52 in GetStackTrace (result=result@entry=0xaee010, max_depth=max_depth@entry=30, skip_count=skip_count@entry=0) at src/stacktrace.cc:346
   #4  0x00007f693f82c37e in tcmalloc::PageHeap::HandleUnlock (this=0x7f693fa56720 <tcmalloc::Static::pageheap_>, context=0x7ffd328eaf10) at src/page_heap.cc:155
   #5  0x00007f693f82e07a in ~LockingContext (this=0x7ffd328eaf10, __in_chrg=<optimized out>) at src/page_heap.cc:77
   #6  tcmalloc::PageHeap::NewWithSizeClass (this=this@entry=0x7f693fa56720 <tcmalloc::Static::pageheap_>, n=n@entry=1, sizeclass=26) at src/page_heap.cc:161
   #7  0x00007f693f82beb7 in tcmalloc::CentralFreeList::Populate (this=this@entry=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>) at src/central_freelist.cc:314
   #8  0x00007f693f82c088 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe (this=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>, N=1, start=0x7ffd328eb020, end=0x7ffd328eb028)
       at src/central_freelist.cc:273
   #9  0x00007f693f82c120 in tcmalloc::CentralFreeList::RemoveRange (this=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>, start=start@entry=0x7ffd328eb020, end=end@entry=0x7ffd328eb028, N=1)
       at src/central_freelist.cc:253
   #10 0x00007f693f82fca3 in tcmalloc::ThreadCache::FetchFromCentralCache (this=this@entry=0xb0e000, cl=cl@entry=26, byte_size=byte_size@entry=576,
       oom_handler=oom_handler@entry=0x7f693f81d240 <(anonymous namespace)::nop_oom_handler(size_t)>) at src/thread_cache.cc:125
   #11 0x00007f693f83f15d in Allocate (oom_handler=0x7f693f81d240 <(anonymous namespace)::nop_oom_handler(size_t)>, cl=26, size=576, this=<optimized out>) at src/thread_cache.h:381
   #12 do_malloc (size=568) at src/tcmalloc.cc:1414
   #13 do_allocate_full<tcmalloc::malloc_oom> (size=568) at src/tcmalloc.cc:1804
   #14 tcmalloc::allocate_full_malloc_oom (size=568) at src/tcmalloc.cc:1820
   #15 0x00007f693dfa754d in __fopen_internal () from /lib64/libc.so.6
   #16 0x00007f693ca60a16 in selinuxfs_exists () from /lib64/libselinux.so.1
   #17 0x00007f693ca58ce8 in init_lib () from /lib64/libselinux.so.1
   #18 0x00007f6943dfd1e3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
   #19 0x00007f6943def21a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
   #20 0x0000000000000004 in ?? ()
   #21 0x00007ffd328ed220 in ?? ()
   #22 0x00007ffd328ed26a in ?? ()
   #23 0x00007ffd328ed275 in ?? ()
   #24 0x00007ffd328ed27f in ?? ()
   #25 0x0000000000000000 in ?? ()
   (gdb)
   ```
   
   
   6. stdout (error log) of the meta-server:
   ```
   W2024-05-11 10:33:36.503 (1715394816503732375 36348) : overwrite default thread pool for task RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX from THREAD_POOL_META_SERVER to THREAD_POOL_DEFAULT
   W2024-05-11 10:33:36.503 (1715394816503775340 36348) : overwrite default thread pool for task RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX_ACK from THREAD_POOL_META_SERVER to THREAD_POOL_DEFAULT
   I2024-05-11 10:33:36.503 (1715394816503863057 36348) : pegasus server starting, pid(36348), version($Version: Pegasus Server 2.6.0-SNAPSHOT (aea1cfe632d455fcddfe4c92ebbd9d4e89037abb) Release, built by gcc 7.3.1, built on 12180ab51819, built at May  7 2024 12:14:31 $)
   F2024-05-11 10:36:03.558 (1715394963558260142 36428) meta.THREAD_POOL_META_SERVER2.02008e370001000c: rpc_host_port.cpp:62:from_address(): assertion expression: [utils::hostname_from_ip(__bswap_32 (addr.ip()), &hp._host)] invalid host_port 172.17.0.1
   ```
   
   7. By the way, all the replica servers were running during that time.
   
![image](https://github.com/apache/incubator-pegasus/assets/110282526/34f7a93b-8302-4137-a0a3-ef9c7b56e63e)
   
   
   8. I also cannot connect to the cluster via `admin-cli`.
   
![image](https://github.com/apache/incubator-pegasus/assets/110282526/b5ac7374-b5e5-41c5-9a92-3532b2a4a74a)
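The fatal log line shows `dsn::host_port::from_address` asserting that `utils::hostname_from_ip` can turn 172.17.0.1 (often the default address of the Docker `docker0` bridge) into a hostname. When that fails, the meta-server aborts (frames #0 to #5 of the first backtrace) while handling `on_list_apps`, which is consistent with the TIME_OUT seen from the shell. Below is a minimal, hypothetical C++ sketch of that kind of check, assuming `hostname_from_ip` performs a reverse lookup similar to POSIX `getnameinfo()` with `NI_NAMEREQD`; `ip_to_string` and `reverse_lookup` are illustrative helpers, not Pegasus APIs. The `__bswap_32` in the assertion text is likely the expansion of `htonl()` on a little-endian build, converting the host-order IP to network order.

```cpp
// Hypothetical sketch, not the Pegasus implementation: it assumes the
// reverse-resolution step behaves like getnameinfo() with NI_NAMEREQD.
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Formats a host-byte-order IPv4 address (e.g. 0xAC110001) as dotted decimal.
std::string ip_to_string(uint32_t host_order_ip) {
    in_addr addr{};
    addr.s_addr = htonl(host_order_ip);  // inet_ntop expects network byte order
    char buf[INET_ADDRSTRLEN] = {};
    const char *out = inet_ntop(AF_INET, &addr, buf, sizeof(buf));
    assert(out != nullptr);  // cannot fail for AF_INET with a large enough buffer
    (void)out;               // silence unused-variable warnings under NDEBUG
    return std::string(buf);
}

// Reverse-resolves a host-byte-order IPv4 address to a hostname.
// With NI_NAMEREQD, getnameinfo() returns an error instead of the numeric
// address when no name is known (no PTR record, no /etc/hosts entry).
std::optional<std::string> reverse_lookup(uint32_t host_order_ip) {
    sockaddr_in sa{};
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(host_order_ip);
    char host[1025] = {};  // 1025 == NI_MAXHOST in glibc
    int rc = getnameinfo(reinterpret_cast<const sockaddr *>(&sa), sizeof(sa),
                         host, sizeof(host), nullptr, 0, NI_NAMEREQD);
    if (rc != 0) {
        return std::nullopt;  // e.g. EAI_NONAME: the address has no hostname
    }
    return std::string(host);
}
```

For example, `reverse_lookup(0xAC110001)` (172.17.0.1) returns `std::nullopt` on a host that has no name for that address, and a hard assertion on that result would abort the process in the same way the meta-server did.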
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

