joeylichang opened a new issue, #600: URL: https://github.com/apache/incubator-kvrocks/issues/600
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-kvrocks/issues) and found no similar issues.

### Version

kvrocks 2.0 (the crash log below reports 2.0.6)
macos 11.3.1 (20E241) Darwin 20.4.0

### Minimal reproduce step

1. Write a tool that automatically migrates data by slot.
2. While data is migrating, check the migration status with the `cluster info` command.
3. After the importing node has run for a while, it receives signal 11.

```
W0521 09:00:24.454731 134987776 main.cc:85] ======= Ooops! kvrocks 2.0.6 got signal: 11 =======
W0521 09:00:24.475903 134987776 main.cc:93] 1 kvrocks_s1 0x000000010a70930a _ZNKSt3__110__function12__value_funcIFviEEclEOi + 74
W0521 09:00:24.509127 134987776 main.cc:93] 2 ??? 0x0000000000000000 0x0 + 0
W0521 09:00:24.528713 134987776 main.cc:93] 3 kvrocks_s1 0x000000010a700ebf _ZNKSt3__18functionIFviEEclEi + 47
W0521 09:00:24.547070 134987776 main.cc:93] 4 kvrocks_s1 0x000000010a700e18 _ZN5Redis10Connection5CloseEv + 360
W0521 09:00:24.548506 134987776 main.cc:93] 5 kvrocks_s1 0x000000010a702ce7 _ZN5Redis10Connection7OnEventEP11buffereventsPv + 871
W0521 09:00:24.549592 134987776 main.cc:93] 6 kvrocks_s1 0x000000010ad4fa94 bufferevent_run_deferred_callbacks_unlocked + 468
W0521 09:00:24.550586 134987776 main.cc:93] 7 kvrocks_s1 0x000000010ad5b6c5 event_process_active_single_queue + 1093
W0521 09:00:24.551573 134987776 main.cc:93] 8 kvrocks_s1 0x000000010ad581ff event_base_loop + 1775
W0521 09:00:24.552402 134987776 main.cc:93] 9 kvrocks_s1 0x000000010a710da5 _ZN6Worker3RunENSt3__111__thread_idE + 37
W0521 09:00:24.553221 134987776 main.cc:93] 10 kvrocks_s1 0x000000010a71ba65 _ZZN12WorkerThread5StartEvENK3$_0clEv + 69
W0521 09:00:24.553964 134987776 main.cc:93] 11 kvrocks_s1 0x000000010a71b9cd _ZNSt3__1L8__invokeIZN12WorkerThread5StartEvE3$_0JEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS3_DpOS4_ + 29
W0521 09:00:24.554687 134987776 main.cc:93] 12 kvrocks_s1 0x000000010a71b965 _ZNSt3__1L16__thread_executeINS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEZN12WorkerThread5StartEvE3$_0JEJEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE + 37
W0521 09:00:24.555402 134987776 main.cc:93] 13 kvrocks_s1 0x000000010a71b202 _ZNSt3__1L14__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEZN12WorkerThread5StartEvE3$_0EEEEEPvSA_ + 98
W0521 09:00:24.556113 134987776 main.cc:93] 14 libsystem_pthread.dylib 0x00007fff205f8954 _pthread_start + 224
W0521 09:00:24.556841 134987776 main.cc:93] 15 libsystem_pthread.dylib 0x00007fff205f44a7 thread_start + 15
```

Call path found while debugging:

1. The cluster import command sets up the connection and binds `close_cb_` to `StopForLinkError`.
2. When the connection's fd receives a `BEV_EVENT_EOF` event, `close_cb_` is called.
3. The process then receives signal 11.

```
Status Cluster::ImportSlot(Redis::Connection *conn, int slot, int state) {
  ....
  switch (state) {
    case kImportStart:
      if (!svr_->slot_import_->Start(conn->GetFD(), slot)) {
        return Status(Status::NotOK, "Can't start importing slot " + std::to_string(slot));
      }
      // Set link importing
      conn->SetImporting();
      myself_->importing_slot_ = slot;
      // Set link error callback
      conn->close_cb_ = std::bind(&SlotImport::StopForLinkError, svr_->slot_import_, conn->GetFD());
  }
  .....
}

// libevent reports a 'BEV_EVENT_EOF' event, which leads to Close() being called
void Connection::Close() {
  owner_->FreeConnection(this);
  if (close_cb_) {
    close_cb_(GetFD());
  }
}

// close_cb_ is bound to StopForLinkError. Execution never enters this function;
// signal 11 is received at the close_cb_ line in Close() above.
void SlotImport::StopForLinkError(int fd) {
  std::lock_guard<std::mutex> guard(mutex_);
  if (import_status_ != kImportStart) return;

  if (!svr_->IsSlave()) {
    // Clean imported slot data
    LOG(ERROR) << "SlotImport 4444444444";
    auto s = ClearKeysOfSlot(namespace_, import_slot_);
    if (!s.ok()) {
      LOG(ERROR) << "[import] Failed to clear keys of slot " << import_slot_
                 << " Current status is link error"
                 << ", Err: " << s.ToString();
    }
  }

  LOG(ERROR) << "[import] Stop importing for link error, slot: " << import_slot_;
  import_status_ = kImportFailed;
  import_fd_ = -1;
}
```

### What did you expect to see?

Any ideas?

### What did you see instead?

I tried removing the `close_cb_` call at connection close and tested for a long time with no problems, so my initial conclusion is that the problem is here.

### Anything Else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
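
One possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: in the `Connection::Close()` quoted in this report, `owner_->FreeConnection(this)` runs before `close_cb_` is read. If `FreeConnection` destroys the connection object, `close_cb_` and `GetFD()` would then be accessed on freed memory, which would be consistent with the crash inside `std::function::operator()` and the null frame above it. The sketch below uses hypothetical `Owner` and `Conn` types (stand-ins, not the kvrocks classes) to show an ordering that copies the callback and fd into locals before the object is freed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the kvrocks types.
class Owner;

class Conn {
 public:
  Conn(Owner *owner, int fd) : owner_(owner), fd_(fd) {}
  int GetFD() const { return fd_; }
  void Close();  // defined below, once Owner is a complete type

  std::function<void(int)> close_cb_;

 private:
  Owner *owner_;
  int fd_;
};

class Owner {
 public:
  // Creates a connection owned by this object and returns a raw handle to it.
  Conn *Add(int fd) {
    auto conn = std::make_unique<Conn>(this, fd);
    Conn *raw = conn.get();
    conns_[fd] = std::move(conn);
    return raw;
  }

  // Destroys the connection; any Conn* for this fd dangles afterwards.
  void FreeConnection(int fd) { conns_.erase(fd); }

 private:
  std::unordered_map<int, std::unique_ptr<Conn>> conns_;
};

inline void Conn::Close() {
  // Move everything still needed out of *this before the owner frees it.
  std::function<void(int)> cb = std::move(close_cb_);
  const int fd = fd_;
  Owner *owner = owner_;

  owner->FreeConnection(fd);  // *this is destroyed here

  // From this point on, only locals are touched, never members.
  if (cb) cb(fd);
}

// Closes one connection and returns the fds seen by the close callback.
inline std::vector<int> RunCloseDemo() {
  Owner owner;
  std::vector<int> closed;
  Conn *c = owner.Add(7);
  c->close_cb_ = [&closed](int fd) { closed.push_back(fd); };
  c->Close();
  return closed;
}
```

Calling `RunCloseDemo()` from a `main` closes a single connection with fd 7 and returns the fds recorded by the callback. Invoking the callback before `FreeConnection` would be an alternative ordering, as long as the callback does not rely on the connection already being gone.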
