joeylichang opened a new issue, #600:
URL: https://github.com/apache/incubator-kvrocks/issues/600

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-kvrocks/issues) and found no similar issues.
   
   
   ### Version
   
   kvrocks 2.0
   macos 11.3.1 (20E241)
   Darwin 20.4.0
   
   ### Minimal reproduce step
   
   1. Write a tool that automatically migrates data by slot.
   2. While the data is migrating, check the migration status with the cluster info command.
   3. After running for a while, the importing node receives signal 11:
   ```
   W0521 09:00:24.454731 134987776 main.cc:85] ======= Ooops! kvrocks 2.0.6 got 
signal: 11 =======
   W0521 09:00:24.475903 134987776 main.cc:93] 1   kvrocks_s1                   
       0x000000010a70930a _ZNKSt3__110__function12__value_funcIFviEEclEOi + 74
   W0521 09:00:24.509127 134987776 main.cc:93] 2   ???                          
       0x0000000000000000 0x0 + 0
   W0521 09:00:24.528713 134987776 main.cc:93] 3   kvrocks_s1                   
       0x000000010a700ebf _ZNKSt3__18functionIFviEEclEi + 47
   W0521 09:00:24.547070 134987776 main.cc:93] 4   kvrocks_s1                   
       0x000000010a700e18 _ZN5Redis10Connection5CloseEv + 360
   W0521 09:00:24.548506 134987776 main.cc:93] 5   kvrocks_s1                   
       0x000000010a702ce7 _ZN5Redis10Connection7OnEventEP11buffereventsPv + 871
   W0521 09:00:24.549592 134987776 main.cc:93] 6   kvrocks_s1                   
       0x000000010ad4fa94 bufferevent_run_deferred_callbacks_unlocked + 468
   W0521 09:00:24.550586 134987776 main.cc:93] 7   kvrocks_s1                   
       0x000000010ad5b6c5 event_process_active_single_queue + 1093
   W0521 09:00:24.551573 134987776 main.cc:93] 8   kvrocks_s1                   
       0x000000010ad581ff event_base_loop + 1775
   W0521 09:00:24.552402 134987776 main.cc:93] 9   kvrocks_s1                   
       0x000000010a710da5 _ZN6Worker3RunENSt3__111__thread_idE + 37
   W0521 09:00:24.553221 134987776 main.cc:93] 10  kvrocks_s1                   
       0x000000010a71ba65 _ZZN12WorkerThread5StartEvENK3$_0clEv + 69
   W0521 09:00:24.553964 134987776 main.cc:93] 11  kvrocks_s1                   
       0x000000010a71b9cd 
_ZNSt3__1L8__invokeIZN12WorkerThread5StartEvE3$_0JEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS3_DpOS4_
 + 29
   W0521 09:00:24.554687 134987776 main.cc:93] 12  kvrocks_s1                   
       0x000000010a71b965 
_ZNSt3__1L16__thread_executeINS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEZN12WorkerThread5StartEvE3$_0JEJEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE
 + 37
   W0521 09:00:24.555402 134987776 main.cc:93] 13  kvrocks_s1                   
       0x000000010a71b202 
_ZNSt3__1L14__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEZN12WorkerThread5StartEvE3$_0EEEEEPvSA_
 + 98
   W0521 09:00:24.556113 134987776 main.cc:93] 14  libsystem_pthread.dylib      
       0x00007fff205f8954 _pthread_start + 224
   W0521 09:00:24.556841 134987776 main.cc:93] 15  libsystem_pthread.dylib      
       0x00007fff205f44a7 thread_start + 15
   ```
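
The frames above are mangled Itanium C++ ABI symbols. As a reading aid, here is a minimal sketch that demangles them with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang). The helper name `Demangle` is hypothetical and not part of kvrocks. For example, frame 4 (`_ZN5Redis10Connection5CloseEv`) demangles to `Redis::Connection::Close()`.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium C++ ABI
// symbol, or the input unchanged if demangling fails.
std::string Demangle(const std::string &mangled) {
  int status = 0;
  // With null buffer arguments, __cxa_demangle allocates the result with
  // malloc, so it must be released with free.
  char *out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) return mangled;
  std::string result(out);
  std::free(out);
  return result;
}
```

Passing each symbol from the trace through `Demangle` gives the readable call chain from the worker thread down to the `std::function` invocation where the signal is raised.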
   
   Call path found while debugging:
   1. The cluster import command builds the connection and binds its close_cb_ to StopForLinkError.
   2. When the connection's fd receives a 'BEV_EVENT_EOF' event, close_cb_ is called.
   3. The process then receives signal 11.
   
   ```
   Status Cluster::ImportSlot(Redis::Connection *conn, int slot, int state) {
   ....
     switch (state) {
       case kImportStart:
         if (!svr_->slot_import_->Start(conn->GetFD(), slot)) {
           return Status(Status::NotOK, "Can't start importing slot " + std::to_string(slot));
         }
         // Set link importing
         conn->SetImporting();
         myself_->importing_slot_ = slot;
         // Set link error callback
         conn->close_cb_ = std::bind(&SlotImport::StopForLinkError, svr_->slot_import_, conn->GetFD());
     }
    .....
   }
   
   // libevent reports a 'BEV_EVENT_EOF' event, which leads to Close() being called
   void Connection::Close() {
     owner_->FreeConnection(this);
     if (close_cb_) {
       close_cb_(GetFD());
     }
   }
   
   // Execution never enters close_cb_; signal 11 is received at the close_cb_ call line itself.
   // close_cb_ is bound to StopForLinkError:
   void SlotImport::StopForLinkError(int fd) {
     std::lock_guard<std::mutex> guard(mutex_);
     if (import_status_ != kImportStart) return;
     if (!svr_->IsSlave()) {
       // Clean imported slot data
       LOG(ERROR) << "SlotImport 4444444444";
       auto s = ClearKeysOfSlot(namespace_, import_slot_);
       if (!s.ok()) {
         LOG(ERROR) << "[import] Failed to clear keys of slot " << import_slot_
                   << " Current status is link error"
                   << ", Err: " << s.ToString();
       }
     }
   
     LOG(ERROR) << "[import] Stop importing for link error, slot: " << import_slot_;
     import_status_ = kImportFailed;
     import_fd_ = -1;
   }
   
   ```
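
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if `FreeConnection` destroys the connection object, then `Close()` reads the member `close_cb_` after `this` has been freed. That would be consistent with the crash inside `std::function::operator()` in the trace. The sketch below uses hypothetical stand-in classes `Owner` and `Conn` (not the real kvrocks `Worker` and `Redis::Connection`) to show an ordering that avoids touching members after the free, by moving the callback and fd into locals first.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

class Owner;

// Hypothetical stand-in for Redis::Connection.
class Conn {
 public:
  Conn(Owner *owner, int fd) : owner_(owner), fd_(fd) {}
  int GetFD() const { return fd_; }
  void Close();

  std::function<void(int)> close_cb_;

 private:
  Owner *owner_;
  int fd_;
};

// Hypothetical stand-in for the connection owner. FreeConnection is
// ASSUMED to destroy the connection object.
class Owner {
 public:
  Conn *Add(int fd) {
    auto &slot = conns_[fd];
    slot.reset(new Conn(this, fd));
    return slot.get();
  }
  void FreeConnection(Conn *conn) { conns_.erase(conn->GetFD()); }
  std::size_t Size() const { return conns_.size(); }

 private:
  std::map<int, std::unique_ptr<Conn>> conns_;
};

void Conn::Close() {
  // Copy everything needed out of *this before the owner destroys it.
  int fd = fd_;
  std::function<void(int)> cb = std::move(close_cb_);
  owner_->FreeConnection(this);  // *this no longer exists after this line
  if (cb) cb(fd);                // only local variables are used here
}
```

An equivalent alternative under the same assumption is to invoke `close_cb_` first and call `FreeConnection` last, so that no member is accessed after the object is gone.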
   
   
   ### What did you expect to see?
   
   Any ideas?
   
   ### What did you see instead?
   
   I tried skipping the close_cb_ call when the connection closes and tested for a long time with no problems,
   so my preliminary conclusion is that the problem is here.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


