After deploying more builds with the asserts in handleEvent, we have seen
several variants of the following stack. In the cases I've dug into, the
continuation called from reply_to_cont is an HttpSM, and it is indeed unlocked.
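Below is a minimal model of the invariant that the check in Continuation::handleEvent is assumed to enforce: a continuation that has a mutex may only be dispatched by the thread holding that mutex. OwnedMutex, SketchCont, dispatch_allowed and handle_event are hypothetical names for illustration, not the ATS API. Threads are plain integers so the sketch stays deterministic.

```cpp
#include <cassert>

// Hypothetical model of a continuation mutex (not the ATS ProxyMutex API).
// Threads are identified by plain integers to keep the sketch deterministic.
struct OwnedMutex {
  int holder = -1;  // id of the thread holding the lock, -1 when unlocked
};

struct SketchCont {
  OwnedMutex *mutex = nullptr;  // a continuation may have no mutex at all
};

// Assumed invariant behind the handleEvent assert: a continuation with a
// mutex may only be dispatched by the thread that currently holds it.
bool dispatch_allowed(const SketchCont &cont, int this_thread) {
  return cont.mutex == nullptr || cont.mutex->holder == this_thread;
}

// Assumed shape of the dispatch path: check the invariant, then run the handler.
template <typename Handler>
int handle_event(const SketchCont &cont, int this_thread, Handler &&handler) {
  assert(dispatch_allowed(cont, this_thread) && "continuation dispatched without its lock");
  return handler();
}
```

In the failing case, reply_to_cont dispatches an HttpSM whose mutex is not held by the calling thread. A check of this shape fails and the process aborts, which matches frames 0 to 4 of the stack.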

Grabbing both locks (action.mutex and the continuation's mutex) at the top of
probeEvent, and rescheduling if either lock cannot be obtained, should solve the
problem. To simplify the lock-scope handling, if the continuation has no mutex
we take the action.mutex lock a second time. Because this thread already holds
that lock, the extra acquisition should cost only a compare and an increment.
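The approach can be sketched as follows, assuming a recursive mutex where re-locking by the current holder only compares the holder and increments a count. SketchMutex, probe_event and ProbeResult are hypothetical stand-ins for ProxyMutex, HostDBContinuation::probeEvent and its reschedule path. They are not the code in this PR. The caller is also assumed to re-queue the event when it gets Rescheduled.

```cpp
#include <cassert>

// Hypothetical recursive mutex model (not ATS's ProxyMutex or MUTEX_TRY_LOCK).
// Threads are plain integer ids; re-locking by the current holder is just a
// compare and an increment.
struct SketchMutex {
  int holder = -1;  // -1 means unlocked
  int depth = 0;    // number of times the holder has acquired the lock

  bool try_lock(int self) {
    if (holder == self) {  // already ours: compare and increment
      ++depth;
      return true;
    }
    if (holder != -1) {  // held by some other thread
      return false;
    }
    holder = self;
    depth = 1;
    return true;
  }

  void unlock(int self) {
    assert(holder == self && depth > 0);
    if (--depth == 0) {
      holder = -1;
    }
  }
};

enum class ProbeResult { Replied, Rescheduled };

// Sketch of the proposed probeEvent prologue. action_mutex stands in for
// action.mutex and cont_mutex for the waiting continuation's mutex (nullptr
// when it has none). The caller re-queues the event on Rescheduled.
ProbeResult probe_event(SketchMutex &action_mutex, SketchMutex *cont_mutex, int self) {
  // No continuation mutex: take action.mutex a second time so that the
  // locking and unlocking below follow one path in every case.
  SketchMutex &second = cont_mutex != nullptr ? *cont_mutex : action_mutex;

  if (!action_mutex.try_lock(self)) {
    return ProbeResult::Rescheduled;
  }
  if (!second.try_lock(self)) {
    action_mutex.unlock(self);  // back off fully before rescheduling
    return ProbeResult::Rescheduled;
  }

  // Both locks held: this is where reply_to_cont() would be safe to call.

  second.unlock(self);
  action_mutex.unlock(self);
  return ProbeResult::Replied;
}
```

Aliasing the second lock to action.mutex when there is no continuation mutex keeps a single lock/unlock path. The only price is the redundant acquisition, which try_lock turns into a compare and an increment.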

```
[ 0  ] libc-2.17.so       __GI_raise                                        ( :undefined         )
[ 1  ] libc-2.17.so       __GI_abort                                        ( :undefined         )
[ 2  ] libtsutil.so.7     ink_abort                                         ( ink_error.cc:99    )
[ 3  ] libtsutil.so.7     _ink_assert                                       ( ink_assert.cc:37   )
[ 4  ] traffic_server     Continuation::handleEvent(int, void*)             ( Continuation.cc:32 )
[ 5  ] traffic_server     reply_to_cont                                     ( HostDB.cc:491      )
[ 6  ] traffic_server     HostDBContinuation::probeEvent(int, Event*)       ( HostDB.cc:1736     )
[ 7  ] traffic_server     HostDBContinuation::dnsPendingEvent(int, Event*)  ( HostDB.cc:1193     )
[ 8  ] traffic_server     Continuation::handleEvent(int, void*)             ( Continuation.cc:33 )
[ 9  ] traffic_server     HostDBContinuation::remove_trigger_pending_dns()  ( HostDB.cc:1793     )
[ 10 ] traffic_server     HostDBContinuation::dnsEvent(int, HostEnt*)       ( HostDB.cc:1487     )
[ 11 ] traffic_server     Continuation::handleEvent(int, void*)             ( Continuation.cc:33 )
[ 12 ] traffic_server     DNSEntry::postEvent(int, Event*)                  ( DNS.cc:1282        )
[ 13 ] traffic_server     Continuation::handleEvent(int, void*)             ( Continuation.cc:33 )
[ 14 ] traffic_server     EThread::process_event(Event*, int)               ( UnixEThread.cc:132 )
[ 15 ] traffic_server     EThread::process_queue(Queue<Event, Event::Link_link>*, i... ( UnixEThread.cc:171 )
[ 16 ] traffic_server     EThread::execute_regular()                        ( UnixEThread.cc:231 )
[ 17 ] traffic_server     EThread::execute()                                ( UnixEThread.cc:326 )
[ 18 ] traffic_server     spawn_thread_internal                             ( Thread.cc:85       )
[ 19 ] libpthread-2.17.so start_thread                                      ( :undefined         )
```


[ Full content available at: https://github.com/apache/trafficserver/pull/4142 ]