[ 
https://issues.apache.org/jira/browse/PROTON-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080926#comment-18080926
 ] 

ASF GitHub Bot commented on PROTON-2931:
----------------------------------------

cliffjansen commented on code in PR #444:
URL: https://github.com/apache/qpid-proton/pull/444#discussion_r3242520692


##########
c/src/proactor/epoll_raw_connection.c:
##########
@@ -443,17 +468,30 @@ pn_event_batch_t *pni_raw_connection_process(task_t *t, uint32_t io_events, bool
     praw_initiate_cleanup(rc);
     return NULL;
   }
+  if (rc->task.closing) {
+    // rclosed and wclosed.  Allow final events to be processed.
+    unlock(&rc->task.mutex);
+    return &rc->batch;
+  }
   int events = io_events;
   int fd = rc->psocket.epoll_io.fd;
 
   if (rc->first_schedule) {
     rc->first_schedule = false;
     assert(!events); // No socket yet.
     assert(!rc->connected);
-    if (praw_connection_first_connect_lh(rc)) {
-      unlock(&rc->task.mutex);
-      return NULL;
+    bool wake_event = pni_task_wake_pending(&rc->task);
+
+    t->working = false;
+    rc->name_lookup_pending = true;
+    unlock(&rc->task.mutex);
+    praw_connection_first_connect(rc);
+    if (wake_event) {
+      lock(&rc->task.mutex);
+      t->working = true;
+      return &rc->batch;

Review Comment:
   triple doh!





> Epoll proactor has race conditions with the async c-ares name resolver library
> ------------------------------------------------------------------------------
>
>                 Key: PROTON-2931
>                 URL: https://issues.apache.org/jira/browse/PROTON-2931
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.41.0
>            Reporter: Clifford Jansen
>            Assignee: Clifford Jansen
>            Priority: Blocker
>
> If the c-ares callback is very quick, the pn_raw_connection_t can sometimes 
> fail to schedule itself and hang while still in the connecting phase.  This 
> can be easily reproduced with a ulimit for open files of 1024 or less and the 
> following reproducer.
>   https://github.com/fgiorgetti/router-locust
> Conversely, if the callback is extremely slow, the connection can wind down 
> and free its resources before the callback runs, so the callback then 
> dereferences an invalid pointer.  The connection should remember if a 
> callback is pending and defer any cleanup until the callback concludes.  This 
> applies to raw and AMQP connections.
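
The "remember if a callback is pending and defer any cleanup" idea from the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: `sketch_conn_t`, its fields, and every `sketch_conn_*` function are hypothetical names, and no Proton or c-ares API is used. Whichever of "close" and "callback finished" happens second is the one that frees the record.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection record; NOT Proton's pn_raw_connection_t. */
typedef struct sketch_conn {
  pthread_mutex_t mutex;
  bool lookup_pending;   /* resolver callback requested but not yet run */
  bool close_requested;  /* owner has asked for cleanup */
  bool *freed_flag;      /* illustration hook: set when memory is released */
} sketch_conn_t;

sketch_conn_t *sketch_conn_new(bool *freed_flag) {
  sketch_conn_t *c = calloc(1, sizeof(*c));
  if (!c) return NULL;
  pthread_mutex_init(&c->mutex, NULL);
  c->freed_flag = freed_flag;
  return c;
}

static void sketch_conn_free(sketch_conn_t *c) {
  if (c->freed_flag) *c->freed_flag = true;
  pthread_mutex_destroy(&c->mutex);
  free(c);
}

/* Call before handing the connection to the asynchronous resolver. */
void sketch_conn_start_lookup(sketch_conn_t *c) {
  pthread_mutex_lock(&c->mutex);
  c->lookup_pending = true;
  pthread_mutex_unlock(&c->mutex);
}

/* Owner requests cleanup. Returns true if freed now, false if deferred
 * because a resolver callback may still reference the record. */
bool sketch_conn_close(sketch_conn_t *c) {
  pthread_mutex_lock(&c->mutex);
  c->close_requested = true;
  bool can_free = !c->lookup_pending;
  pthread_mutex_unlock(&c->mutex);
  if (can_free) sketch_conn_free(c);
  return can_free;
}

/* Resolver callback path. Returns true if it performed the deferred
 * cleanup; the caller must not touch the record after a true return. */
bool sketch_conn_lookup_done(sketch_conn_t *c) {
  pthread_mutex_lock(&c->mutex);
  c->lookup_pending = false;
  bool must_free = c->close_requested;
  pthread_mutex_unlock(&c->mutex);
  if (must_free) sketch_conn_free(c);
  return must_free;
}
```

Both flags are read and written under the same mutex, so exactly one of the two paths sees the other's flag already set and takes responsibility for freeing. A slow callback therefore always finds a live record, at the cost of keeping it allocated until the lookup completes.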



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
