[
https://issues.apache.org/jira/browse/PROTON-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080922#comment-18080922
]
ASF GitHub Bot commented on PROTON-2931:
----------------------------------------
cliffjansen commented on code in PR #444:
URL: https://github.com/apache/qpid-proton/pull/444#discussion_r3242507148
##########
c/src/proactor/epoll.c:
##########
@@ -1472,10 +1478,28 @@ static void connection_done_cb(void *user_data, struct addrinfo *ai, int gai_err
 // Return true if the socket is connecting and there are no Proton events to deliver.
 static bool pconnection_first_connect_lh(pconnection_t *pc) {
   pn_proactor_t *p = pc->task.proactor;
+  pn_transport_t *tp = pc->driver.transport;
+  pc->name_lookup_pending = true;
+
   unlock(&pc->task.mutex);
   bool rc = pni_name_lookup_start(&p->name_lookup, pc->host, pc->port, pc, connection_done_cb);
   lock(&pc->task.mutex);
-  return rc;
+
+  if (!rc) {
+    // Either the callback was synchronous or no callback was possible
+    if (pc->name_lookup_pending) {
+      // Clean up since there will be no callback.
+      pc->name_lookup_pending = false;
+      psocket_error(&pc->psocket, EAI_FAIL, "internal error on connect");
+    }
+    return false;
+  }
+  if (!pc->name_lookup_pending) {
+    // connection_done_cb already completed
+    if (pn_condition_is_set(pn_transport_condition(tp)))
+      return false;
+  }
+  return !pc->queued_disconnect && !pni_task_wake_pending(&pc->task);
Review Comment:
agreed. reverted
> Epoll proactor has race conditions with the async c-ares name resolver library
> ------------------------------------------------------------------------------
>
> Key: PROTON-2931
> URL: https://issues.apache.org/jira/browse/PROTON-2931
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-c
> Affects Versions: proton-c-0.41.0
> Reporter: Clifford Jansen
> Assignee: Clifford Jansen
> Priority: Blocker
>
> If the c-ares callback is very quick, the pn_raw_connection_t can sometimes
> fail to schedule itself and hang while still in the connecting phase. This
> can be easily reproduced with a ulimit for open files of 1024 or less and the
> following reproducer.
> https://github.com/fgiorgetti/router-locust
> Conversely, if the callback is extremely slow, the connection can shut down and
> free its resources before the callback runs, so the callback then dereferences
> an invalid pointer. The connection should remember whether a callback is pending
> and defer any cleanup until the callback concludes. This applies to raw and AMQP
> connections.
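
The "remember if a callback is pending and defer any cleanup" idea from the description can be sketched in isolation. The following is a minimal single-threaded illustration, not Proton's implementation: `demo_conn_t` and every `demo_conn_*` function are hypothetical names invented for this sketch, and real proactor code would additionally need to hold the task mutex around each flag access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection object; this is NOT pconnection_t. */
typedef struct demo_conn {
  bool lookup_pending;  /* a resolver callback is still owed to this object */
  bool close_requested; /* cleanup was requested while the lookup was pending */
  bool connected;       /* lookup finished and the connection may proceed */
  bool *freed_flag;     /* set to true on release, so callers can observe it */
} demo_conn_t;

demo_conn_t *demo_conn_new(bool *freed_flag) {
  demo_conn_t *c = calloc(1, sizeof(*c));
  if (!c) return NULL;
  c->freed_flag = freed_flag;
  *freed_flag = false;
  return c;
}

static void demo_conn_free(demo_conn_t *c) {
  *c->freed_flag = true;
  free(c);
}

/* Set the flag BEFORE handing the object to the resolver, so that a callback
 * firing synchronously still observes a consistent state. */
void demo_conn_lookup_start(demo_conn_t *c) {
  c->lookup_pending = true;
}

/* Resolver callback. Returns true if the object is still alive afterwards,
 * false if a deferred close was carried out here. */
bool demo_conn_lookup_done(demo_conn_t *c) {
  assert(c->lookup_pending);
  c->lookup_pending = false;
  if (c->close_requested) {
    demo_conn_free(c); /* the cleanup that demo_conn_close() deferred */
    return false;
  }
  c->connected = true;
  return true;
}

/* Request cleanup. Returns true if the object was freed immediately,
 * false if the free was deferred until the pending callback runs. */
bool demo_conn_close(demo_conn_t *c) {
  if (c->lookup_pending) {
    c->close_requested = true;
    return false;
  }
  demo_conn_free(c);
  return true;
}
```

In this sketch a slow callback never touches freed memory, because `demo_conn_close()` only records the request while the lookup is outstanding, and a fast (even synchronous) callback is safe because the flag is set before the lookup starts.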