Hi Alexander,
On Fri, Nov 10, 2023 at 08:44:31PM +0000, Stephan, Alexander wrote:
> > I don't see how this is possible:
> >
> > list_for_each_entry(srv_tlv, &src->pp_tlvs, list) {
> > if (srv_tlv == NULL)
> > break;
>
> > For srv_tlv to be NULL, it would mean that you visited (NULL)->list at some
> > point, and apparently pp_tlvs is always manipulated with
> > LIST_APPEND() only, so I don't see how you could end up with something like
> > last->next = (NULL)->list. Are you sure it was not a side effect of a
> > debugging session with some temporary code in it ?
> > I'd be interested in knowing if you can reproduce it so that we can find
> > the root cause (and hopefully address it).
>
> I finally got back to this again. You should be able to reproduce it quite
> easily. After removing the 'break' above, you can run the regression tests.
(...)
> It should crash immediately with the following seg fault:
>
> Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
> srv_settings_cpy (srv=srv@entry=0xbed7e0, src=src@entry=0xbebce0,
> srv_tmpl=srv_tmpl@entry=0) at src/server.c:2526
> 2526 new_srv_tlv->fmt_string = strdup(srv_tlv->fmt_string);
>
> Overall, I am not too confident with the order of invocations in the server.
> Maybe we have some initializations mixed up and unintuitive some
> dependencies.
Thanks! I could indeed reproduce it and the cause is indeed a missing
initialization, because the very first element is NULL ;-) I found it,
there's still an ugly manual initialization of the default server made
in proxy_preset_defaults(). One day we'll have a function to reinitialize
a server... This patch fixes it, I'm going to merge it and get rid of
the test in the loop:
diff --git a/src/proxy.c b/src/proxy.c
index 7ff087190..544c22f82 100644
--- a/src/proxy.c
+++ b/src/proxy.c
@@ -1469,6 +1469,7 @@ void proxy_preset_defaults(struct proxy *defproxy)
defproxy->defsrv.onerror = DEF_HANA_ONERR;
defproxy->defsrv.consecutive_errors_limit = DEF_HANA_ERRLIMIT;
defproxy->defsrv.uweight = defproxy->defsrv.iweight = 1;
+ LIST_INIT(&defproxy->defsrv.pp_tlvs);
defproxy->email_alert.level = LOG_ALERT;
defproxy->load_server_state_from_file = PR_SRV_STATE_FILE_UNSPEC;
> > Another point, in make_proxy_line_v2() I'm seeing that you've placed
> > everything under "if (strm)". I understand why, it's because you didn't
> > want to call build_logline() without a stream. That made me think "hmmm
> > we already have the ability to do that", and I found it, you can use
> > sess_build_logline() instead, that takes the session separately, and
> > supports an optional stream. That would be useful for health checks for
> > example, since they don't have streams. But there's a catch: in function
> > make_proxy_line_v2() we don't have the session at the moment, and
> > build_logline() extracts it from the stream. Normally during a session
> > establishment, the session is present in connection->owner, thus
> > remote->owner in make_proxy_line_v2(). I suggest that you give this a
> > try to confirm that it works for you, this way we'd be sure that health
> > checks can also send the ppv2 fields. But that's not critical given that
> > there's no bad behavior right now, it's just that fields will be ignored,
> > so this can be done in a subsequent patch.
>
> I looked into it again. I did not see an obvious way to retrieve the current
> or pass the session as parameter.
>
> I mean, the easy option, as you said, would be to use remote->owner, which
> would require replacing if (stream) with if (remote) to prevent the NULL
> dereference.
> But we still want to send out TLVs when there is no remote.
Hmm you're right, I misanalyzed the situation there. Actually the
conn_send_proxy() function does have the current connection we're
working on, but the functions it calls (make_proxy_line()) doesn't
have it. However we pass either the remote connection when there is
a stream, or the local connection when there is no stream, hence
remote->owner should always be valid, and apparently it's done
precisely for checks:
/* The target server expects a PROXY line to be sent first.
* If the send_proxy_ofs is negative, it corresponds to the
* offset to start sending from then end of the proxy string
* (which is recomputed every time since it's constant). If
* it is positive, it means we have to send from the start.
* We can only send a "normal" PROXY line when the connection
* is attached to a stream connector. Otherwise we can only
* send a LOCAL line (eg: for use with health checks).
*/
if (sc && sc_strm(sc)) {
ret = make_proxy_line(trash.area, trash.size,
objt_server(conn->target),
sc_conn(sc_opposite(sc)),
__sc_strm(sc));
}
else {
/* The target server expects a LOCAL line to be sent first.
Retrieving
* local or remote addresses may fail until the connection is
established.
*/
if (!conn_get_src(conn) || !conn_get_dst(conn))
goto out_wait;
ret = make_proxy_line(trash.area, trash.size,
objt_server(conn->target), conn,
NULL);
}
Thus the only case where the remote conn does not exist is when being
established from an applet (e.g. Lua or httpclient).
> So maybe I could
> change it such that it uses remote->owner when available and otherwise, falls
> back to using the session from the stream?
> Maybe something along the lines of
> struct session *sess = (remote != NULL) ? remote->owner : strm->sess;
> replace->data = sess_build_logline(sess, strm, replace->area,
> replace->size, &srv_tlv->fmt);
> ?
> It seems to achieve the behavior you described regarding the health checks,
> right? If that's alright, I will send the corresponding patch.
I'd do it slightly differently but the principle remains the same: if
we have a stream we always have a session, otherwise the remote's owner
should be used:
struct session *sess = strm ? strm->sess : remote->owner;
replace->data = sess_build_logline(sess, strm, replace->area,
replace->size, &srv_tlv->fmt);
Please let me know if that works for you!
Thanks,
Willy