Re: [PATCH] MEDIUM: stats: Add `show events` command.

Willy Tarreau Sat, 24 Mar 2012 13:44:58 -0700

Hi Aman,

On Fri, Mar 23, 2012 at 01:33:30PM -0700, Aman Gupta wrote:
> Tracing TCP connections that make hops through haproxy is currently very
> challenging. To get a list of proxied connection pairs inside haproxy,
> one must use `show sess` to dump the session table, grab the fd pairs
> for each session, resolve those to inodes via the process's file
> descriptor table, and then resolve those inodes to addresses via the TCP
> connection table. This is quite cumbersome and slow (especially when the
> TCP connection table is huge), and does not scale when there are
> hundreds or thousands of connections happening per second.


Agreed. For this reason we've recently extended the log format to optionally
include new fields such as the outgoing source IP:port. I would not have
expected another request to come that soon :-)

> This patch adds a new `show events` command to the stats socket, which
> streams events when sessions are established or destroyed. This allows
> any interested party to subscribe to these events and maintain their
> own session table about the state inside haproxy. This data can then be
> used to augment data collected from other sources (like pcap), to follow
> a connection through all the hops it makes.

This is really nice that you did that, as it's been on my todo list for
a long time. But I have comments on this before I can merge it :

1) I'd rather use "debug sessions" than "show events" because till now
   "show anything" only lists a finite list, and "events" can be a lot
   of things.

2) it looks to me like the stats socket is still usable in parallel to the
   command. If so, we could as well as an "undebug all" command to stop
   the debugging. But I'm not requesting that for now, it's more out of
   curiosity.

3) I'd very slightly change the output format to include delimiters. Maybe
   something like this :

    + 1 127.0.0.1:50869 - 127.0.0.1:9418 | 127.0.0.1:50870 - 127.0.0.1:6000

   instead of :

    + 1 127.0.0.1:50869 127.0.0.1:9418 127.0.0.1:50870 127.0.0.1:6000

4) Don't you think it would be useful to see a bit more states ? I would
   personally like to see 3 steps instead of 2 per session :
     - incoming session accepted
     - forward to server
     - closed session to client

   We could then have a single letter char instead of +/- which would report
   the state (A/F/C) :

    A 1 127.0.0.1:50869 - 127.0.0.1:9418
    F 1 127.0.0.1:50869 - 127.0.0.1:9418 | 127.0.0.1:50870 - 127.0.0.1:6000
    C 1 127.0.0.1:50869 - 127.0.0.1:9418

> diff --git a/src/dumpstats.c b/src/dumpstats.c
> index f680134..959b114 100644
> --- a/src/dumpstats.c
> +++ b/src/dumpstats.c
(...)
> +void stats_event_new_session(struct session *s)
> +{
> +     struct session *curr;
> +     struct stream_interface *si;
> +     int ret;
> +
> +     char addrs[4][INET6_ADDRSTRLEN + strlen(":65535") + 1] = {"","","",""};
> +     int i, fd, port;
> +     const void *addr;
> +     struct sockaddr_storage sock;
> +     socklen_t addr_size;
> +
> +     for(i = 0; i < 4; i++) {
> +             fd = s->si[ i/2 ].fd;
> +
> +             addr_size = sizeof(sock);
> +             if (!(i%2==0 ? getpeername : getsockname)(fd, (struct sockaddr 
> *)&sock, &addr_size)) {

Here it's a waste of CPU cycles to call getpeername/getsockname 4 times
for addresses we generally already have. You have everything in the
stream_interface's addr (.from, .to). The local address is not always
filled, you have to check the session's flags for SN_FRT_ADDR_SET and
SN_BCK_ADDR_SET and call the associated functions accordingly prior
to use the addresses. The advantage is that the getpeername/getsockname
are already performed once in all of a session's life.

> +                     switch (sock.ss_family) {
> +                     case AF_INET:
> +                             addr = (const void *)&((struct sockaddr_in 
> *)&sock)->sin_addr;
> +                             port = ntohs(((struct sockaddr_in 
> *)&sock)->sin_port);
> +                             inet_ntop(sock.ss_family, addr, addrs[i], 
> sizeof(addrs[i]));
> +                             snprintf(addrs[i], sizeof(addrs[i]), "%s:%d", 
> addrs[i], port);
> +                             break;
> +
> +                     case AF_INET6:
> +                             addr = (const void *)&((struct sockaddr_in6 
> *)&sock)->sin6_addr;
> +                             port = ntohs(((struct sockaddr_in6 
> *)&sock)->sin6_port);
> +                             inet_ntop(sock.ss_family, addr, addrs[i], 
> sizeof(addrs[i]));
> +                             snprintf(addrs[i], sizeof(addrs[i]), "%s:%d", 
> addrs[i], port);
> +                             break;
> +
> +                     case AF_UNIX:
> +                     default:
> +                             sprintf(addrs[i], "%s", "unknown");

In case of UNIX sockets, check how it's done in the logs, in short we log
"unix:$listener_id" on the frontend and nothing right now on the backend,
but if one day we support it it will probably be the destination socket
path.

> +                     }
> +             }
> +     }
> +
> +     if (LIST_ISEMPTY(&stats_event_listeners))
> +             return;

Hmmm.. this should have been put at the top of the function in order
to save this expensive work when nobody is listening.

(...)
> @@ -866,12 +974,17 @@ void stats_io_handler(struct stream_interface *si)
>       si->ib->rex = TICK_ETERNITY;
>       si->ob->wex = TICK_ETERNITY;
>  
> +     /* no timeouts when streaming events */
> +     if (si->st0 == STAT_CLI_EVENTS)
> +             s->req->rex = TICK_ETERNITY;
> +

This is not appropriate in my opinion. 1) it's a specific use, and 2)
it's already possible to set the timeout on the stats command line,
so better increase the CLI timeout before starting the dump.

>   out:
>       DPRINTF(stderr, "%s@%d: st=%d, rqf=%x, rpf=%x, rql=%d, rqs=%d, rl=%d, 
> rs=%d\n",
>               __FUNCTION__, __LINE__,
>               si->state, req->flags, res->flags, req->l, req->send_max, 
> res->l, res->send_max);
>  
>       if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO)) {
> +             stats_event_listener_remove(s);
>               /* check that we have released everything then unregister */
>               stream_int_unregister_handler(si);

Normally you don't need this, you just need to add a .release callback
in cli_applet (dumpstats.c) which does the work.

> diff --git a/src/session.c b/src/session.c
> index 6906656..ae100db 100644
> --- a/src/session.c
> +++ b/src/session.c
> @@ -350,6 +350,8 @@ void sess_establish(struct session *s, struct 
> stream_interface *si)
>       rep->analysers |= s->fe->fe_rsp_ana | s->be->be_rsp_ana;
>       rep->flags |= BF_READ_ATTACHED; /* producer is now attached */
>       req->wex = TICK_ETERNITY;
> +
> +     stats_event_new_session(s);

This call is expensive on every session establishment. I'd really like
to see it guarded by "if (!LIST_ISEMPTY(&stats_event_listeners))".
Alternatively you can rename your existing function __stats_event_new_session()
and have stats_event_new_session() be inline and add perform the test if you
prefer, I don't care. What I don't want is to call a function most of the
time for nothing.

>  }
>  
>  /* Update stream interface status for input states SI_ST_ASS, SI_ST_QUE, 
> SI_ST_TAR.
> @@ -1655,6 +1657,8 @@ resync_stream_interface:
>               s->do_log(s);
>       }
>  
> +     stats_event_end_session(s);
> +

Same here, obviously.

Thanks for doing this, it looks really appealing ! I'm impatient to
see what realtime info everyone will think about ! I suspect that
filtering on a frontend will be among the first requests in the
wish list, quickly followed by source-mask filtering !

Willy

Re: [PATCH] MEDIUM: stats: Add `show events` command.

Reply via email to