Re: MEDIUM: Adding upstream socks4 proxy support

Willy Tarreau Sun, 05 May 2019 23:33:20 -0700

Hi Alec,

On Tue, Apr 30, 2019 at 07:59:16PM +0800, Alec Liu wrote:
> > I'm seeing that you copied a doc retrieved from somewhere else that is
> > found at various places on the net. Have you checked the license for
> > this doc to be sure we can copy and distribute it like this ? It might
> > be perfectly fine but we need to be sure. In any case we'll have to
> > mention the license if it differs from the other ones.
> 
> I have the "SOCKS: A protocol for TCP proxy across firewalls" doc
> found at openssh's website
> under OpenSSH Sprcifications along with those RFCs.
> "https://www.openssh.com/specs.html";,
> "https://www.openssh.com/txt/socks4.protocol";
> And OpenSSH is using GPL, so I believe it should be fine.


OK. Please at least mention it in the commit message so that if anybody
complains, there is no doubt about the origin and we can even take
appropriate actions such as removing and replacing it with a link or
anything else. Similarly if this doc ever were updated, someone could
propose us some updates.

> > >  struct check_status {
> > > diff --git a/include/types/connection.h b/include/types/connection.h
> > > index 308be7d7..2a1b23b4 100644
> > > --- a/include/types/connection.h
> > > +++ b/include/types/connection.h
> > > @@ -47,6 +47,15 @@ struct server;
> > >  struct session;
> > >  struct pipe;
> > >
> > > +/* socks4 upstream proxy definitions */
> > > +struct socks4_request {
> > > +     uint8_t version;        /* SOCKS version number, 1 byte, must be 
> > > 0x04 for this version */
> > > +     uint8_t command;        /* 0x01 = establish a TCP/IP stream 
> > > connection */
> > > +     uint16_t port;          /* port number, 2 bytes (in network byte 
> > > order) */
> > > +     uint32_t ip;            /* IP address, 4 bytes (in network byte 
> > > order) */
> > > +     char user_id[8];        /* the user ID string, variable length, 
> > > terminated with a null (0x00); Using "HAProxy\0" */
> > > +};
> > > +
> > >  /* Note: subscribing to these events is only valid after the caller has 
> > > really
> > >   * attempted to perform the operation, and failed to proceed or complete.
> > >   */
> > > @@ -156,7 +165,7 @@ enum {
> > >
> > >       CO_FL_EARLY_SSL_HS  = 0x00004000,  /* We have early data pending, 
> > > don't start SSL handshake yet */
> > >       CO_FL_EARLY_DATA    = 0x00008000,  /* At least some of the data are 
> > > early data */
> > > -     /* unused : 0x00010000 */
> > > +     CO_FL_SOCKS_HS      = 0x00010000,  /* handshaking with upstream 
> > > SOCKS proxy */
> > >       /* unused : 0x00020000 */
> >
> > From what I'm seeing later in the code, it's not exactly a handshake,
> > you have to send a header and to wait for the response header to arrive.
> > I'd rather have one flag per direction which save from having to deal
> > with state management.
> 
> Actually, I have the same idea as you, using two flags only. But ends up with
> this design later on. The reason I am doing it in this way, that's because:
> 1) I might want to add the socks5 protocol support onto it later on, which
> will become a little more complex, not just one request and response.
> We will get something like authentication method handshake, login,
> connection request, etc.

Then this is exactly what muxes are made for. They live their own life
and encapsulate application data in a way that is transparent for the
application layer. Ultimately we should be able to stack multiple such
transformations much easier than what it is today. The SSL layer is
currently being modified for this for example. I'd suggest that you
stick to two individual headers for SOCKS4 and that we see later how
SOCKS5 has to be done (if ever done by the way).

> 2) Looks like do not have much unused bits leftover within CO_FL_*.

That's not *your* problem but the one for the person having something
else to implement after you. Bits are like desserts in restaurants :
first come, first served.

> > >  /* possible connection error codes */
> > >  enum {
> > > @@ -451,6 +469,13 @@ struct connection {
> > >       struct {
> > >               struct sockaddr_storage from;   /* client address, or 
> > > address to spoof when connecting to the server */
> > >               struct sockaddr_storage to;     /* address reached by the 
> > > client, or address to connect to */
> > > +             struct {
> > > +                     unsigned char use;                      /* socks4 
> > > proxy enabled  */
> > > +                     struct sockaddr_storage addr;           /* the 
> > > address of the socks4 proxy */
> > > +                     struct socks4_request req_line;         /* the info 
> > > send to socks4 proxy */
> > > +                     unsigned int status;                    /* 
> > > CO_SOCKS_* */
> > > +                     signed short req_remain_len;            /* the 
> > > length remain to sent */
> > > +             } socks_proxy;
> >
> > It is extremely huge for the struct connection unfortunately. We're
> > currently trying to shrink it, ideally extracting the address field
> > which is large, and this doubles it. With the two handshake flags
> > you can cover the status. The "use" has to be found and enabled
> > based on the server, and the address is fixed so it can also be passed
> > from the server. The request line could be rebuilt on each attempt as
> > we do with the proxy protocol header. Maybe the only part remaining
> > would be req_remain_len, which could probably be merged with the
> > proxy protocol offset later since we don't need to maintain both at
> > once. If you can use the PP offset now it's even better ;-)
> 
> What if we just have a reference here, and using malloc to alloc some memory
> for those connections which are going to use socks proxy only. It will be
> NULL for those connections which won't need it. When the socks proxy handshake
> finished, we can have this part of memory freed, before turn it over
> for data usage.

It generally is a very bad idea to rely on memory allocation at lower
layers because any memory allocation may fail, you have to deal with
this, and ideally you want to be able to nicely recover from this,
which is quite hard at low levels. For example, on the data path, if
a memory allocation fails, your I/O is temporarily paused until some
memory becomes available again. This is invisible for the user (and is
sometimes even faster than using more memory but that's another story).
Here if you'd want to be *that* transparent it would be terribly
difficult. So you'd actually have to implement code to deal with
allocation errors *and* make the connection cleanly fail. With the
proper design, not only you don't have to deal with this, but you have
a guaranteed 100% success rate.

As explained somewhere else in this thread, the connection already holds
the pointer to the server (conn->target) so most of the fields are known
and do not need to be duplicated/allocated.

> I will try to look into it, if we can reuse the "proxy protocol
> offset", if we want to keep the ability to
> allow using PROXY Protocol and socks proxy at the same time.

Even if we use both at the same time, it should not be a problem because
you don't send the two in parallel, you send each in turn. I suspect you
might only need to check for the other bit when clearing your bit, to
reset the offset.

> > Here you'd start by rebuilding the request line into a temporary local
> > buffer based on the same elements you used for the checks and for the
> > server's connect. You know the value is stable over time so you can
> > redo it and it will give you the same result. You just have to restart
> > sending from the send offset.
> 
> Got it looks like you preference memory over CPU time, :-).

That's normal. Over the last 20 years, processors have become 1 million
times more powerful, but memory has only increased by 1 thousand. Guess
which one will be really scarce in then next 10 years ?

> > > +     while (conn->addr.socks_proxy.req_remain_len) {
> > > +             int ret = 0;
> > > +
> > > +             conn->addr.socks_proxy.status = CO_SOCKS_SENDING;
> > > +
> > > +             /* we are sending the socks4_req_line here. If the data 
> > > layer
> > > +              * has a pending write, we'll also set MSG_MORE.
> > > +              */
> > > +             ret = conn_sock_send(conn,
> > > +                                  ((char 
> > > *)(&(conn->addr.socks_proxy.req_line))) +
> > > +                                  
> > > (sizeof(conn->addr.socks_proxy.req_line)-(conn->addr.socks_proxy.req_remain_len)),
> > > +                                  conn->addr.socks_proxy.req_remain_len,
> > > +                                  (conn->flags & CO_FL_XPRT_WR_ENA) ? 
> > > MSG_MORE : 0);
> > > +
> > > +             ha_alert("SOCKS PROXY HS FD[%04X]: Before send remain is 
> > > [%d], sent [%d]\n", conn->handle.fd, 
> > > conn->addr.socks_proxy.req_remain_len, ret);
> >
> > By the way, I'm assuming that for now this is for debugging, but I just
> > wanted you to be aware that ha_alert() and ha_warning() are only meant
> > to be used during config parsing and not run time as they will most
> > often not be visible due to stdout/stderr being closed and/or redirected
> > to something else. If you need to emit some errors at run time, please
> > raise CO_FL_ERROR and set an error code among CO_ER_* into
> > conn->err_code. You may add other codes but then please add the human
> > readable version into proto/connection.h (grep for CO_ER_SSL_RENEG for
> > example).
> 
> Got it.
> What if I want to have some trace debug info, which way will you recommend.

There are some debug macros such as DPRINTF which are enabled only when
built with debugging options. Be careful to also build without the debug
options when submitting your code, to make sure there are no leftovers of
variables used only for debugging that cause some build warnings.

> Thank you once again for the reviewing, and all the detail comment,
> which is very helpful for me to get better understand of the project.

You're welcome! I just hope it's time properly invested for the long term ;-)

Cheers,
Willy

Re: MEDIUM: Adding upstream socks4 proxy support

Reply via email to