Re: Abstract namespace sockets handling

Willy Tarreau Wed, 02 Jul 2014 09:41:04 -0700

Hi Jan,

On Thu, Jun 26, 2014 at 11:45:13PM +0200, hodor wrote:
> Hello,
> 
> I think I may have found some problems regarding abstract namespace sockets in
> HAProxy (1.5.1).
> 
> 
> Test config:
> 
> ==============================
> defaults
>   mode tcp
> 
> listen test1
>   bind unix@/tmp/test1.sock
>   server test1 127.0.0.1:22
> 
> listen test2
>   bind abns@test2
>   server test2 unix@/tmp/test1.sock
> 
> listen test3
>   bind abns@test3
>   server test3 abns@test2
> ==============================
> 
> 
> And now the problems:
> 
> 1) 
> 
> When bind() fails (function uxst_bind_listener()), the fail path doesn't
> consider the abstract namespace and tries to unlink paths held in
> uninitiliazed memory (tempname and backname). See the strace excerpt;
> the strings still hold the path from test1.
> 
> ===============================================================================================
> 23722 bind(5, {sa_family=AF_FILE, path=@"test2"}, 110) = -1 EADDRINUSE 
> (Address already in use)
> 23722 unlink("/tmp/test1.sock.23722.tmp") = -1 ENOENT (No such file or 
> directory)
> 23722 close(5)                          = 0
> 23722 unlink("/tmp/test1.sock.23722.bak") = -1 ENOENT (No such file or 
> directory)
> ===============================================================================================
> 
> The first attached patch (haproxy_do_not_unlink_abns.diff) seems to fix it.


OK thank's, I've applied it, with a copy of your description above in the
commit message for further reference.

> BTW, if abns@ bind() fails during a "-sf" reload, then /tmp/test1.sock of the
> old HAProxy instance is unlinked/unreachable and thus cannot accept new
> connections. The new unix@ bind() succeeded, the ".tmp" got renamed, the 
> ".bak"
> was unlinked. So far so good. But when the abns@ bind() fails, then the whole
> new HAProxy instance fails and leaves the old instance "degraded".
> 
> I do not know how to simply fix this.

That's a good point. We have no link()/unlink() on abstract sockets, so
I don't see how we'll be able to temporarily rename a socket. Since this
is linux-only, we can consider any os-specific trick available and see if
one is satisfying (eg: shutdown(RD) followed by listen(), etc).

> 2)
> 
> I think the addrlen for bind() for abstract namespace sockets may be 
> incorrect,
> or perhaps unfriendly for other programs. At the very least socat (with
> ABSTRACT-CONNECT on one end) cannot connect to the "test2" socket.
> 
> When haproxy is on both ends of such a socket, everything is fine, because it
> uses the incorrect (?) addressing consistently :).

Funny. Better fix it before it becomes a new de-facto standard like the TCP
URG pointer which can be configured to work as per RFC or as per BSD :-)

Indeed, strace shows that haproxy and socat used different address lengths :

haproxy:    bind(5, {sa_family=AF_FILE, path=@"foo"}, 110) = 0
socat:   connect(3, {sa_family=AF_FILE, path=@"foo"}, 6) = -1 ECONNREFUSED 
(Connection refused)

Now, looking at the man page, I still think that it could be argued that socat
has it wrong, because I understand below that all the address structure is
used :

     *  abstract: an abstract socket address is distinguished  by  the  fact
        that  sun_path[0]  is  a  null byte ('\0').  The socket's address in
        this namespace is given by the additional bytes in sun_path that are
        covered  by  the  specified  length of the address structure.  (Null
        bytes in the name have no special significance.)

So what socat does prevents us from building binary addresses (eg: random
addresses for internal communications). But, in practice, we're making names
out of human-readable strings, we're not building addresses out of binary
data, so I'd be tempted to think that it would make more sense to adopt
socat's naming method instead of padding the remaining bytes with zeroes.

Still I feel concerned about future evolutions. I have a deep feeling that
this is something we could regret sooner or later. Indeed, if we use strlen()
over possibly binary contents to retrieve a socket address length, something
tells me that we're sacrifying the ability to later extend this scheme to
other usages.

> The second attached patch (haproxy_correct_abns_length.diff) seems to fix it.
> No idea whether this is the right way to do it.

I've just found in socat's man that it supports both modes :

       unix-tightsocklen=[0|1]
              On socket operations, pass a socket address length that does not
              include the whole  struct  sockaddr_un record  but  (besides
              other components)  only  the  relevant part of the filename or
              abstract string.
              Default is 1.

and indeed it works here :

  $ strace -s 200 -e trace=connect socat readline 
abstract-connect:foo,unix-tightsocklen=0
  connect(3, {sa_family=AF_FILE, path=@"foo"}, 110) = 0

So given that, I'd prefer to stay on the current scheme and document this
way to interface with socat instead. It seems safer and more durable to me.
What do you think ?

> 3)
> 
> HAProxy reload (the "-sf" stuff) fails when an abns@ socket is already bound.
> The shutdown() trick doesn't seem to work, nor does SO_REUSEADDR. I don't know
> whether it can be done without the old process closing the socket first. And
> even then, the ERR_FATAL upon bind() failure in uxst_bind_listener() would
> prevent the call of tell_old_pids().
> 
> I don't know how to fix this.

Indeed that's a problematic concern. For now I have no idea how to work
around it. Abstract sockets are very new to me, I haven't explored them
much yet as you can see :-/

> I can work around all of those problems by using plain old unix sockets
> instead, but I like the idea of abstract namespace sockets :).

I like it as well, but you see, another concern I have is the fact that
any other process -not just haproxy- could be bound to an abstract socket.
While there are well-known ports for TCP/UDP services, there's no official
registry of well-known abstract socket addresses and this is a problematic
point. I suspect that we'll quickly come up with a complement which is a
random address : we would generate a random address and try to bind. Then
we would map each abstract name to that random address. It would solve all
the communications issues within a group of processes belonging to the same
config space.

My gut feeling is that we should use file-system for anything system-wide
(ie: socat to haproxy), as it's the only way to safely connect two ends. But
we could use abstract sockets inside the same process or group of processes.

An intermediate mechanism could be to have haproxy automatically prefix
abstract sockets with the starting process' pid or with a short random.

So I have applied patch #1 and not #2. Instead I think we'd rather continue
to discuss the design choices and options here.

Best regards,
Willy

Re: Abstract namespace sockets handling

Reply via email to