Hi,

Our config is quite complex and I'm trying to narrow it down. The crash
occurs on only one production haproxy cluster (6 servers in each of two
data centers) under significant load. Crashes occur on random servers, so
I would exclude memory corruption.

I suspect SPOE and/or the Lua script; both are used to send metadata about
each request to an external endpoint. Yesterday I disabled this feature in
one data center to verify.

Our build is done in Docker (Ubuntu bionic) with kernel 4.9.184-linuxkit;
the crash happens on Ubuntu bionic with kernel 4.15.0-55-generic, using:
haproxy 2.0.17
openssl 1.1.1f
pcre 8.44
lua 5.3.5
lrandom (PRNG for Lua; we've been using it for 2 or 3 years without any
problems, and we will soon drop it from our build)

compiled in the following way:

# LUA
wget http://www.lua.org/ftp/lua-$LUA_VERSION.tar.gz \
    && tar -zxf lua-$LUA_VERSION.tar.gz \
    && cd lua-$LUA_VERSION \
    && make linux test \
    && make install

# LUA LRANDOM
wget http://webserver2.tecgraf.puc-rio.br/~lhf/ftp/lua/ar/lrandom-100.tar.gz \
    && tar -zxf lrandom-100.tar.gz \
    && make -C lrandom-100 \
    && make -C lrandom-100 install

# PCRE
wget https://ftp.pcre.org/pub/pcre/pcre-$PCRE_VERSION.tar.gz \
    && tar -zxf pcre-$PCRE_VERSION.tar.gz \
    && cd pcre-$PCRE_VERSION \
    && ./configure --prefix=/usr/lib/haproxy/pcre_$PCRE_VERSION \
        --enable-jit --enable-utf --enable-unicode-properties \
        --disable-silent-rules \
    && make \
    && make install

# OPENSSL
wget https://www.openssl.org/source/openssl-$SSL_VERSION.tar.gz \
    && tar -zxf openssl-$SSL_VERSION.tar.gz \
    && cd openssl-$SSL_VERSION \
    && ./Configure --openssldir=/usr/lib/haproxy/openssl_$SSL_VERSION \
        --prefix=/usr/lib/haproxy/openssl_$SSL_VERSION \
        -Wl,-rpath=/usr/lib/haproxy/openssl_$SSL_VERSION/lib shared no-idea \
        linux-x86_64 \
    && make depend \
    && make \
    && make install_sw

and finally haproxy is compiled using deb builder:

override_dh_auto_build:
        make TARGET=$(HAP_TARGET) \
            DEFINE="-DIP_BIND_ADDRESS_NO_PORT=24 -DMAX_SESS_STKCTR=12" \
            USE_PCRE=1 USE_PCRE_JIT=1 \
            PCRE_INC=/usr/lib/haproxy/pcre_$(PCRE_VERSION)/include \
            PCRE_LIB="/usr/lib/haproxy/pcre_$(PCRE_VERSION)/lib -Wl,-rpath,/usr/lib/haproxy/pcre_$(PCRE_VERSION)/lib" \
            USE_GETADDRINFO=1 \
            USE_OPENSSL=1 SSL_INC=/usr/lib/haproxy/openssl_$(SSL_VERSION)/include \
            SSL_LIB="/usr/lib/haproxy/openssl_$(SSL_VERSION)/lib -Wl,-rpath,/usr/lib/haproxy/openssl_$(SSL_VERSION)/lib" \
            ADDLIB=-ldl \
            USE_ZLIB=1 USE_DL=1 USE_LUA=1 USE_REGPARM=1

-DIP_BIND_ADDRESS_NO_PORT=24 is now obsolete and we'll drop it.
-DMAX_SESS_STKCTR=12 is there because we need more stick tables.

Kind regards,


On Thu, 17 Sep 2020 at 08:18, Willy Tarreau <[email protected]> wrote:

> Hi guys,
>
> On Thu, Sep 17, 2020 at 11:05:31AM +1000, Igor Cicimov wrote:
> (...)
> > > Coredump fragment from thread1:
> > > (gdb) bt
> > > #0  0x000055cbbf6ed64b in h2s_notify_recv (h2s=0x7f65b8b55130) at src/mux_h2.c:783
>
> So the code is this one:
>
>    777  static void __maybe_unused h2s_notify_recv(struct h2s *h2s)
>    778  {
>    779          struct wait_event *sw;
>    780
>    781          if (h2s->recv_wait) {
>    782                  sw = h2s->recv_wait;
>    783                  sw->events &= ~SUB_RETRY_RECV;
>    784                  tasklet_wakeup(sw->tasklet);
>    785                  h2s->recv_wait = NULL;
>    786          }
>    787  }
>
> In the trace it's said that sw = 0xffffffff. Looking at all places where
> h2s->recv_wait is modified, it's either NULL or a valid pointer to some
> structure. We could have imagined that for whatever reason h2s is wrong
> here, but this call only happens when its state is still valid, and it
> experiences double dereferences before landing here, which tends to
> indicate that the h2s pointer is OK. Thus the only hypothesis I can have
> for now is memory corruption :-/ That field would get overwritten with
> (int)-1 for whatever reason, maybe a wrong cast somewhere, but it's not
> as if we had many of these.
>
> > I'm not one of the devs but obviously many of us using v2.0 will be
> > interested in the answer. Assuming you do not install from packages can
> > you please provide some more background on how you produce the binary,
> > like if you compile then what OS and kernel is this compiled on and what
> > OS and kernel this crashes on? Again if compiled any other custom compiled
> > packages in use, like OpenSSL, lua etc, you might be using or have
> > compiled haproxy against etc.?
> >
> > Also if this is a bug and you have hit some corner case with your config
> > (many are using 2.0 but we have not seen crashes) you should provide a
> > stripped down version (not too stripped though just the sensitive data)
> > of your config too.
>
> I agree with Igor here, any info to try to narrow down a reproducer, both
> in terms of config and operations, would be tremendously helpful!
>
> Thanks,
> Willy
>
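
A note on the sw = 0xffffffff value discussed in the quoted analysis: on a
64-bit build, a pointer equal to 0x00000000ffffffff is what you get when
(int)-1 passes through a 32-bit unsigned type before being widened, rather
than being sign-extended. The sketch below only illustrates that arithmetic.
The helper names widen_signed and widen_via_u32 are hypothetical and are not
HAProxy code, and this is not a claim about where the actual bug is.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only (not taken from HAProxy).
 * Both convert the int value v to a pointer-sized integer.
 */

/* Sign-extending path: int -> intptr_t -> uintptr_t.
 * For v = -1 every bit is set, i.e. 0xffffffffffffffff on a 64-bit build. */
uintptr_t widen_signed(int v)
{
    return (uintptr_t)(intptr_t)v;
}

/* Zero-extending path: int -> uint32_t -> uintptr_t.
 * For v = -1 only the low 32 bits are set, i.e. 0x00000000ffffffff on a
 * 64-bit build, which matches the sw value seen in the backtrace. */
uintptr_t widen_via_u32(int v)
{
    return (uintptr_t)(uint32_t)v;
}
```

Calling widen_via_u32(-1) yields 0xffffffff, while widen_signed(-1) yields
UINTPTR_MAX. On a 64-bit build the two differ, so the value in the trace
looks more like a zero-extended 32-bit -1 than a sign-extended one.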
