On 19.07.2016 at 00:20, Christopher Schultz wrote:
On 7/18/16 5:48 PM, Rainer Jung wrote:
On 18.07.2016 at 17:02, Christopher Schultz wrote:
All,
Michael Deiner found a buffer overflow in the call to the FD_SET macro on
line 291 of jk_connect.c:
280> do {
281>     rc = connect(sd, (const struct sockaddr *)&addr->sa.sin, addr->salen);
282> } while (rc == -1 && errno == EINTR);
283>
284> if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
285>     && (timeout > 0)) {
286>     fd_set wfdset;
287>     struct timeval tv;
288>     socklen_t rclen = (socklen_t)sizeof(rc);
289>
290>     FD_ZERO(&wfdset);
291>     FD_SET(sd, &wfdset);                 /* <-- overflow here */
292>     tv.tv_sec = timeout / 1000;
293>     tv.tv_usec = (timeout % 1000) * 1000;
294>     rc = select(sd + 1, NULL, &wfdset, NULL, &tv);
I'd like to fix this so it won't bring down the server :)
But it quickly gets complicated.
The function takes "sd" (a jk_sock_t) as an argument, and we can check
immediately whether it will cause FD_SET to overflow -- easy: just check
whether the value is too large -- but what should we do in that case?
This function should be connecting to a back-end Tomcat server, but if
we have too many outbound connections, we'll fail.
I'm not sure it makes any sense to let things get this far.
The proposed solution[1] is to use poll() instead of select(), but that
won't work on every platform; plus, I'd like to be able to fix the buffer
overflow right away while we work out a solution for poll() that will
work in most environments.
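For reference, the poll()-based wait looks roughly like this (a sketch assuming POSIX poll(2) is available; the helper name is made up and this is not the actual mod_jk patch). poll() takes the descriptor value directly, so there is no FD_SETSIZE ceiling to overflow:

```c
#include <poll.h>

/* wait_writable: hypothetical replacement for the select() call above.
 * Returns >0 if sd becomes writable, 0 on timeout, -1 on error,
 * matching select()'s return convention. */
static int wait_writable(int sd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = sd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    return poll(&pfd, 1, timeout_ms);
}
```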
I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
to start. Any other behavior will eventually cause errors.
+1 in principle. Unfortunately, on Windows it seems the default for
FD_SETSIZE is only 64. That's probably too small, but it seems Windows
allows increasing this limit at compile time:
<Quote>
The variable FD_SETSIZE determines the maximum number of descriptors in
a set. (The default value of FD_SETSIZE is 64, which can be modified by
defining FD_SETSIZE to another value before including Winsock2.h.)
Internally, socket handles in an fd_set structure are not represented as
bit flags as in Berkeley Unix. Their data representation is opaque.
</Quote>
That's ... weird. Okay.
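In other words, per the quoted Winsock documentation, the override itself is just this (Windows-only compile-time fragment; 1024 is the value proposed below to match the Linux default):

```c
/* Must appear before the first include of Winsock2.h,
 * or it has no effect on the fd_set capacity. */
#define FD_SETSIZE 1024
#include <winsock2.h>
```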
So we should IMHO aim for
a) check connection pool size against FD_SETSIZE and fail during startup
if too big - or we decrease it to the max value and log a warning?
On *NIX, that value cannot reasonably be changed. I think we need to
make all our decisions at compile-time and fail-fast at runtime.
Yes, with "decrease it" I meant decreasing the configured pool size,
just as you assumed below.
Lowering to a reasonable maximum value is probably okay. I'm not sure
which would be worse: requiring the administrator to fix a configuration
problem before the server can even start (imagine a server that's been
working for years without this config, now it requires some change) or
auto-reconfiguring based upon a value the admin hasn't set.
Actually... in cases where this would have affected users, the result
would have been that everything is fine until there is a buffer
overflow. Hopefully, the buffer overflow is fatal, but it might not be.
So, lowering to a smaller value if connection_pool_size is too big
sounds good to me. Log with severity=WARN is a good option for notification.
b) define 1024 as the compile time FD_SETSIZE on Windows (same value as
the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as
the default connection pool size.
+1
c) allow increasing FD_SETSIZE when building on Windows, since it is
supported there.
+1
We probably want something like JK_FD_SETSIZE defaults to 1024 and then
FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how
on earth to do that for our Windows builds.
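A sketch of that indirection (JK_FD_SETSIZE is the name proposed above; whether the Windows build files can actually inject it is exactly the open question):

```c
/* Windows-only: honor a build-time override, else default to 1024.
 * Must be evaluated before Winsock2.h is included anywhere. */
#if defined(WIN32)
#ifndef JK_FD_SETSIZE
#define JK_FD_SETSIZE 1024
#endif
#define FD_SETSIZE JK_FD_SETSIZE
#endif
```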
d) use the existing macro HAVE_POLL to offer a poll based code path if
poll was detected.
I don't think HAVE_POLL is any kind of standard. I poked-around my
Debian Linux environment and HAVE_POLL was defined in a number of header
files, but it was unconditionally-defined to be "1" in files such as
postgresql/pg_config.h, so I think the package-maintainers must have
just said "this system has poll.h, let's just set HAVE_POLL=1 and call
it a day".
It is a define that our mod_jk build system sets when configure is used
and detects, during the configure run, that poll() is available. This
define is already used in common/jk_connect.c in some places and could
be used in nb_connect() as well. It only makes sense on platforms where
configure is run, i.e. not on Windows and NetWare, but nb_connect()
already has different implementation branches for the three platform
types (Windows, NetWare, *nix).
Using poll() if HAVE_POLL is defined (i.e. poll() is available) gives us
a clean solution on most platforms except Windows, and if we allow
increasing FD_SETSIZE on Windows (and choose a sane default ourselves),
people can work around the problem there as well.
I'll have a look at the Windows build files concerning setting
FD_SETSIZE to 1024, and probably allow changing the value at compile
time.
Concerning a) the code that reads the pool size is in
common/jk_ajp_common.c in function ajp_init():
p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);
The function gets a logger as an argument, so if we want we can easily
log there. Correcting a cache size that's too big down to the max value
and logging a warning would be easiest. Terminating the startup is more
difficult. We do it, e.g. for Apache, using jk_error_exit(), but it
would be a bit tedious to propagate the error from jk_ajp_common.c to
mod_jk.c. Furthermore, you would need a similar solution for ISAPI.
Therefore I suggest choosing the "correct the config and warn" approach.
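The correct-and-warn step itself is tiny. A sketch with a plain fprintf standing in for jk_log(l, JK_LOG_WARNING, ...) (clamp_pool_size is a made-up name; in ajp_init() the value would be p->ep_cache_sz):

```c
#include <stdio.h>
#include <sys/select.h>   /* FD_SETSIZE */

/* clamp_pool_size: hypothetical helper mirroring the proposed
 * "correct the config and warn" behavior in ajp_init(). */
static int clamp_pool_size(int configured)
{
    if (configured > FD_SETSIZE) {
        fprintf(stderr,
                "warning: connection_pool_size %d exceeds FD_SETSIZE %d, "
                "lowering it\n", configured, FD_SETSIZE);
        return FD_SETSIZE;
    }
    return configured;
}
```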
Thoughts?
I like correct-and-warn for a number of reasons.
So forgetting poll() for the time being, the correct-and-warn would just
be a check for connection_pool_size > FD_SETSIZE, then set
connection_pool_size = FD_SETSIZE and warn, right?
Correct, BUT: I'm not totally convinced that a limit on the pool size
suffices. I think file descriptor numbers are global to each process. So
if you have various workers, say 20 workers each with a pool size of
100, you could easily get file descriptor numbers from 0 to 1999.
Furthermore, I expect other (non-mod_jk or non-ISAPI) file descriptors
in the process to count as well:
- open log files
- incoming client to web server connections
- more?
If that is true, trying to work around the problem by limiting pools
wouldn't work. So maybe we need to change the strategy to
- introduce poll() when available
- make FD_SETSIZE configurable (plus a sane default) on Windows
- fail during the connection attempt if the file descriptor number for
the new socket is too big
WDYT?
Seems simple enough :)
Hmmmm ...
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org