Hi,

I wanted to add a feature to gmetad to make it possible to request multiple
items per connection on the interactive port, and while doing so I uncovered a
buffer overflow that crashes the gmetad server. I'm releasing both patches
together because they are somewhat connected. Apologies if that's a problem,
and for the length of this email; the intent is to make review easier, though
it might trigger brain filters.

== Multiple requests per connection on interactive port
We wanted to leverage ganglia to develop complex checks that rely on
correlating various metrics across several hosts. With a large tree the best
way to achieve that is to use the interactive port to request those specific
metrics, but this incurs a large number of connections given the limit of one
item per request. The proposed patch removes this limit.

I've picked ':' as the separator since it's the natural one under Unix and it
isn't among the characters process_request() checks against when sanitizing
input.

To limit the changeset I've reused the existing code and simply wrapped
process_path() in a do-while loop that splits the request into chunks and
passes each one to the function (a sketch follows the list below). As a side
effect, if you request something like
/grid1/host1/load_one:/grid1/host1/load_five you'll get the grid and host XML
tags twice, rather than one grid/host tag with the two metrics merged inside.
While at first glance this looks ugly, there are imho a few things to
consider:

1) Doing the merge server side is a lot of work and a big changeset, afaics.
2) I would argue that it's actually correct not to merge: you asked for
/grid/host/metric, so you effectively get what you asked for.
3) Most likely the client will do its processing with an XML library,
resulting in something like res[grid][host] = val, which makes the merge
fairly easy (at least in higher-level languages like python).
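
For reference, here's a minimal standalone sketch of the splitting loop;
process_path() is reduced to a stub, since the real one in gmetad/server.c
takes more arguments:

    #include <stdio.h>
    #include <string.h>

    /* Stub standing in for the real process_path() in gmetad/server.c. */
    static void process_path(int client_fd, char *path)
    {
        printf("client %d: processing %s\n", client_fd, path);
    }

    static void process_request(int client_fd, char *request)
    {
        char *rest = request;
        char *item;

        do {
            item = strsep(&rest, ":");          /* split on ':' in place */
            if (item != NULL && *item != '\0')
                process_path(client_fd, item);  /* reuse single-item code */
        } while (item != NULL);
    }

    int main(void)
    {
        char request[] = "/grid1/host1/load_one:/grid1/host1/load_five";
        process_request(0, request);
        return 0;
    }

Note that strsep modifies the buffer in place, which is fine here since the
request buffer is ours to consume once it has been read off the socket.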

As for filter support, it is potentially a problem with multiple items per
request, but as things stand I don't see one. Right now there's only support
for a SUMMARY filter, which is applied only to root/grid/cluster nodes and
outputs "<HOSTS UP=\"%u\" DOWN=\"%u\" SOURCE=\"gmetad\"/>". That being the
case, with a request like "/grid1:/grid2?filter=summary" each grid will get
its own summary, which is the expected behavior. Nonetheless I wanted to point
it out, because if more filters are added this would need to be accounted for.
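
To illustrate, with "/grid1:/grid2?filter=summary" each grid's summary would
come back separately, along these lines (the UP/DOWN counts here are made up
and the GRID attributes are trimmed for brevity):

    <GRID NAME="grid1" ...>
    <HOSTS UP="12" DOWN="0" SOURCE="gmetad"/>
    </GRID>
    <GRID NAME="grid2" ...>
    <HOSTS UP="7" DOWN="1" SOURCE="gmetad"/>
    </GRID>

i.e. one summary per requested grid rather than one merged summary.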

== Gmetad server buffer overflow and network overload
These problems exist regardless of the above feature, but they would get far
worse with it, hence the two patches being linked.

=== Buffer overflow
It is possible to instantly crash gmetad by sending a specially crafted
request to the interactive port.

In process_path() a char element[256] is allocated to hold the pieces of the
path as it is processed. If a request is made with a path element longer than
that, the strncpy call will write to an invalid memory location, since no
length check is performed on the input data to make sure it fits in element.

The fix is to malloc a buffer big enough to accommodate any element's length.
REQUESTLEN provides protection against abusing the malloc to launch a
memory-exhaustion attack.
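
As a standalone illustration of the idea (walk_path() is a hypothetical
helper, not the literal patch), sizing the buffer from the input makes the
copy safe by construction:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper illustrating the fix: the buffer is sized from
       the input, so no single element can overflow it. */
    static int walk_path(const char *path)
    {
        size_t len = strlen(path);        /* already capped at REQUESTLEN */
        char *element = malloc(len + 1);  /* fits any single element */
        if (element == NULL)
            return -1;

        while (*path != '\0') {
            while (*path == '/')          /* skip separators */
                path++;
            size_t n = strcspn(path, "/");  /* length of this element */
            memcpy(element, path, n);       /* n <= len, cannot overflow */
            element[n] = '\0';
            if (n > 0)
                printf("element: %s\n", element);
            path += n;
        }

        free(element);
        return 0;
    }

    int main(void)
    {
        return walk_path("/grid1/host1/load_one");
    }

Since no element can be longer than the whole path, and the path itself is
already capped at REQUESTLEN, the allocation stays bounded.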

PoC: echo "/`python -c \"print \\"%s/%s\\" % ('a'*300,'b'*300)\"`" | nc localhost 8652

=== DoS attacks
1) Given REQUESTLEN=2048, and a minimum of 3 characters per item (a valid but
nonexistent path such as "/x", plus the ':' separator), with the above feature
implemented it would be possible to trigger 2048/3 (roughly 680) calls to
process_path() from a single request, which could lead to CPU overload.
2) An extension of 1): as it stands, ganglia returns the entire tree if an
element is not found. With large trees, those 2048/3 lookups could easily
result in several GBs of data being transferred. Related to this, if you look
at gmetad/server.c lines 601:606 you'll see this:
                     err_msg("Got a malformed path request from %s", remote_ip);
                     /* Send them the entire tree to discourage attacks. */
                     strcpy(request, "/");
which leads to the same scenario as above.

What I propose is that in both cases, a malformed request and a nonexistent
item, we log an error and bail out. This would solve 2) and most of 1), making
the call exit far more quickly.
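
For the malformed-request case, the change at the spot quoted above would look
roughly like this (whether it's close()-plus-continue or a return depends on
the surrounding loop, so take the control flow and client.fd as assumptions on
my part, not the literal diff):

             err_msg("Got a malformed path request from %s", remote_ip);
    -        /* Send them the entire tree to discourage attacks. */
    -        strcpy(request, "/");
    +        /* Log and bail out instead of sending the entire tree. */
    +        close(client.fd);
    +        continue;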

== Patches summary
- server-c-bof-dos.diff: addresses the buffer overflow and network DoS
- server-c-multi-item-request.diff: adds the multi-item-per-request feature

I have a version available that does not use strsep if you think that's
preferable, but it's a bit more code and strsep shouldn't be a dependency
problem even on not-so-new systems.

regards,

Spike

-- 
"Behind every great man there's a great backpack" - B.

