On Wed, 18 Apr 2012, Ski Kacoroski wrote:
> I am in the process of redoing my logging architecture to support both
> *nix and Windows platforms. We currently have Splunk, but because of
> the per-GB pricing we have already decided that we cannot use it for all
> our logs (which kind of defeats the purpose of a central logging
> system). So I was looking at Graylog2 when my intern found NXlog. If
> anyone has used it, either for a complete system or just as a system
> to forward Windows logs to a Unix-style logging system, I would
> appreciate your comments. If you have any other ideas for centralized
> logging infrastructures that support easy ad-hoc queries via a graphical
> interface, please let me know.
I have not used nxlog yet (I just learned about it recently), but it
sounds like it's a strong contender.
I think you won't go far wrong with any of nxlog, rsyslog, or syslog-ng,
as you should be able to replace any one of them with one of the others
if you run into serious trouble.
A few years ago I evaluated syslog options and ended up going with
rsyslog; rsyslog's performance has improved by almost two orders of
magnitude since then, so I'm confident that it can transport your logs
fast enough.
However, all of these are just going to solve your log transport problem,
not your complete logging solution.
What I ended up building uses syslog as the transport mechanism, but then
sends the logs to multiple destinations, one of which is Splunk for its
easy searching. If you can filter out a lot of the noise, you may find
that Splunk is still a valued piece of your logging infrastructure.
That being said, any of the modern syslog daemons can also write to many
different types of destinations, including Hadoop, PostgreSQL,
Elasticsearch, and others, so the ultimate destination of the logs is a
decision independent of the log transport.
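To illustrate that independence, a minimal rsyslog fan-out might look like
the sketch below (illustrative only; the hostnames, port, and file path
are made-up assumptions, not anyone's real config):

```
# Load the Elasticsearch output module that ships with rsyslog.
module(load="omelasticsearch")

# 1. Archive: keep a local copy on disk.
action(type="omfile" file="/var/log/archive/all.log")

# 2. Forward to a downstream collector, e.g. a Splunk syslog input.
action(type="omfwd" target="splunk.example.com" port="514" protocol="udp")

# 3. Index into Elasticsearch as well.
action(type="omelasticsearch" server="es.example.com")
```

Swapping the search backend is then just a matter of changing the last
action; the transport in front of it doesn't care.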
Going into a bit more detail on my logging infrastructure:
It was designed to handle at least 100K logs/sec of ~250-byte log
messages. It has hit a peak of 92K logs in a second, so it seems to be
holding up, and all measurements show that it should be able to handle
peaks up to ~400K logs/sec, which is wire speed for my gig-E network; I
just don't know how long such a burst could be sustained.
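As a back-of-the-envelope sanity check on those numbers (my own
arithmetic, assuming one ~250-byte message per UDP datagram and typical
per-packet framing overhead):

```python
# Does ~400K logs/sec of ~250-byte messages really approach gig-E wire
# speed? Assumes one message per UDP datagram over IPv4/Ethernet.
MSG_BYTES = 250
# Rough per-packet overhead: IPv4 (20) + UDP (8) + Ethernet header, FCS,
# preamble and inter-frame gap (~38) = ~66 bytes.
OVERHEAD_BYTES = 66

def link_bits_per_sec(msgs_per_sec: int) -> int:
    """Bits per second on the wire for a given message rate."""
    return msgs_per_sec * (MSG_BYTES + OVERHEAD_BYTES) * 8

print(link_bits_per_sec(100_000))  # 252,800,000   (~25% of a gig-E link)
print(link_bits_per_sec(400_000))  # 1,011,200,000 (right at gig-E)
```

So ~400K logs/sec is indeed the point where a gigabit link saturates,
which matches the "wire speed" figure above.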
I have a first tier of log relay boxes; these receive the logs from all
my different networks and do whatever cleanup is needed (applying custom
log formats to fix broken senders, running rsyslog cleanup parser modules
to deal with things like Cisco routers adding an extra field when they
log by name, etc.).
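For example, the Cisco cleanup on a relay box can be done with rsyslog's
pmciscoios parser module; a hedged sketch follows (the ruleset name,
port, and file path are my own illustrative choices):

```
module(load="imudp")
module(load="pmciscoios")   # parser for Cisco IOS-style messages

# Try the Cisco parser first, then fall back to the standard ones.
ruleset(name="relay-cleanup"
        parser=["rsyslog.ciscoios", "rsyslog.rfc5424", "rsyslog.rfc3164"]) {
    action(type="omfile" file="/var/log/relay/all.log")
}

input(type="imudp" port="514" ruleset="relay-cleanup")
```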
The first tier then delivers the logs over a dedicated switch (Cisco
3550) via UDP to a multicast MAC address, where multiple farms of servers
receive them (using the iptables CLUSTERIP feature). This allows one copy
of the log to be sent and received by multiple farms of machines, and
each of these farms can have multiple boxes splitting the inbound traffic
between them. This puts a single copy of the log message on the wire, no
matter how many systems are receiving it.
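The receive side of that trick looks roughly like this (a sketch; the
shared IP, multicast MAC, and interface are invented for illustration):
every member of a farm adds a CLUSTERIP rule for the same shared address,
and the kernel hashes inbound packets so each node processes only its
share.

```shell
# On farm member 1 of a 2-node farm: claim the shared IP on a multicast
# MAC so the switch floods one copy of each packet to all members.
iptables -I INPUT -d 10.1.1.100 -i eth0 -p udp --dport 514 \
  -j CLUSTERIP --new --hashmode sourceip \
  --clustermac 01:00:5e:00:01:64 --total-nodes 2 --local-node 1

# Member 2 runs the identical rule with --local-node 2; each node then
# handles the subset of source IPs that hash to it.
```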
The farms that I currently have are:
1. Archive (simple store to disk)
2. Reporting (actually the same box as #1, runs periodic scripts against
the logs to create hourly and daily reports)
3. Alerting (Simple Event Correlator to generate alerts on specific
messages or combinations of messages)
4. Searching (Splunk for easy ad-hoc searching of the logs)
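As a flavor of what the alerting farm does, a Simple Event Correlator
rule might look like the following (an illustrative sketch; the pattern
and the paging script are assumptions, not the rules actually in use):

```
# Alert after 5 failed SSH logins from one source within 60 seconds.
type=SingleWithThreshold
ptype=RegExp
pattern=sshd\[\d+\]: Failed password for \S+ from ([\d.]+)
desc=Repeated SSH auth failures from $1
action=shellcmd /usr/local/bin/page-oncall "SSH brute force from $1"
window=60
thresh=5
```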
In the past I've had a couple of other alerting and reporting farms
running proprietary tools, but they have since been phased out.
This approach allows me to add an additional farm of receiving boxes
without having to make any configuration changes on the relay boxes, and
according to my testing it can scale up to gig-E wire speed with no
packet loss after sending several billion test log messages (at least as
long as the disk I/O can keep up with rsyslog writing the data out to
disk).
The drawback to this approach is that since it uses UDP, if a receiving
farm dies entirely, the sending systems don't know it. I have the relay
boxes write a copy of the logs locally as well as forwarding them, so I
have always been able to recover using those local copies (I really need
to finish making all of my farms HA :-)
If a log message is extremely long, I currently allow it to be truncated
to one packet. I've thought about going to jumbo frames on the log
delivery network so that I could handle log lines up to 9K cleanly, but
it hasn't been enough of an issue for me to have done so yet. The other
thing I could do is detect extra-long lines and switch from the efficient
UDP multicast delivery to sending multiple copies (one to each farm) via
TCP, but I probably won't do that until after going to jumbo frames (if
ever).
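For reference, the payload budget behind that truncation works out as
follows (my numbers, assuming IPv4 without options and one message per
datagram):

```python
# How much syslog text fits in one unfragmented UDP datagram at a
# standard vs jumbo MTU? Assumes IPv4 with no header options.
IP_HEADER = 20
UDP_HEADER = 8

def max_payload(mtu: int) -> int:
    """Largest UDP payload that fits in a single packet at this MTU."""
    return mtu - IP_HEADER - UDP_HEADER

print(max_payload(1500))  # 1472 bytes at the standard Ethernet MTU
print(max_payload(9000))  # 8972 bytes with jumbo frames (~9K lines fit)
```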
David Lang
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/