On Wed, 18 Apr 2012, Ski Kacoroski wrote:

I am in the process of redoing my logging architecture to support both *nix and Windows platforms. We currently have Splunk, but because of the per-GB pricing we have already decided that we cannot use it for all our logs (which kind of defeats the purpose of a central logging system). So I was looking at Graylog2 when my intern found NXlog. If anyone has used it, either for a complete system or just as a system to forward Windows logs to a Unix-style logging system, I would appreciate your comments. If you have any other ideas for centralized logging infrastructures that support easy ad-hoc queries via a graphical interface, please let me know.

I have not used nxlog yet (I just learned about it recently), but it sounds like it's a strong contender.

I think you will not go very wrong with any of nxlog, rsyslog, or syslog-ng, as you should be able to replace any one of them with one of the others if you run into serious trouble.

A few years ago I evaluated syslog options and ended up going with rsyslog; rsyslog performance has improved by almost two orders of magnitude since then, so I'm confident that it can transport your logs fast enough.

However, all of these are just going to solve your log transport problem, not your complete logging solution.

What I ended up building uses syslog as the transport mechanism, but then sends the logs to multiple destinations, one of which is Splunk for its easy searching. If you can filter out a lot of the noise, you may find that Splunk is still a valuable piece of your logging infrastructure.
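To make the fan-out concrete, here is a minimal rsyslog.conf sketch in the legacy selector syntax (the hostnames and paths are made up for illustration, not my actual config). Each action line is independent, so every message goes to all three destinations:

```shell
# Write a hypothetical rsyslog.conf fragment: one local archive copy,
# one TCP forward (@@) to a Splunk indexer, one UDP forward (@) to an
# alerting box. Hostnames and paths are invented for this sketch.
cat > /tmp/rsyslog-fanout.conf <<'EOF'
# local archive copy
*.*  /var/log/archive/all.log
# forward everything to Splunk over TCP
*.*  @@splunk-indexer:514
# forward everything to the alerting farm over UDP
*.*  @sec-alert-box:514
EOF
# each "*.*" selector line is a separate delivery action
grep -c '^\*\.\*' /tmp/rsyslog-fanout.conf
```

In legacy rsyslog syntax, `@host` means UDP forwarding and `@@host` means TCP; repeating the selector is what sends the same stream to multiple places.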

That being said, any of the modern syslog daemons can also write to many different types of destinations, including Hadoop, PostgreSQL, Elasticsearch, and others, so the ultimate destination of the logs is a decision independent of the log transport.


Going into a bit more detail on my logging infrastructure:

It was designed to handle at least 100K logs/sec of ~250-byte log messages. It has hit a peak of 92K logs in a second, so it seems to be holding up, and all measurements show that it can handle peaks up to ~400K logs/sec, which is wire speed for my gig-E network; I just don't know how long such a burst could be sustained.
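As a quick sanity check on the wire-speed claim, 400K messages/sec at ~250 bytes each works out to roughly 800 Mbit/sec, which is indeed about all a gig-E link can carry once you set aside framing overhead:

```shell
# 400,000 logs/sec * 250 bytes/log * 8 bits/byte = 800,000,000 bits/sec
bits_per_sec=$((400000 * 250 * 8))
echo "$bits_per_sec"   # 800000000, i.e. ~0.8 Gbit/sec of payload
```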

I have a first tier of log relay boxes. These boxes receive the logs from all my different networks and do whatever cleanup is needed (custom log formats to fix broken senders, running rsyslog cleanup parser modules to deal with things like Cisco routers adding an extra field if they log by name, etc.).

The first tier then delivers the logs over a dedicated switch (Cisco 3550) via UDP to a multicast MAC address, where multiple farms of servers receive them (using the iptables CLUSTERIP feature). This puts a single copy of each log message on the wire, no matter how many farms are receiving it, and each of these farms can have multiple boxes splitting the inbound traffic between them.
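For reference, the CLUSTERIP setup on each receiving box looks roughly like the following (the address, MAC, and node counts here are invented; this is a sketch of the mechanism, not my exact rules). The multicast cluster MAC is what makes the switch flood one copy of each packet to every node; the source-IP hash then decides which node in a farm actually processes it:

```shell
# Hypothetical CLUSTERIP rule for node 1 of a 2-node receiving farm
# listening on the shared address 10.0.0.100 for syslog/UDP traffic.
# --clustermac is a multicast MAC, so all nodes see every packet;
# --hashmode sourceip picks which node handles a given sender.
iptables -I INPUT -d 10.0.0.100 -p udp --dport 514 \
    -j CLUSTERIP --new \
    --hashmode sourceip \
    --clustermac 01:00:5e:00:00:20 \
    --total-nodes 2 --local-node 1
```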

The farms that I currently have are:

1. Archive (simple store to disk)

2. Reporting (actually the same box as #1, runs periodic scripts against the logs to create hourly and daily reports)

3. Alerting (Simple Event Correlator to generate alerts on specific messages or combinations of messages)

4. Searching (Splunk for easy ad-hoc searching of the logs)
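As an example of what the alerting farm (#3) does, here is a SEC rule sketch; the log pattern and the alert script path are hypothetical, invented just to show the rule shape:

```shell
# Hypothetical SEC rule: if the same device logs 5 "link down" events
# within 60 seconds, run an alert script. Pattern and script path are
# made up for illustration.
cat > /tmp/linkdown.sec <<'EOF'
type=SingleWithThreshold
ptype=RegExp
pattern=^(\S+) .*%LINK-3-UPDOWN: Interface .* changed state to down
desc=Repeated link flaps on $1
action=shellcmd /usr/local/bin/page-oncall.sh "$1"
window=60
thresh=5
EOF
# one rule definition written out
grep -c '^type=' /tmp/linkdown.sec
```

The SingleWithThreshold rule type is what lets you alert on combinations ("N occurrences within a window") rather than on every single matching line.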

In the past I've had a couple of other alerting and reporting farms running proprietary tools, but they've since been phased out.

This approach allows me to add an additional farm of receiving boxes without having to make any configuration changes on the relay boxes, and according to my testing it can scale up to gig-E wire speed with no packet loss after sending several billion test log messages (at least as long as the disk I/O can keep up with rsyslog writing the data out to disk).

The drawback to this approach is that since it uses UDP, if a receiving farm dies entirely, the sending systems don't know it. I have the relay boxes write a copy of the logs locally as well as forwarding them, so I have always been able to recover using those logs (I really need to finish getting all of my farms HA :-).

If a log is extremely long, I currently allow it to be truncated to one packet. I've thought about going to jumbo frames on the log delivery network so that I could handle log lines up to 9K cleanly, but it hasn't been enough of an issue for me to do so yet. The other thing I could do is detect extra-long lines and switch from the efficient UDP multicast delivery to sending multiple copies (one to each farm) via TCP, but I probably won't do that until after going to jumbo frames (if ever).
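The one-packet limit follows directly from the MTU: on standard 1500-byte Ethernet, a UDP datagram that avoids IP fragmentation can carry at most 1472 bytes of syslog payload (1500 minus 20 for the IPv4 header and 8 for the UDP header), while 9000-byte jumbo frames would raise that to 8972:

```shell
# Max unfragmented UDP payload = MTU - 20 (IPv4 header) - 8 (UDP header)
echo $((1500 - 20 - 8))   # 1472 bytes on standard Ethernet
echo $((9000 - 20 - 8))   # 8972 bytes with jumbo frames
```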

David Lang
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
