Thanks. So far I've largely ignored Unix logs and even web logs, as the client I'm first considering this for is mostly Windows plus lots of network devices at many remote sites, so I've been concentrating there, but I will experiment with filebeat when I get there.
The main place I was looking to do some normalization is the myriad of network devices (and about 12 years' worth of firmware versions, slowly getting modernized).

On Thursday, August 4, 2016 at 5:23:34 AM UTC-4, [email protected] wrote:
>
> On Thursday, 4 August 2016 05:41:59 UTC+8, Linwood Ferguson wrote:
>>
>> I'm struggling a bit to avoid the "just throw logs in and figure out
>> later what to do with them" inclination, and trying to plan how the
>> different pieces might best be used.
>>
>> I'd appreciate any comments as to whether this is a good approach. I
>> even have a picture.
>>
>> My thinking goes like this:
>>
>> 1) Bring data in and use extractors (mostly grok) to normalize to some
>> set of standardized fields, somewhat based on what I can get for free
>> from GELF. I expect this kind of normalization will be a work in
>> progress forever. Grok especially, but extractors in general, seem
>> easier to use than pipelines for normalization.
>>
>> 2) Let everything stay in the default stream at that point, and feed it
>> into a set of pipeline rules.
>>
>> 3) Pipelines decide how to map the log messages from their physical
>> origins into logical groupings, for example actual device (e.g. hardware
>> or similar) events, infrastructure logins to network gear, VPN and
>> similar access, web logs (probably different types), etc.
>>
>> 3A) Garbage messages no one really cares about get dropped here.
>>
>> 3B) Some messages might end up in two places, e.g. we might have certain
>> data access streams which are also web or FTP logs.
>>
>> 4) Streams control the alarms.
>>
>> All wet, or going in the right direction?
>
> Hi Linwood,
>
> Thanks for sharing this. I've been working with the boss on a Graylog
> project and we have discussed this a few times, as there are various
> places where you can filter, so what is the optimal approach?
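For what it's worth, steps 3 and 3A of the plan quoted above could be sketched as Graylog pipeline rules. A minimal sketch, assuming hypothetical field values ("vpn_login") and stream name ("VPN and Remote Access") that a grok extractor would have set earlier:

```
// Step 3A: drop garbage no one cares about.
// "keepalive" is an illustrative pattern, not from the original post.
rule "drop noisy keepalives"
when
  has_field("message") &&
  contains(to_string($message.message), "keepalive", true)
then
  drop_message();
end

// Step 3: route normalized messages into a logical grouping.
// event_type is a hypothetical field populated during normalization.
rule "route VPN logins"
when
  has_field("event_type") &&
  to_string($message.event_type) == "vpn_login"
then
  route_to_stream(name: "VPN and Remote Access");
end
```

Rules like these would be attached to a pipeline connected to the default stream, matching step 2 of the plan.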
>
> So far we have a good setup for Linux-based server hosts (primarily only
> interested in '/var/log/security' naughties and SELinux 'avc = denied',
> which generally breaks something). I tried a number of 'shippers', but
> the one we decided on was elastic 'filebeat'.
>
> It had the following advantages:
>
> - Written in Go, so a single binary (i.e. it doesn't require a JVM)
> - Easy config (YAML-based, so config-management friendly)
> - Allows for simple filtering before the data leaves the server (instead
>   of pumping all your logs and then filtering with Graylog), so you're
>   only searching against the data you care about
> - Graylog has an input plugin that supports the Beats shipper.
>
> For Linux-based hosts, IMO this is a lot more fun than syslog
> manipulation (which you will most likely be forced to do for your other
> devices). So outside of recommending the shipper (which was recommended
> to me), I think that filtering at the source is an effective strategy.
>
> HTH. Cheers, Luke.
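The source-side filtering Luke describes might look roughly like this in filebeat.yml. This is a sketch only: the paths, patterns, and hostname are illustrative, the syntax is the filebeat 1.x-era config current at the time of this thread, and the Logstash output is used because Graylog's Beats input speaks the same protocol:

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/secure        # illustrative path; adjust per distro
      # Ship only the lines you care about, instead of filtering in Graylog.
      include_lines:
        - "Failed password"
        - "avc: .*denied"

output:
  logstash:
    # Point at the Graylog Beats input (host and port are assumptions).
    hosts: ["graylog.example.com:5044"]
```

With `include_lines`, everything else is dropped on the server before it crosses the network, which is the "filter at the source" strategy recommended above.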
