I'm struggling a bit to avoid the "just throw logs in and figure out later what to do with them" inclination, and trying to plan how the different pieces might best be used.
I'd appreciate any comments as to whether this is a good approach. I even have a picture. My thinking goes like this:

1) Bring data in and use extractors (mostly grok) to normalize to some set of standardized fields, somewhat based on what I can get for free from GELF. I expect this kind of normalization will be a work in progress forever. Extractors in general, and grok especially, seem easier to use than pipelines for normalization.

2) Let everything just stay in the default stream at that point, and feed into a set of pipeline rules.

3) Pipelines decide how to map the log messages from their physical origins into logical groupings, for example actual device events (hardware or similar), infrastructure logins to network gear, VPN and similar access, web logs (probably of several types), etc.

3A) Garbage messages no one really cares about get dropped here.

3B) Some messages might end up in two places, e.g. we might have certain data access streams whose messages are also web or FTP logs.

4) Streams control the alarms.

All wet, or going in the right direction?
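To make steps 2 through 3B a bit more concrete, here is a rough sketch of the routing decision in plain Python. This is not Graylog pipeline rule syntax, only a model of the logic a set of rules would have to express. Every field name (`event_type`, `level`, `path`), every stream name, and the `/exports/` prefix are made up for illustration and assume the extractors in step 1 have already normalized the message.

```python
from typing import Optional, Set

DEFAULT = "default"

# Hypothetical mapping from a normalized event_type field to a logical stream.
PRIMARY_STREAM = {
    "hardware": "device-events",
    "network_login": "infrastructure-logins",
    "vpn": "remote-access",
    "web": "web-logs",
    "ftp": "ftp-logs",
}

# Hypothetical paths that count as "data access" (step 3B).
SENSITIVE_PREFIXES = ("/exports/",)


def route(msg: dict) -> Optional[Set[str]]:
    """Return the set of streams for a normalized message, or None to drop it."""
    # Step 3A: drop messages nobody cares about (assumed here to be debug noise).
    if msg.get("level") == "debug":
        return None

    streams: Set[str] = set()
    event_type = msg.get("event_type")

    # Step 3: map the message into its logical grouping.
    if event_type in PRIMARY_STREAM:
        streams.add(PRIMARY_STREAM[event_type])

    # Step 3B: some web or FTP messages also belong in a data access stream.
    path = str(msg.get("path", ""))
    if event_type in ("web", "ftp") and path.startswith(SENSITIVE_PREFIXES):
        streams.add("data-access")

    # Step 2: anything unmatched simply stays in the default stream.
    return streams or {DEFAULT}
```

In this sketch a web request for something under `/exports/` lands in both the web stream and the data access stream, a debug message is dropped, and anything unrecognized stays in the default stream, where alarms (step 4) would not be attached to it.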
