I'm struggling a bit to avoid the "just throw logs in and figure out later 
what to do with them" inclination, and trying to plan how the different 
pieces might best be used.

I'd appreciate any comments as to whether this is a good approach.  I even 
have a picture.

My thinking goes like this: 

1) Bring data in and use extractors (mostly grok) to normalize messages to a 
set of standardized fields, loosely based on what I can get for free from 
GELF.  I expect this kind of normalization will be a work in progress 
forever.  Grok especially, but extractors in general, seem easier to use 
than pipelines for normalization.
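As a concrete sketch of what I mean by step 1: a grok extractor on the input could pull standardized fields out of a syslog-style line.  (The field names here are just illustrative, not a fixed scheme I've settled on.)

```
# Hypothetical grok pattern for an sshd syslog line, mapping into
# GELF-ish standardized fields (timestamp, source, process_id, message):
%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:source} sshd\[%{POSINT:process_id}\]: %{GREEDYDATA:message}
```

All of `SYSLOGTIMESTAMP`, `HOSTNAME`, `POSINT`, and `GREEDYDATA` are stock grok patterns, so an extractor like this needs no custom pattern definitions.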

2) Let everything stay in the default stream at that point, and feed it 
into a set of pipeline rules.

3) Pipelines decide how to map the log messages from their physical origins 
into logical groupings: for example, actual device events (e.g. hardware or 
similar), infrastructure logins to network gear, VPN and similar access, 
web logs (probably of several types), etc.

3A) Garbage messages no one really cares about get dropped here.
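For 3A, I'm imagining pipeline rules along these lines (the field name and match value are made up for illustration):

```
// Hypothetical rule: silently discard chatter no one cares about.
rule "drop noisy chatter"
when
  has_field("application_name") &&
  to_string($message.application_name) == "chatty-daemon"
then
  drop_message();
end
```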

3B) Some messages might end up in two places, e.g. we might have certain 
data access streams which are also web or FTP logs.
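And 3B seems doable because a single rule can route one message to more than one stream.  A sketch, with hypothetical stream names and a made-up path prefix standing in for whatever actually marks a data-access request:

```
// Hypothetical rule: a web request that touches sensitive data goes
// to both the "Web Logs" stream and the "Data Access" stream.
rule "web logs that are also data-access events"
when
  has_field("http_path") &&
  contains(to_string($message.http_path), "/customer-data/")
then
  route_to_stream(name: "Web Logs");
  route_to_stream(name: "Data Access");
end
```

If I understand correctly, routing to a stream can also pull the message out of the default stream, which would keep the default stream from accumulating everything forever.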

4) Streams control the alerts.

All wet, or going in the right direction? 






-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.