Just wanted to post a follow-up to this. I've finally got my head around Grok patterns and how to use them with extractors, and I have replaced all my extractors with just two that achieve the same set of extractions.
Load average on the Graylog servers is now in the 0.08 - 0.12 range while processing the same number of messages/second and extracting the exact same data.

So once again, thanks Kay!

Cheers, Pete

On Saturday, 6 June 2015 06:16:54 UTC+10, Pete GS wrote:
>
> Ah thanks Kay!
>
> I've never looked into Grok patterns, but it sounds like they could help
> a great deal.
>
> As you've pointed out in my extractors, there's only a very small number
> of specific log lines I need to identify and these contain all the fields I
> wish to extract relating to the potential issues, so a Grok pattern sounds
> like a perfect solution for that.
>
> I don't think I need any data type conversions but I'm planning on
> upgrading the test lab to 1.1 next week anyway.
>
> Thanks for your help, I have some reading to do!
>
> Cheers, Pete
>
> On Friday, 5 June 2015 16:12:21 UTC+10, Kay Röpke wrote:
>>
>> Pete,
>>
>> The extractors themselves do not look too bad, but whenever you
>> use leading wildcards to extract similar data, the work that the extractors
>> have to do is repeated, since they are executed one after the other.
>>
>> If there's no better way to extract that data, you might want to look
>> into Grok patterns, as those will be executed "in parallel".
>> For example, if you have multiple patterns that could potentially match,
>> and then use | to combine those patterns, they get compiled down into a
>> single regular expression.
>> That should be faster, even though the overall expression is larger.
>>
>> The upside is that you can extract multiple named fields at once with
>> Grok and can apply data type conversions in 1.1.
>>
>> You'll find examples in our documentation. Please note that the type
>> conversions are a new feature in 1.1.
>>
>> Best,
>> Kay
>>
>> On Fri, Jun 5, 2015, 2:45 AM Pete GS <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I've finally discovered the source of my excess CPU load and high load
>>> averages on my Graylog nodes!
>>>
>>> I've got a bunch of extractors that I use to pull information from my
>>> vSphere platform's VMkernel logs.
>>>
>>> The catch with these is that a lot of items in the message string vary
>>> quite a bit, so finding a regex to match is quite difficult... read: pretty
>>> much impossible for my limited regex skills :)
>>>
>>> The way I've worked around this is to use wildcards in the regex strings
>>> and that seems to be causing my load average to go from ~0.4 to ~2 or even
>>> more, and the CPUs regularly peak at 100%.
>>>
>>> Is this expected behaviour?
>>>
>>> I recall an issue with earlier versions of Graylog where wildcards in
>>> stream rules would cause this but I believe that was much improved in the
>>> 1.0 release and I have noticed that difference. I'm running 1.0.2 at
>>> present.
>>>
>>> Is there a similar improvement with extractors in 1.1 or is it being
>>> worked on perhaps?
>>>
>>> I intend to put 1.1 into my test lab early next week but it doesn't see
>>> anywhere near as many messages/sec as Production so I won't really see any
>>> indications until I get it into Production.
>>>
>>> I've attached my current extractors.
>>>
>>> Any feedback on this would be great, and in the meantime I'll start
>>> trying to optimise my extractors a bit more to see if I can remove some
>>> wildcards.
>>>
>>> Cheers, Pete
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "graylog2" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
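
For anyone finding this thread later, here is a rough sketch of the idea Kay describes: several patterns joined with | into one regular expression, pulling out multiple named fields in a single pass instead of running one wildcard regex after another. It is plain Python rather than Graylog's Grok syntax, and the log lines, pattern names and field names are all made up for illustration (the real extractors were only attached to the original mail), so treat it as a sketch of the technique and not as what Graylog does internally.

```python
import re

# Hypothetical VMkernel-style lines; the real messages are not shown in this thread.
LINES = [
    "cpu3:1234)ScsiDeviceIO: 2338: Cmd(0x412e) to dev naa.6001 failed H:0x0 D:0x2",
    "cpu1:5678)NMP: nmp_ThrottleLogForDevice:2321: device naa.6002 latency 1500 us",
    "cpu0:9999)Unrelated message",
]

# One alternative per log-line shape, each with its own named groups.
# Python's re module requires group names to be unique within one pattern,
# so the (assumed) field names carry a prefix per alternative.
SCSI = r"Cmd\((?P<scsi_cmd>0x[0-9a-f]+)\) to dev (?P<scsi_dev>\S+) failed"
LATENCY = r"device (?P<lat_dev>\S+) latency (?P<lat_us>\d+) us"

# Combine the alternatives with | and compile once, so each message is
# scanned by a single expression rather than by one extractor after another.
COMBINED = re.compile(f"(?:{SCSI})|(?:{LATENCY})")


def extract(line):
    """Return the named fields of whichever alternative matched, or {}."""
    match = COMBINED.search(line)
    if match is None:
        return {}
    # Groups belonging to the alternative that did not match are None; drop them.
    return {name: value for name, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    for line in LINES:
        print(extract(line))
```

Anchoring each alternative on a literal piece of text (here "Cmd(" and "device ") rather than a leading wildcard is what keeps the amount of scanning per message down, in line with Kay's point about leading wildcards.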
