Otto Fowler, yes, I am OK with the trade-offs. For Active Directory log records, can I parse them with a non-regex custom parser? I assume we would still need some pattern-matching library, since the records are plain text? A dummy AD record from my use case looks like this:

12/02/2017 05:14:43 PM LogName=Security SourceName=Microsoft Windows security auditing. EventCode=4625 EventType=0 Type=Information ComputerName= dc1.ad.ecorp.com TaskCategory=Logon OpCode=Info RecordNumber=95055509895231650867 Keywords=Audit Success Message=An account failed to log on. Subject: Security ID: NULL SID Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: K1560365938U$ Account Domain: ECORP Failure Information: Failure Reason: Unknown user name or bad password. Status: 0xC000006D Sub Status: 0xC000006A Network Information: Workstation Name: K1560365938U Source Network Address: 192.168.151.95 Source Port: 53176 Detailed Authentification Information: Logon Process: NtLmSsp Authentification Package: NTLM Transited Services: - Package Name (NTLM ONLY): - Key Length: 0 This event is generated when a logon request fails. It is generated on the computer where access was attempted. The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe. The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network). The Process Information fields indicate which account and process on the system requested the logon. The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases. The authentication information fields provide detailed information about this specific logon request. Transited services indicate which intermediate services have participated in this logon request. Package name indicates which sub-protocol was used among the NTLM protocols

On Wed, Jul 11, 2018 at 8:44 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> I am not saying it is faster, just giving some info.
>
> Also, that part of the documentation is not referring to regex vs. grok, but grok versus a custom non-regex parser, at least as I read it.
>
> If you have the ability to build, deploy, test and maintain a custom parser (unless you will be submitting it to the project?), then in most cases where performance (or rather throughput) is the top issue you are most likely going to get better results that way, accepting that you are OK with the trade-offs.
>
> If you have 10M mps, parsing might not be your bottleneck.
>
> On July 11, 2018 at 11:01:19, Muhammed Irshad (irshadkt....@gmail.com) wrote:
>
> Otto Fowler,
>
> Thanks for the reply. I saw that it uses the same Java regex under the hood. I got a bit sceptical on seeing this open issue <https://github.com/thekrakken/java-grok/issues/75> in java-grok, which says grok is much slower than pure regex. The fix is not available yet in Metron, as it needs a few changes in the API and the issue to be closed. As the data volume in my requirement is so huge, I had to double-check and confirm before I go with one. Also, the Metron documentation <https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html> itself says the following under the Parser Adapter section.
>
> "Grok parser adapters are designed primarly for someone who is not a Java coder for quickly standing up a parser adapter for lower velocity topologies. Grok relies on Regex for message parsing, which is much slower than purpose-built Java parsers, but is more extensible. Grok parsers are defined via a config file and the topplogy does not need to be recombiled in order to make changes to them."
>
> On Wed, Jul 11, 2018 at 8:01 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > Java-Grok IS Java regex. It is just a DSL over Java regex. It takes grok expressions (that can reference other expressions and be compound), parses/resolves them, and then builds one big regex out of them.
> >
> > Also, Groks, once parsed / used, are re-used, so at that point they are like compiled regexes.
> >
> > That is not to say that that takes 0 time, but it may help you to understand.
> >
> > https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/Grok.java
> > https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/GrokCompiler.java
> >
> > On July 11, 2018 at 07:13:38, Muhammed Irshad (irshadkt....@gmail.com) wrote:
> >
> > Thanks a lot, Kevin, for replying. Which thread are you referring to? The stackoverflow link? I could not see any such option.
> >
> > On Wed, Jul 11, 2018 at 3:04 PM, Kevin Waterson <kevin.water...@gmail.com> wrote:
> >
> > > Like the thread says, the two regex engines are wildly different; however, you can increase the number of threads using the -w option in grok.
> > >
> > > Kevin
> > >
> > > On Wed, Jul 11, 2018 at 5:35 PM Muhammed Irshad <irshadkt....@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am trying to write a custom Java parser for parsing AD logs. I am expecting a log flow of 10 million AD events per second. Is there a performance benefit to parsing with Java regex over using the Grok parser? Is there any performance benchmark or insight on this?
> > > >
> > > > I found this stackoverflow <https://stackoverflow.com/questions/43222863/logstash-grok-filter-is-slower-than-java-regex-pattern-matching> question, which inspired this post.
> > > >
> > > > --
> > > > Muhammed Irshad K T
> > > > Senior Software Engineer
> > > > +919447946359
> > > > irshadkt....@gmail.com
> > > > Skype : muhammed.irshad.k.t

--
Muhammed Irshad K T
Senior Software Engineer
+919447946359
irshadkt....@gmail.com
Skype : muhammed.irshad.k.t
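
On the non-regex question at the top of this thread: the header part of the sample record (the timestamp plus the KEY=value pairs from LogName through Message) can probably be split with plain String.indexOf scans, without a pattern-matching library. What follows is only a minimal sketch under assumptions, not Metron's parser API. The class name AdHeaderSketch, the parse method, and the fixed key order are hypothetical and are taken from the single dummy record above, so real AD exports may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits the KEY=value header of the sample AD record
// using indexOf only (no java.util.regex). Not part of Metron.
final class AdHeaderSketch {

    // ASSUMPTION: keys always appear in this order, as in the dummy record.
    private static final String[] KEYS = {
        "LogName", "SourceName", "EventCode", "EventType", "Type",
        "ComputerName", "TaskCategory", "OpCode", "RecordNumber",
        "Keywords", "Message"
    };

    static Map<String, String> parse(String record) {
        int[] starts = new int[KEYS.length];
        int from = 0;
        // Locate each " KEY=" marker, scanning left to right.
        for (int i = 0; i < KEYS.length; i++) {
            String marker = " " + KEYS[i] + "=";
            int at = record.indexOf(marker, from);
            if (at < 0) {
                throw new IllegalArgumentException("missing field: " + KEYS[i]);
            }
            starts[i] = at;
            from = at + marker.length();
        }

        Map<String, String> fields = new LinkedHashMap<>();
        // Everything before the first key is treated as the timestamp.
        fields.put("timestamp", record.substring(0, starts[0]).trim());
        for (int i = 0; i < KEYS.length; i++) {
            int valueStart = starts[i] + KEYS[i].length() + 2; // skip " KEY="
            int valueEnd = (i + 1 < KEYS.length) ? starts[i + 1] : record.length();
            fields.put(KEYS[i], record.substring(valueStart, valueEnd).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened copy of the dummy record from this thread.
        String record = "12/02/2017 05:14:43 PM LogName=Security "
            + "SourceName=Microsoft Windows security auditing. EventCode=4625 "
            + "EventType=0 Type=Information ComputerName= dc1.ad.ecorp.com "
            + "TaskCategory=Logon OpCode=Info RecordNumber=95055509895231650867 "
            + "Keywords=Audit Success Message=An account failed to log on.";
        parse(record).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The leading space in each marker is what keeps " Type=" from matching inside "EventType=". The Message body (the "Security ID:", "Logon Type:" fields and so on) is left as one string here and would need its own pass. Whether this beats a compiled regex at the volume discussed would have to be measured.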