Thanks Simon. Let me try these out and benchmark performance.

On Wed, Jul 11, 2018 at 9:07 PM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
> A streaming token parser might well get you good performance for that
> format... maybe something like an ANTLR grammar or even a simple scanner.
> Regex is not the only pattern :)
>
> It would also be great to see such a parser contributed back to the
> community if possible, and I'm sure we would be happy to help maintain and
> improve it in the open source.
>
> Simon
>
> > On 11 Jul 2018, at 16:26, Muhammed Irshad <irshadkt....@gmail.com> wrote:
> >
> > Otto Fowler,
> >
> > Yes, I am OK with the trade-offs. In the case of Active Directory log
> > records, can I parse them using a non-regex custom parser? I think we
> > need a pattern matching library, right, as it is a plain-text thing? One
> > of the dummy AD records for my use case would look like the one below.
> >
> > 12/02/2017 05:14:43 PM LogName=Security SourceName=Microsoft Windows
> > security auditing. EventCode=4625 EventType=0 Type=Information
> > ComputerName=dc1.ad.ecorp.com TaskCategory=Logon OpCode=Info
> > RecordNumber=95055509895231650867 Keywords=Audit Success Message=An account
> > failed to log on. Subject: Security ID: NULL SID Account Name: - Account
> > Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed:
> > Security ID: NULL SID Account Name: K1560365938U$ Account Domain: ECORP
> > Failure Information: Failure Reason: Unknown user name or bad password.
> > Status: 0xC000006D Sub Status: 0xC000006A Network Information: Workstation
> > Name: K1560365938U Source Network Address: 192.168.151.95 Source Port:
> > 53176 Detailed Authentification Information: Logon Process: NtLmSsp
> > Authentification Package: NTLM Transited Services: - Package Name (NTLM
> > ONLY): - Key Length: 0 This event is generated when a logon request fails.
> > It is generated on the computer where access was attempted. The Subject
> > fields indicate the account on the local system which requested the logon.
> > This is most commonly a service such as the Server service, or a local
> > process such as Winlogon.exe or Services.exe. The Logon Type field
> > indicates the kind of logon that was requested. The most common types are 2
> > (interactive) and 3 (network). The Process Information fields indicate
> > which account and process on the system requested the logon. The Network
> > Information fields indicate where a remote logon request originated.
> > Workstation name is not always available and may be left blank in some
> > cases. The authentication information fields provide detailed information
> > about this specific logon request. Transited services indicate which
> > intermediate services have participated in this logon request. Package name
> > indicates which sub-protocol was used among the NTLM protocols
> >
> > On Wed, Jul 11, 2018 at 8:44 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >
> >> I am not saying it is faster, just giving some info.
> >>
> >> Also, that part of the documentation is not referring to regex vs. grok,
> >> but grok versus a custom non-regex parser, at least as I read it.
> >>
> >> If you have the ability to build, deploy, test and maintain a custom
> >> parser (unless you will be submitting it to the project?), then in most
> >> cases where performance (or rather throughput) is the top issue, you are
> >> most likely going to get better results that way, accepting that you are
> >> OK with the tradeoffs.
> >>
> >> If you have 10M mps, parsing might not be your bottleneck.
> >>
> >> On July 11, 2018 at 11:01:19, Muhammed Irshad (irshadkt....@gmail.com) wrote:
> >>
> >> Otto Fowler,
> >>
> >> Thanks for the reply. I saw it uses the same Java regex under the hood.
> >> I got a bit sceptical on seeing this open issue
> >> <https://github.com/thekrakken/java-grok/issues/75> in java-grok, which
> >> says grok is much slower when compared with pure regex.
> >> The fix is not available yet in Metron, as it needs a few changes in the
> >> API and the issue to be closed. As the data volume in my requirement is so
> >> huge, I had to double-check and confirm before going with one. Also, the
> >> Metron documentation
> >> <https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html>
> >> itself makes the statement below under the Parser Adapter section.
> >>
> >> "Grok parser adapters are designed primarily for someone who is not a Java
> >> coder for quickly standing up a parser adapter for lower velocity
> >> topologies. Grok relies on Regex for message parsing, which is much slower
> >> than purpose-built Java parsers, but is more extensible. Grok parsers are
> >> defined via a config file and the topology does not need to be recompiled
> >> in order to make changes to them."
> >>
> >> On Wed, Jul 11, 2018 at 8:01 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>
> >>> Java-Grok IS Java regex. It is just a DSL over Java regex. It takes grok
> >>> expressions (which can reference other expressions and be compound),
> >>> parses/resolves them, and then builds one big regex out of them.
> >>> Also, Groks, once parsed/used, are re-used, so at that point they are
> >>> like compiled regexes.
> >>>
> >>> That is not to say that it takes zero time, but it may help you to
> >>> understand.
> >>>
> >>> https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/Grok.java
> >>> https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/GrokCompiler.java
> >>>
> >>> On July 11, 2018 at 07:13:38, Muhammed Irshad (irshadkt....@gmail.com) wrote:
> >>>
> >>> Thanks a lot, Kevin, for replying. Which thread are you referring to? The
> >>> Stack Overflow link? I could not see any such option.
> >>>
> >>> On Wed, Jul 11, 2018 at 3:04 PM, Kevin Waterson <kevin.water...@gmail.com> wrote:
> >>>
> >>>> Like the thread says, the two regex engines are wildly different.
> >>>> However, you can increase the number of threads using the -w option
> >>>> in grok.
> >>>>
> >>>> Kevin
> >>>>
> >>>> On Wed, Jul 11, 2018 at 5:35 PM Muhammed Irshad <irshadkt....@gmail.com> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I am trying to write a custom Java parser for parsing AD logs. I am
> >>>>> expecting a log flow of 10 million AD events per second. Does using
> >>>>> Java regex to parse have a benefit over using the Grok parser in terms
> >>>>> of performance? Is there any performance benchmark or insight
> >>>>> regarding this?
> >>>>>
> >>>>> I found this Stack Overflow question
> >>>>> <https://stackoverflow.com/questions/43222863/logstash-grok-filter-is-slower-than-java-regex-pattern-matching>,
> >>>>> which inspired this post.
> >>>>>
> >>>>> --
> >>>>> Muhammed Irshad K T
> >>>>> Senior Software Engineer
> >>>>> +919447946359
> >>>>> irshadkt....@gmail.com
> >>>>> Skype : muhammed.irshad.k.t

--
Muhammed Irshad K T
Senior Software Engineer
+919447946359
irshadkt....@gmail.com
Skype : muhammed.irshad.k.t
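
For reference, the "simple scanner" idea suggested in the thread could look roughly like the sketch below. It is an illustration under assumptions, not Metron or java-grok code: the class name AdKeyValueScanner, the "timestamp" field name, and the rule that everything after Message= is kept as a single value are hypothetical choices made for this example. It walks the line once without any regex, treats a whitespace-delimited token followed by '=' as a key, and assigns the text up to the next key as the value, so values containing spaces (such as "Audit Success") stay intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-pass scanner for "timestamp Key=Value ... Message=rest" lines.
class AdKeyValueScanner {

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String key = null;    // key whose value is currently being read
        int valueStart = 0;   // start of the current value (or of the leading timestamp)
        int tokenStart = 0;   // start of the current whitespace-delimited token

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                tokenStart = i + 1;
            } else if (c == '=' && i > tokenStart && tokenStart >= valueStart) {
                // The token before '=' is a new key; the text before it closes the previous field.
                store(fields, key, line.substring(valueStart, tokenStart));
                key = line.substring(tokenStart, i);
                valueStart = i + 1;
                if ("Message".equals(key)) {
                    break; // assumption: the free-text message runs to the end of the line
                }
            }
        }
        store(fields, key, line.substring(valueStart));
        return fields;
    }

    // Text seen before the first key is stored under the assumed name "timestamp".
    private static void store(Map<String, String> fields, String key, String rawValue) {
        String value = rawValue.trim();
        if (key != null) {
            fields.put(key, value);
        } else if (!value.isEmpty()) {
            fields.put("timestamp", value);
        }
    }

    public static void main(String[] args) {
        String sample = "12/02/2017 05:14:43 PM LogName=Security EventCode=4625 "
                + "Keywords=Audit Success Message=An account failed to log on.";
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The nested "Subject:" / "Network Information:" block inside Message is deliberately left unparsed here. Whether a scanner like this actually beats a precompiled regex or Grok at the volumes discussed would still need to be benchmarked.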