Thanks, Simon. Let me try these out and benchmark the performance.

On Wed, Jul 11, 2018 at 9:07 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> A streaming token parser might well get you good performance for that
> format, maybe something like an ANTLR grammar or even a simple scanner.
> Regex is not the only pattern :)
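As a rough illustration of the "simple scanner" idea (not code from Metron or from anyone in this thread), the sketch below walks the key=value header of the sample AD record quoted further down in one left-to-right scan, without java.util.regex. The class name `KeyValueScanner` is hypothetical. The sketch assumes three things: keys are letters/digits preceded by a space and followed by `=`, a value runs until the next key, and `Message` is the last key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a Metron or java-grok API.
final class KeyValueScanner {
    private KeyValueScanner() {}

    // If a key starts at index i (letters/digits directly followed by '=', and
    // preceded by the start of the line or a space), returns the index of that
    // '='; otherwise returns -1.
    private static int keyEnd(String s, int i) {
        if (i > 0 && s.charAt(i - 1) != ' ') return -1;
        int j = i;
        while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
        return (j > i && j < s.length() && s.charAt(j) == '=') ? j : -1;
    }

    // Each value runs from its '=' up to the next key, so multi-word values such
    // as "Microsoft Windows security auditing." survive. Text before the first
    // key (the timestamp) is skipped. Assumes Message is the last key and keeps
    // everything after it as its value, even if that text contains '='.
    static Map<String, String> scan(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String key = null;
        int valueStart = 0;
        int i = 0;
        while (i < line.length()) {
            int eq = keyEnd(line, i);
            if (eq < 0) {
                i++;
                continue;
            }
            if (key != null) fields.put(key, line.substring(valueStart, i).trim());
            key = line.substring(i, eq);
            valueStart = eq + 1;
            i = eq + 1;
            if (key.equals("Message")) break;
        }
        if (key != null) fields.put(key, line.substring(valueStart).trim());
        return fields;
    }
}
```

Because it only compares characters, its cost grows linearly with line length. Whether that beats a precompiled regex on real traffic is something to measure rather than assume.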
>
> It would also be great to see such a parser contributed back to the
> community if possible, and I am sure we would be happy to help maintain and
> improve it as open source.
>
> Simon
>
> > On 11 Jul 2018, at 16:26, Muhammed Irshad <irshadkt....@gmail.com> wrote:
> >
> > Otto Fowler,
> >
> > Yes, I am OK with the trade-offs. In the case of Active Directory log
> > records, can I parse them using a non-regex custom parser? I think we need
> > a pattern-matching library, right, since it is plain text? A dummy AD
> > record from my use case looks like the one below.
> >
> >
> > 12/02/2017 05:14:43 PM LogName=Security SourceName=Microsoft Windows
> > security auditing. EventCode=4625 EventType=0 Type=Information ComputerName=
> > dc1.ad.ecorp.com TaskCategory=Logon OpCode=Info
> > RecordNumber=95055509895231650867 Keywords=Audit Success Message=An account
> > failed to log on. Subject: Security ID: NULL SID Account Name: - Account
> > Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed:
> > Security ID: NULL SID Account Name: K1560365938U$ Account Domain: ECORP
> > Failure Information: Failure Reason: Unknown user name or bad password.
> > Status: 0xC000006D Sub Status: 0xC000006A Network Information: Workstation
> > Name: K1560365938U Source Network Address: 192.168.151.95 Source Port:
> > 53176 Detailed Authentification Information: Logon Process: NtLmSsp
> > Authentification Package: NTLM Transited Services: - Package Name (NTLM
> > ONLY): - Key Length: 0 This event is generated when a logon request fails.
> > It is generated on the computer where access was attempted. The Subject
> > fields indicate the account on the local system which requested the logon.
> > This is most commonly a service such as the Server service, or a local
> > process such as Winlogon.exe or Services.exe. The Logon Type field
> > indicates the kind of logon that was requested. The most common types are 2
> > (interactive) and 3 (network). The Process Information fields indicate
> > which account and process on the system requested the logon. The Network
> > Information fields indicate where a remote logon request originated.
> > Workstation name is not always available and may be left blank in some
> > cases. The authentication information fields provide detailed information
> > about this specific logon request. Transited services indicate which
> > intermediate services have participated in this logon request. Package name
> > indicates which sub-protocol was used among the NTLM protocols
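The Message part of this record has no key=value delimiters, only "Label: value" text, so one non-regex option is to look labels up with `String.indexOf`. This is a minimal sketch with hypothetical names (`LabelScanner`, `tokenAfter`). It assumes the wanted value is a single whitespace-delimited token right after its label. That holds for fields such as Status, Source Network Address and Source Port in the sample above, but not for multi-word values such as Failure Reason.

```java
// Hypothetical helper for illustration; not a Metron API. Assumes the wanted
// value is a single whitespace-delimited token right after its label.
final class LabelScanner {
    private LabelScanner() {}

    // Finds `label` at or after index `from` and returns the next token, or null.
    static String tokenAfter(String text, String label, int from) {
        int at = text.indexOf(label, from);
        if (at < 0) return null;
        int start = at + label.length();
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) start++;
        int end = start;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) end++;
        return start < end ? text.substring(start, end) : null;
    }

    // Convenience overload: search from the beginning of the text.
    static String tokenAfter(String text, String label) {
        return tokenAfter(text, label, 0);
    }
}
```

Some labels repeat in this event. Account Name, for example, appears under both Subject and Account For Which Logon Failed, so `tokenAfter` takes a start offset. Starting from `msg.indexOf("Account For Which Logon Failed:")` returns the failed account (`K1560365938U$` in the sample) instead of the Subject's `-`.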
> >
> > On Wed, Jul 11, 2018 at 8:44 PM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> >> I am not saying it is faster, just giving some info.
> >>
> >> Also, that part of the documentation is not referring to regex vs. grok,
> >> but to grok versus a custom non-regex parser, at least as I read it.
> >>
> >> If you have the ability to build, deploy, test and maintain a custom
> >> parser (unless you will be submitting it to the project?), then in most
> >> cases where performance (or rather throughput) is the top issue, you are
> >> most likely going to get better results that way, assuming you are OK
> >> with the trade-offs.
> >>
> >> If you have 10M mps (messages per second), parsing might not be your
> >> bottleneck.
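For a sense of scale, here is the arithmetic behind the 10M mps figure. If `threads` parser instances share an aggregate rate evenly, each one gets `threads / rate` seconds per message. The thread counts below are illustrative assumptions, not measurements of any deployment. `ParseBudget` is a hypothetical name.

```java
// Back-of-the-envelope per-message time budget; thread counts are illustrative.
final class ParseBudget {
    private ParseBudget() {}

    // Nanoseconds available per message for one thread when `threads` threads
    // evenly share an aggregate rate of `messagesPerSecond`.
    static double nanosPerMessage(double messagesPerSecond, int threads) {
        return 1e9 * threads / messagesPerSecond;
    }

    public static void main(String[] args) {
        double rate = 10_000_000; // 10M messages per second, the rate discussed in this thread
        for (int threads : new int[] {1, 16, 64, 256}) {
            System.out.printf("%4d threads -> %.0f ns per message per thread%n",
                    threads, nanosPerMessage(rate, threads));
        }
    }
}
```

At 10M messages per second, a single thread would have 100 ns per message. With 16, 64 and 256 threads the budget grows to 1.6, 6.4 and 25.6 µs respectively. That budget also has to cover everything else the pipeline does per message, not just parsing.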
> >>
> >> On July 11, 2018 at 11:01:19, Muhammed Irshad (irshadkt....@gmail.com)
> >> wrote:
> >>
> >> Otto Fowler,
> >>
> >> Thanks for the reply. I saw that it uses the same Java regex under the
> >> hood. I got a bit sceptical on seeing this open issue
> >> <https://github.com/thekrakken/java-grok/issues/75> in java-grok, which
> >> says grok is much slower than pure regex. The fix is not yet available in
> >> Metron, as it needs a few API changes and the issue to be closed. As the
> >> data volume in my requirement is huge, I had to double-check before
> >> choosing one. Also, the Metron documentation
> >> <https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html>
> >> itself makes the statement below in the Parser Adapter section:
> >>
> >> "Grok parser adapters are designed primarily for someone who is not a Java
> >> coder for quickly standing up a parser adapter for lower velocity
> >> topologies. Grok relies on Regex for message parsing, which is much slower
> >> than purpose-built Java parsers, but is more extensible. Grok parsers are
> >> defined via a config file and the topology does not need to be recompiled
> >> in order to make changes to them."
> >>
> >> On Wed, Jul 11, 2018 at 8:01 PM, Otto Fowler <ottobackwa...@gmail.com>
> >> wrote:
> >>
> >>> Java-Grok IS Java regex; it is just a DSL over Java regex. It takes grok
> >>> expressions (which can reference other expressions and be compound),
> >>> parses/resolves them, and then builds one big regex out of them.
> >>> Also, Groks, once parsed/used, are re-used, so at that point they are
> >>> like compiled regexes.
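To make "a DSL over Java regex" concrete, here is a toy expander. It is not java-grok's code or API (see the Grok.java and GrokCompiler.java links for that). It only mimics the idea:

- each `%{NAME}` or `%{NAME:field}` reference is replaced by its pattern, and nested references are resolved recursively;
- named references become named capturing groups;
- the result is compiled once into a single `java.util.regex.Pattern` that is then reused.

`ToyGrok` and the small pattern dictionary in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy grok-style expander for illustration only; this is NOT the java-grok API.
final class ToyGrok {
    // Matches %{NAME} or %{NAME:field}.
    private static final Pattern REF = Pattern.compile("%\\{(\\w+)(?::(\\w+))?\\}");
    private final Map<String, String> dictionary = new HashMap<>();

    ToyGrok define(String name, String regex) {
        dictionary.put(name, regex);
        return this;
    }

    // Recursively replaces references with plain regex text. There is no cycle
    // detection, and Java group names must be letters/digits starting with a letter.
    String expand(String expression) {
        Matcher m = REF.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = dictionary.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("unknown pattern: " + m.group(1));
            }
            String inner = expand(body);
            String field = m.group(2);
            String group = (field == null) ? "(?:" + inner + ")" : "(?<" + field + ">" + inner + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(group));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Expand once and compile once; the Pattern is then reused for every message.
    Pattern compile(String expression) {
        return Pattern.compile(expand(expression));
    }
}
```

For example, you could define `INT` as `\d+` and `IPV4` as four `%{INT}` references joined by `\.`. Compiling `Source Network Address: %{IPV4:srcip}` then yields one Pattern whose `srcip` group captures `192.168.151.95` from the sample record. After compilation, matching is plain java.util.regex work; the expansion cost is paid once.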
> >>>
> >>> That is not to say that this takes zero time, but it may help you
> >>> understand.
> >>>
> >>> https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/Grok.java
> >>> https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/krakens/grok/api/GrokCompiler.java
> >>>
> >>> On July 11, 2018 at 07:13:38, Muhammed Irshad (irshadkt....@gmail.com)
> >>> wrote:
> >>>
> >>> Thanks a lot for replying, Kevin. Which thread are you referring to? The
> >>> Stack Overflow link? I could not see any such option there.
> >>>
> >>> On Wed, Jul 11, 2018 at 3:04 PM, Kevin Waterson <kevin.water...@gmail.com>
> >>> wrote:
> >>>
> >>>> Like the thread says, the two regex engines are wildly different.
> >>>> However, you can increase the number of worker threads using Logstash's
> >>>> -w option.
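In a custom Java parser, the counterpart of adding worker threads is to run the same parser on several threads. This is a minimal sketch; the class name and the `EventCode` pattern are hypothetical. A `java.util.regex.Pattern` is immutable and safe to share between threads, whereas a `Matcher` is not. The pattern is therefore compiled once, and each call creates its own matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example; the class name and the EventCode pattern are assumptions.
final class ParallelParse {
    // Compiled once and shared: Pattern instances are immutable and thread-safe.
    private static final Pattern EVENT_CODE = Pattern.compile("EventCode=(\\d+)");

    private ParallelParse() {}

    static String eventCode(String line) {
        Matcher m = EVENT_CODE.matcher(line); // Matcher is not thread-safe: one per call
        return m.find() ? m.group(1) : null;
    }

    // Parses `lines` on a fixed pool of `threads` workers; results keep input order.
    static List<String> parseAll(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                futures.add(pool.submit(() -> eventCode(line)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting one task per line keeps the example short. At millions of events per second you would more likely hand each worker a batch or a partition of the input, since per-task overhead could otherwise dominate.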
> >>>>
> >>>> Kevin
> >>>>
> >>>> On Wed, Jul 11, 2018 at 5:35 PM Muhammed Irshad <irshadkt....@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I am trying to write a custom Java parser for AD logs. I am expecting a
> >>>>> log flow of 10 million AD events per second. Does parsing with Java
> >>>>> regex have a performance benefit over using the Grok parser? Are there
> >>>>> any performance benchmarks or insights on this?
> >>>>>
> >>>>> I found this Stack Overflow question
> >>>>> <https://stackoverflow.com/questions/43222863/logstash-grok-filter-is-slower-than-java-regex-pattern-matching>,
> >>>>> which inspired this post.
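As a starting point for the benchmark question, here is a naive timing harness. It compares a precompiled regex with a hand-written `indexOf` scan for one field. It is only a sketch: `NaiveBench` and both parser methods are hypothetical. Hand-rolled `System.nanoTime` loops are easily distorted by JIT warm-up and dead-code elimination, so OpenJDK's JMH is the better tool for numbers you intend to rely on. No results are quoted here because they depend on the JVM, the hardware and the real message mix.

```java
import java.util.function.ToIntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive timing harness for illustration only; JMH gives far more reliable numbers.
final class NaiveBench {
    private static final Pattern EVENT_CODE = Pattern.compile("EventCode=(\\d+)");
    private static final String KEY = "EventCode=";
    // Written after each run so the JIT cannot discard the parsing work.
    static volatile long blackhole;

    private NaiveBench() {}

    // Regex variant: precompiled Pattern, one Matcher per call.
    static int viaRegex(String line) {
        Matcher m = EVENT_CODE.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Hand-written variant: indexOf plus an ASCII digit loop (no overflow check).
    static int viaIndexOf(String line) {
        int at = line.indexOf(KEY);
        if (at < 0) return -1;
        int i = at + KEY.length();
        int value = 0;
        boolean sawDigit = false;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') break;
            value = value * 10 + (c - '0');
            sawDigit = true;
            i++;
        }
        return sawDigit ? value : -1;
    }

    // Average nanoseconds per call, measured after one warm-up pass of the same size.
    static double nanosPerCall(ToIntFunction<String> parser, String line, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) sink += parser.applyAsInt(line); // warm-up
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += parser.applyAsInt(line);
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        String line = "12/02/2017 05:14:43 PM LogName=Security EventCode=4625 EventType=0";
        int n = 5_000_000;
        System.out.printf("regex:   %.1f ns/call%n", nanosPerCall(NaiveBench::viaRegex, line, n));
        System.out.printf("indexOf: %.1f ns/call%n", nanosPerCall(NaiveBench::viaIndexOf, line, n));
    }
}
```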
> >>>>>
> >>>>> --
> >>>>> Muhammed Irshad K T
> >>>>> Senior Software Engineer
> >>>>> +919447946359
> >>>>> irshadkt....@gmail.com
> >>>>> Skype : muhammed.irshad.k.t
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
>



