[ https://issues.apache.org/jira/browse/NIFI-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375023#comment-16375023 ]
J Andrew Skene commented on NIFI-4735:
--------------------------------------

I hit this issue a few months ago and fixed it in a local fork. The NiFi code also skips the last chunk of any EVTX file. The above MR fixes both issues.

> ParseEVTX only outputs one event per chunk
> ------------------------------------------
>
>                 Key: NIFI-4735
>                 URL: https://issues.apache.org/jira/browse/NIFI-4735
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.4.0
>            Reporter: Terry Brugger
>            Priority: Major
>         Attachments: EVTX2XML.xml, Screen Shot 2018-01-03 at 15.06.24.png
>
>
> I have constructed a simple pipeline that reads a Windows EVTX binary file,
> runs it through ParseEvtx, and writes out the result (template attached). As
> a sample I fed it a 192MiB file and it only output 3.3MiB (see screenshot).
> The output file contains 3071 events. Not coincidentally, I am sure,
> 192MiB/64KiB = 3072, which would indicate that it only wrote out one event
> from each chunk. If I configure the processor to output by the chunk or event
> I get 3071 separate files with one event each. Unfortunately, I have no way
> to sanitize binary EVTX so I cannot provide the actual file used.
> By way of comparison, I ran the same EVTX file through evtx_dump.py from the
> python-evtx package (which I understand ParseEvtx was based on) and it
> produced 395,757 events -- on par with what I would expect. It also took much
> longer than NiFi -- like 30 minutes versus a few seconds -- which I also
> expect is consistent with processing the entire file.
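
As a rough check of the arithmetic in the report, the sketch below recomputes the chunk count from the quoted sizes. It assumes the file is exactly 192 MiB (the report gives a rounded figure) and takes the 64 KiB chunk size from the report; the variable names are illustrative only.

```python
# Figures quoted in the report; file_size is assumed to be exactly 192 MiB.
KIB = 1024
MIB = 1024 * KIB

file_size = 192 * MIB
chunk_size = 64 * KIB

chunks = file_size // chunk_size   # number of 64 KiB chunks in the file
events_written = 3071              # events ParseEvtx produced
events_expected = 395_757          # events evtx_dump.py produced

missing_chunks = chunks - events_written
avg_events_per_chunk = round(events_expected / chunks, 1)

print(chunks)                # 3072
print(missing_chunks)        # 1
print(avg_events_per_chunk)  # 128.8
```

Under these assumptions, the one-chunk shortfall would be consistent with the skipped last chunk, and the average suggests roughly 129 events per chunk in the input against one per chunk in the output.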